Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

XML-Relational Mapping: Approaches, Techniques, and Queries, Slides of Introduction to Database Management Systems

Duke University Introduction to Database Management Systems

Various approaches to mapping xml data to relational databases, including text files, object-oriented dbms, and relational dbms. It covers schema-oblivious and schema-aware mapping techniques, node/edge-based and interval-based schema designs, and label-path and dewey-order encoding. The document also explores the advantages and disadvantages of each approach and provides examples of queries using these techniques.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸

4.7

(24)

372 documents

1 / 4

This page cannot be seen from the preview

Don't miss anything!

1

XML-Relational Mapping

CPS 116

Introduction to Database Systems

2

Announcements

Homework #3 due next Tuesday (Nov. 2)

3

Approaches to XML processing

Text files (!)

Specialized XML DBMS

Lore (Stanford), Strudel (AT&T), Tamino/QuiP

(Software AG), X-Hive, Timber (Michigan), etc.

Still a long way to go

Object-oriented DBMS

eXcelon (ObjectStore), ozone, etc.

Not as mature as relational DBMS

Relational (and object-relational) DBMS

Middleware and/or object-relational extensions

4

Mapping XML to relational

Store XML in a CLOB (Character Large OBject) column

Simple, compact

Full-text indexing can help (often provided by DBMS vendors as

object-relational “extensions”)

Poor integration with relational query processing

Updates are expensive

Alternatives?

Schema-oblivious mapping:

well-formed XML →generic relational schema

• Node/edge-based mapping for graphs

• Interval-based mapping for trees

• Path-based mapping for trees

Schema-aware mapping:

valid XML →special relational schema based on DTD

5

Node/edge-based: schema

Element(eid, tag)

Attribute(eid,attrName,attrValue)

Attribute order does not matter

ElementChild(eid, pos, child)

pos specifies the ordering of children

child references either Element(eid) or Text(tid)

Text(tid, value)

tid cannot be the same as any eid

)Need to “invent” lots of id’s

)Need indexes for efficiency, e.g., Element(tag), Text(value)

Key: (eid,attrName)

Keys: (eid, pos), (child)

6

Node/edge-based: example

<title>Foundations of Databases</title>

<author>Abiteboul</author>

<author>Vianu</author>

<publisher>Addison Wesley</publisher>

</book>…

</bibliography>

eid tag

e0 bibliography

e1 book

e2 title

e3 author

e4 author

e5 author

e6 publisher

e7 year

Element

eid pos child

e0 1 e1

e1 1 e2

e1 2 e3

e1 3 e4

e1 4 e5

e1 5 e6

e1 6 e7

e2 1 t0

e3 1 t1

e4 1 t2

e5 1 t3

e6 1 t4

e7 1 t5

ElementChild

eid attrName attrValue

e1 ISBN ISBN-10

e1 price 80

Attribute

tid value

t0 Foundations of Databases

t1 Abiteboul

t2 Hull

t3 Vianu

t4 Addison Wesley

t5 1995

Text

Discover Slides of Introduction to Database Management Systems Duke University

Partial preview of the text

Download XML-Relational Mapping: Approaches, Techniques, and Queries and more Slides Introduction to Database Management Systems in PDF only on Docsity!

XML-Relational Mapping

CPS 116

Introduction to Database Systems

Announcements

Homework #3 due next Tuesday (Nov. 2)

3

Approaches to XML processing

Text files (!)

Specialized XML DBMS

Lore (Stanford), Strudel (AT&T), Tamino/QuiP

(Software AG), X-Hive, Timber (Michigan), etc.

Still a long way to go

Object-oriented DBMS

eXcelon (ObjectStore), ozone, etc.

Not as mature as relational DBMS

Relational (and object-relational) DBMS

Middleware and/or object-relational extensions

4

Mapping XML to relational

Store XML in a CLOB (Character Large OBject) column

Simple, compact Full-text indexing can help (often provided by DBMS vendors as object-relational “extensions”) Poor integration with relational query processing Updates are expensive

Alternatives?

Schema-oblivious mapping: well-formed XML → generic relational schema

Node/edge-based mapping for graphs
Interval-based mapping for trees
Path-based mapping for trees Schema-aware mapping: valid XML → special relational schema based on DTD

5

Node/edge-based: schema

Element ( eid , tag )

Attribute ( eid , attrName , attrValue )

Attribute order does not matter

ElementChild ( eid , pos , child )

pos specifies the ordering of children child references either Element ( eid ) or Text ( tid )

Text ( tid , value )

tid cannot be the same as any eid

) Need to “invent” lots of id ’s

) Need indexes for efficiency, e.g., Element ( tag ), Text ( value )

Key: ( eid , attrName )

Keys: ( eid , pos ), ( child )

6

Node/edge-based: example

Foundations of Databases Abiteboul Hull Vianu Addison Wesley 1995 …

eid tag e0 bibliography e1 book e2 title e3 author e4 author e5 author e6 publisher e7 year

Element

eid pos child e0 1 e e1 1 e e1 2 e e1 3 e e1 4 e e1 5 e e1 6 e e2 1 t e3 1 t e4 1 t e5 1 t e6 1 t e7 1 t

ElementChild

eid attrName attrValue e1 ISBN ISBN- e1 price 80

Attribute

tid value t0 Foundations of Databases t1 Abiteboul t2 Hull t3 Vianu t4 Addison Wesley t5 1995

Text

Node/edge-based: simple paths

//title

SELECT eid FROM Element WHERE tag = ‘title’;

//section/title

SELECT e2.eid FROM Element e1, ElementChild c, Element e WHERE e1.tag = ‘section’ AND e2.tag = ‘title’ AND e1.eid = c.eid AND c.child = e2.eid;

) Path expression becomes joins!

Number of joins is proportional to the length of the path

expression

Node/edge-based: more complex paths

//bibliography/book[author=“Abiteboul”]/@price SELECT a.attrValue FROM Element e1, ElementChild c1, Element e2, Attribute a WHERE e1.tag = ‘bibliography’ AND e1.eid = c1.eid AND c1.child = e2.eid AND e2.tag = ‘book’

AND e2.eid = a.eid AND a.attrName = ‘price’;

AND EXISTS (SELECT * FROM ElementChild c2, Element e3, ElementChild c3, Text t WHERE e2.eid = c2.eid AND c2.child = e3.eid AND e3.tag = ‘author’ AND e2.eid = c3.eid AND c3.child = t.tid AND t.value = ‘Abiteboul’)

9

Node/edge-based: descendent-or-self

//book//title

Requires SQL3 recursion WITH ReachableFromBook(id) AS ((SELECT eid FROM Element WHERE tag = ‘book’) UNION ALL (SELECT c.child FROM ReachableFromBook r, ElementChild c WHERE r.eid = c.eid)) SELECT eid FROM Element WHERE eid IN (SELECT * FROM ReachableFromBook) AND tag = ‘title’;

10

Interval-based: schema

Element ( left , right , level , tag )

left is the start position of the element right is the end position of the element level is the nesting depth of the element (strictly speaking, unnecessary) Key is left

Attribute ( left , attrName , attrValue )

Text ( left , level , value )

) Where did ElementChild go?

E 1 is the parent of E 2 iff: [ E 1. left , E 1. right ] ⊃ [ E 2. left , E 2. right ], and E 1. level = E 2. level – 1

11

Interval-based: example

1 2 34Foundations of Databases 67Abiteboul 910Hull 1213Vianu 1516Addison Wesley 18191995 21… bibliography

book

title author author author publisher year

1,999,

2,21,

3,5,3 6,8,3 9,11,3 12,14,3 15,17,3 18,20,

12

Interval-based: queries

//section/title

SELECT e2.left FROM Element e1, Element e WHERE e1.tag = ‘section’ AND e2.tag = ‘title’ AND e1.left < e2.left AND e2.right < e1.right AND e1.level = e2.level-1;

)Path expression becomes “containment” joins!

Number of joins is proportional to path expression length

//book//title

SELECT e2.left FROM Element e1, Element e WHERE e1.tag = ‘book’ AND e2.tag = ‘section’ AND e1.left < e2.left AND e2.right < e1.right;

)No recursion!

Inlining

An author element just has a PCDATA child

Instead of using foreign keys

book_author ( ISBN , author_id )

author ( id , PCDATA_id )

PCDATA ( id , value )

Why not just “inline” the string value inside book?

book_author ( ISBN , author_PCDATA_value )

PCDATA table no longer stores author values

More general inlining

As long as we know the structure of an element and its

number of children (and recursively for all children), we can

inline this element where it appears

… …

…

With no inlining at all

book ( ISBN , publisher_id ) publisher ( id , name_id , address_id ) name ( id , PCDATA_id ) address ( id , PCDATA_id )

With inlining

book ( ISBN , publisher_name_PCDATA_value , publisher_address_PCDATA_value )

21

Queries

book ( ISBN , price , title , publisher , year ),

book_author ( ISBN , author ),

book_section ( ISBN , section_pos , section_id ),

section ( id , title , text ), section_section ( id , section_pos , section_id )

//title (SELECT title FROM book) UNION ALL (SELECT title FROM section);

//section/title SELECT title FROM section;

//bibliography/book[author=“Abiteboul”]/@price SELECT price FROM book, book_author WHERE book.ISBN = book_author.ISBN AND author = ‘Abiteboul’; //book//title (SELECT title FROM book) UNION ALL (SELECT title FROM section)

These queries only work for the given DTD

22

Pros and cons of inlining

Not always applicable

* and +, recursive schema (e.g., section)

Fewer joins

More “scattering” (e.g., there is no longer any table

containing all titles; author information is scattered

across book , section , etc.)

)Heuristic: do not inline elements that can be shared

23

Result restructuring

Simple results are fine

Each tuple returned by SQL gets converted to an element

Simple grouping is fine (e.g., books with multiple authors)

Tuples can be returned by SQL in sorted order; adjacent tuples are grouped into an element

Complex results are problematic (e.g., books with multiple

authors and multiple references)

One SQL query returns one table whose columns cannot store sets Option 1: return one table with all combinations of authors and references → bad Option 2: return two tables, one with authors and the other with references → join is done as post processing Option 3: return one table with all author and reference columns; pad with NULL’s; order determines grouping → messy

24

Comparison of approaches

Schema-oblivious

Flexible and adaptable; no DTD needed Queries are easy to formulate

Translation can be easily automated Queries involve lots of join and are expensive

Schema-aware

Less flexible and adaptable Need to know DTD to design the relational schema Query formulation requires knowing DTD and schema Queries are more efficient XQuery is tougher to formulate because of result restructuring

XML-Relational Mapping: Approaches, Techniques, and Queries, Slides of Introduction to Database Management Systems

Related documents

Partial preview of the text

Download XML-Relational Mapping: Approaches, Techniques, and Queries and more Slides Introduction to Database Management Systems in PDF only on Docsity!

XML-Relational Mapping

CPS 116

Introduction to Database Systems

Announcements

 Homework #3 due next Tuesday (Nov. 2)

Approaches to XML processing

 Text files (!)

 Specialized XML DBMS

 Lore (Stanford), Strudel (AT&T), Tamino/QuiP

(Software AG), X-Hive, Timber (Michigan), etc.

 Still a long way to go

 Object-oriented DBMS

 eXcelon (ObjectStore), ozone, etc.

 Not as mature as relational DBMS

 Relational (and object-relational) DBMS

 Middleware and/or object-relational extensions

Mapping XML to relational

 Store XML in a CLOB (Character Large OBject) column

 Alternatives?

Node/edge-based: schema

 Element ( eid , tag )

 Attribute ( eid , attrName , attrValue )

 ElementChild ( eid , pos , child )

 Text ( tid , value )

) Need to “invent” lots of id ’s

) Need indexes for efficiency, e.g., Element ( tag ), Text ( value )

Key: ( eid , attrName )

Keys: ( eid , pos ), ( child )

Node/edge-based: example

Element

ElementChild

Attribute

Text

Node/edge-based: simple paths

 //title

 //section/title

) Path expression becomes joins!

 Number of joins is proportional to the length of the path

expression

Node/edge-based: more complex paths

Node/edge-based: descendent-or-self

 //book//title

Interval-based: schema

 Element ( left , right , level , tag )

 Attribute ( left , attrName , attrValue )

 Text ( left , level , value )

) Where did ElementChild go?

Interval-based: example

Interval-based: queries

 //section/title

)Path expression becomes “containment” joins!

 //book//title

)No recursion!

Inlining

 An author element just has a PCDATA child

 Instead of using foreign keys

 book_author ( ISBN , author_id )

 author ( id , PCDATA_id )

 PCDATA ( id , value )

 Why not just “inline” the string value inside book?

 book_author ( ISBN , author_PCDATA_value )

 PCDATA table no longer stores author values

More general inlining

 As long as we know the structure of an element and its

number of children (and recursively for all children), we can

inline this element where it appears

 With no inlining at all

 With inlining

Queries

 book ( ISBN , price , title , publisher , year ),

book_author ( ISBN , author ),

book_section ( ISBN , section_pos , section_id ),

section ( id , title , text ), section_section ( id , section_pos , section_id )

Pros and cons of inlining

 Not always applicable

 * and +, recursive schema (e.g., section)

Homework #3 due next Tuesday (Nov. 2)

Text files (!)

Specialized XML DBMS

Lore (Stanford), Strudel (AT&T), Tamino/QuiP

Still a long way to go

Object-oriented DBMS

eXcelon (ObjectStore), ozone, etc.

Not as mature as relational DBMS

Relational (and object-relational) DBMS

Middleware and/or object-relational extensions

Store XML in a CLOB (Character Large OBject) column

Alternatives?

Element ( eid , tag )

Attribute ( eid , attrName , attrValue )

ElementChild ( eid , pos , child )

Text ( tid , value )

//title

//section/title

Number of joins is proportional to the length of the path

//book//title

Element ( left , right , level , tag )

Attribute ( left , attrName , attrValue )

Text ( left , level , value )

//section/title

//book//title

An author element just has a PCDATA child

Instead of using foreign keys

book_author ( ISBN , author_id )

author ( id , PCDATA_id )

PCDATA ( id , value )

Why not just “inline” the string value inside book?

book_author ( ISBN , author_PCDATA_value )

PCDATA table no longer stores author values

As long as we know the structure of an element and its

With no inlining at all

With inlining

book ( ISBN , price , title , publisher , year ),

Not always applicable

* and +, recursive schema (e.g., section)

Fewer joins

More “scattering” (e.g., there is no longer any table

Simple results are fine

Simple grouping is fine (e.g., books with multiple authors)

Complex results are problematic (e.g., books with multiple

Schema-oblivious

Schema-aware