XML-Relational Mapping: Techniques for XML Data in Databases, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

An overview of XML-relational mapping, a technique used to process XML data in relational databases. The slides survey approaches to XML processing, including text files, specialized XML DBMSs, object-oriented DBMSs, and relational DBMSs. They discuss mapping XML to relational databases using node/edge-based and interval-based schemas, with the advantages and disadvantages of each approach, and touch on XQuery and its evaluation in relational databases.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸


XML-Relational Mapping

CPS 216

Advanced Database Systems

Announcements (March 18)

Midterm sample solution available outside my office

Course project milestone 2 due March 30

Homework #3 due April 6

Talk by Amol Deshpande

Adaptive Query Processing to Handle Estimation Errors

Monday, 11:30am-12:30pm, D106

Reading assignment due next Monday

Two VLDB papers on native XML databases

Approaches to XML processing

Text files (!)

Specialized XML DBMS

Lore (Stanford), Strudel (AT&T), Tamino/QuiP (Software AG), X-Hive, Timber (Michigan), etc.

Still a long way to go

Object-oriented DBMS

eXcelon (ObjectStore), ozone, etc.

Not as mature as relational DBMS

Relational (and object-relational) DBMS

Middleware and/or object-relational extensions


Mapping XML to relational

Store XML in a CLOB (Character Large OBject) column

Simple, compact

Full-text indexing can help (often provided by DBMS vendors as object-relational “extensions”)

Alternatives?

Schema-oblivious mapping: well-formed XML → generic relational schema

Node/edge-based mapping for graphs
Interval-based mapping for trees
Path-based mapping for trees

Schema-aware mapping: valid XML → special relational schema based on DTD

Node/edge-based: schema

Element ( eid , tag )

Attribute ( eid , attrName , attrValue )

Attribute order does not matter

ElementChild ( eid , pos , child )

pos specifies the ordering of children
child references either Element ( eid ) or Text ( tid )

Text ( tid , value )

tid cannot be the same as any eid

Need to “invent” lots of ids

Need indexes for efficiency, e.g., Element ( tag ), Text ( value )

Node/edge-based: example

Foundations of Databases Abiteboul Hull Vianu Addison Wesley 1995 …

Element

eid  tag
e0   bibliography
e1   book
e2   title
e3   author
e4   author
e5   author
e6   publisher
e7   year

ElementChild

eid  pos  child
e0   1    e
e1   1    e
e1   2    e
e1   3    e
e1   4    e
e1   5    e
e1   6    e
e2   1    t
e3   1    t
e4   1    t
e5   1    t
e6   1    t
e7   1    t

Attribute

eid  attrName  attrValue
e1   ISBN      ISBN-
e1   price     80

Text

tid  value
t0   Foundations of Databases
t1   Abiteboul
t2   Hull
t3   Vianu
t4   Addison Wesley
t5   1995
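As a rough sketch of how the node/edge tables might be populated and queried, the following uses Python's built-in sqlite3 module. The child ids (e1 to e7, t0 to t5) are assumptions chosen to agree with the Element and Text tables above, and the Attribute table is omitted. The point it illustrates is that each step of a path such as /bibliography/book/author costs one more join through ElementChild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Element (eid TEXT PRIMARY KEY, tag TEXT);
CREATE TABLE ElementChild (eid TEXT, pos INTEGER, child TEXT);
CREATE TABLE Text (tid TEXT PRIMARY KEY, value TEXT);
CREATE INDEX Element_tag ON Element (tag);
""")

conn.executemany("INSERT INTO Element VALUES (?, ?)", [
    ("e0", "bibliography"), ("e1", "book"), ("e2", "title"),
    ("e3", "author"), ("e4", "author"), ("e5", "author"),
    ("e6", "publisher"), ("e7", "year"),
])

# Assumed child ids: the book's six children in order, one text node each.
conn.executemany("INSERT INTO ElementChild VALUES (?, ?, ?)", [
    ("e0", 1, "e1"),
    ("e1", 1, "e2"), ("e1", 2, "e3"), ("e1", 3, "e4"),
    ("e1", 4, "e5"), ("e1", 5, "e6"), ("e1", 6, "e7"),
    ("e2", 1, "t0"), ("e3", 1, "t1"), ("e4", 1, "t2"),
    ("e5", 1, "t3"), ("e6", 1, "t4"), ("e7", 1, "t5"),
])

conn.executemany("INSERT INTO Text VALUES (?, ?)", [
    ("t0", "Foundations of Databases"), ("t1", "Abiteboul"),
    ("t2", "Hull"), ("t3", "Vianu"),
    ("t4", "Addison Wesley"), ("t5", "1995"),
])

# /bibliography/book/author/text(): one ElementChild join per path step.
authors = [row[0] for row in conn.execute("""
    SELECT t.value
    FROM Element bib
    JOIN ElementChild c1 ON c1.eid = bib.eid
    JOIN Element book ON book.eid = c1.child AND book.tag = 'book'
    JOIN ElementChild c2 ON c2.eid = book.eid
    JOIN Element a ON a.eid = c2.child AND a.tag = 'author'
    JOIN ElementChild c3 ON c3.eid = a.eid
    JOIN Text t ON t.tid = c3.child
    WHERE bib.tag = 'bibliography'
    ORDER BY c2.pos
""")]
print(authors)
```

Ordering by pos is what preserves document order among the three authors.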

Interval-based: schema

Element ( left , right , level , tag )

left is the start position of the element
right is the end position of the element
level is the nesting depth of the element (strictly speaking, unnecessary)
Key is left

Attribute ( left , attrName , attrValue )

Text ( left , level , value )

Where did ElementChild go?

Interval-based: example

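Positions assigned to the document's tags and text:

1 2 3 4 Foundations of Databases 6 7 Abiteboul 9 10 Hull 12 13 Vianu 15 16 Addison Wesley 18 19 1995 21 … 999

Element tree, each node labeled left,right,level:

bibliography   1,999,
  book         2,21,
    title      3,5,3
    author     6,8,3
    author     9,11,3
    author     12,14,3
    publisher  15,17,3
    year       18,20,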
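One way such labels could be produced is a single depth-first pass that bumps a counter at every start tag, text node, and end tag. The sketch below assumes that numbering rule (it reproduces the title and author intervals shown above), assumes a text node sits one level below its element, and ignores mixed content; label_intervals is a hypothetical helper name, not something from the slides.

```python
import xml.etree.ElementTree as ET


def label_intervals(xml_text):
    """Return (elements, texts) for an XML string.

    elements: list of (left, right, level, tag), sorted by left
    texts:    list of (left, level, value)
    """
    root = ET.fromstring(xml_text)
    elements, texts = [], []
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1                      # position of the start tag
        left = counter
        if node.text and node.text.strip():
            counter += 1                  # position of the text node
            texts.append((counter, level + 1, node.text.strip()))
        for child in node:
            visit(child, level + 1)       # tail text is ignored (no mixed content)
        counter += 1                      # position of the end tag
        elements.append((left, counter, level, node.tag))

    visit(root, 1)
    elements.sort()
    return elements, texts


doc = (
    "<bibliography><book>"
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "<publisher>Addison Wesley</publisher><year>1995</year>"
    "</book></bibliography>"
)
elements, texts = label_intervals(doc)
for row in elements:
    print(row)
```

With only one book in the input, the root closes at position 22 rather than the 999 used in the slide, where further books follow.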

Interval-based: queries

//section/title

SELECT e2.left
FROM Element e1, Element e2
WHERE e1.tag = 'section' AND e2.tag = 'title'
  AND e1.left < e2.left AND e2.right < e1.right
  AND e1.level = e2.level - 1;

Path expression becomes “containment” joins!

Number of joins is proportional to path expression length

//book//title

SELECT e2.left
FROM Element e1, Element e2
WHERE e1.tag = 'book' AND e2.tag = 'title'
  AND e1.left < e2.left AND e2.right < e1.right;

No recursion!
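The two containment joins can be tried on the bibliography example with a small sqlite3 sketch. Several things here are assumptions: the columns are named lft and rgt to avoid clashing with the SQL keywords LEFT and RIGHT, the levels of bibliography (1) and book (2) are inferred from nesting depth, and step is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Element (lft INTEGER PRIMARY KEY, rgt INTEGER, level INTEGER, tag TEXT)"
)
conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?)", [
    (1, 999, 1, "bibliography"),   # level assumed from nesting depth
    (2, 21, 2, "book"),            # level assumed from nesting depth
    (3, 5, 3, "title"),
    (6, 8, 3, "author"),
    (9, 11, 3, "author"),
    (12, 14, 3, "author"),
    (15, 17, 3, "publisher"),
    (18, 20, 3, "year"),
])


def step(ancestor_tag, descendant_tag, child_only):
    """Evaluate //ancestor/descendant (child_only) or //ancestor//descendant."""
    sql = """
        SELECT e2.lft
        FROM Element e1, Element e2
        WHERE e1.tag = ? AND e2.tag = ?
          AND e1.lft < e2.lft AND e2.rgt < e1.rgt
    """
    if child_only:
        # Direct children sit exactly one level below the parent.
        sql += " AND e1.level = e2.level - 1"
    sql += " ORDER BY e2.lft"
    return [row[0] for row in conn.execute(sql, (ancestor_tag, descendant_tag))]


book_title = step("book", "title", child_only=True)            # //book/title
bib_authors = step("bibliography", "author", child_only=False)  # //bibliography//author
bib_child_authors = step("bibliography", "author", child_only=True)
print(book_title, bib_authors, bib_child_authors)
```

The descendant query needs no level test and no recursion: interval containment alone covers any depth.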

How about XQuery?

DeHaan et al. SIGMOD 2003

Evaluating an XQuery expression results in a sequence of environments

An environment E maps each query variable v to its value: a forest of XML trees (a node-set) fv

Encode using tables with “dynamic intervals”

Table I: increasing sequence of integers, one per environment
For each query variable v, create a table Tv ( s(tring) , l(eft) , r(ight) ) representing the value of v in all environments

Sorted on l to support efficient processing
Different environments form non-overlapping regions

Example T v

Translating /

Given Tv for values of v , compute v /name

A path-based mapping

Label-path encoding

Element ( pathid , left , right , value ), Path ( pathid , path )

path is a label path starting from the root

Why are left and right still needed?

Element

pathid  left  right  …
1       1     999    …
2       2     21     …
3       3     5      …
4       6     8      …
4       9     11     …
4       12    14     …
…       …     …      …

Path

pathid  path
1       /bibliography
2       /bibliography/book
3       /bibliography/book/title
4       /bibliography/book/author
…       …

Label-path encoding: queries

Simple path expressions with no conditions

//book//title

Perform string matching on Path
Join qualified pathids with Element
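A minimal sqlite3 sketch of this two-step evaluation follows. It assumes the string matching is done with LIKE patterns (one possible choice; the slides do not say which matching method is used), renames left and right to lft and rgt to avoid SQL keywords, and fills the value column with assumed contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (pathid INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE Element (pathid INTEGER, lft INTEGER PRIMARY KEY, rgt INTEGER, value TEXT);
""")
conn.executemany("INSERT INTO Path VALUES (?, ?)", [
    (1, "/bibliography"),
    (2, "/bibliography/book"),
    (3, "/bibliography/book/title"),
    (4, "/bibliography/book/author"),
])
conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?)", [
    (1, 1, 999, None),
    (2, 2, 21, None),
    (3, 3, 5, "Foundations of Databases"),   # values are assumed
    (4, 6, 8, "Abiteboul"),
    (4, 9, 11, "Hull"),
    (4, 12, 14, "Vianu"),
])

# //book//title: a title either directly under book or deeper below it.
title_lefts = [row[0] for row in conn.execute("""
    SELECT e.lft
    FROM Path p JOIN Element e ON e.pathid = p.pathid
    WHERE p.path LIKE '%/book/title' OR p.path LIKE '%/book/%/title'
    ORDER BY e.lft
""")]

# //author: any label path ending in /author.
author_lefts = [row[0] for row in conn.execute("""
    SELECT e.lft
    FROM Path p JOIN Element e ON e.pathid = p.pathid
    WHERE p.path LIKE '%/author'
    ORDER BY e.lft
""")]
print(title_lefts, author_lefts)
```

Because Path has one row per distinct label path, the pattern match touches far fewer rows than Element, and a condition-free path expression needs only a single join.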

Path expressions with attached conditions need to be broken down, processed separately, and joined back

//book[publisher='Prentice Hall']/title

Evaluate //book
Evaluate //book/title
Evaluate //book/publisher[text()='Prentice Hall']
Join to ensure title and publisher belong to the same book
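The decomposition might look like the following in sqlite3, under several assumptions: a second book published by Prentice Hall is invented purely for illustration, pathid 5 for the publisher path is made up, and left and right are renamed lft and rgt to avoid SQL keywords. The final join uses interval containment, which suggests one reason left and right are still stored alongside pathid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Path (pathid INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE Element (pathid INTEGER, lft INTEGER PRIMARY KEY, rgt INTEGER, value TEXT);
""")
conn.executemany("INSERT INTO Path VALUES (?, ?)", [
    (1, "/bibliography"),
    (2, "/bibliography/book"),
    (3, "/bibliography/book/title"),
    (5, "/bibliography/book/publisher"),     # pathid 5 is an assumption
])
conn.executemany("INSERT INTO Element VALUES (?, ?, ?, ?)", [
    (1, 1, 999, None),
    # First book, as in the running example.
    (2, 2, 21, None),
    (3, 3, 5, "Foundations of Databases"),
    (5, 15, 17, "Addison Wesley"),
    # Second book: hypothetical data added for this sketch only.
    (2, 22, 31, None),
    (3, 23, 25, "A Second Book"),
    (5, 26, 28, "Prentice Hall"),
])

# //book[publisher='Prentice Hall']/title
titles = [row[0] for row in conn.execute("""
    SELECT t.value
    FROM Path pb, Element b, Path pt, Element t, Path pp, Element p
    WHERE pb.path LIKE '%/book'           AND b.pathid = pb.pathid   -- //book
      AND pt.path LIKE '%/book/title'     AND t.pathid = pt.pathid   -- //book/title
      AND pp.path LIKE '%/book/publisher' AND p.pathid = pp.pathid   -- //book/publisher
      AND p.value = 'Prentice Hall'
      AND b.lft < t.lft AND t.rgt < b.rgt   -- title lies inside this book
      AND b.lft < p.lft AND p.rgt < b.rgt   -- publisher lies inside the same book
""")]
print(titles)
```

Without the two containment predicates, the title of the first book would be paired with the publisher of the second.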

Another path-based mapping

Dewey-order encoding

Each component of the id represents the order of the child within its parent

Unlike label-path, this encoding is “lossless”

bibliography
  book
    title      1.1.1
    author     1.1.2
    author     1.1.3
    author     1.1.4
    publisher  1.1.5
    year       1.1.

Dewey-order encoding: queries

Examples:

//title

//section/title

//book//title

//book[publisher='Prentice Hall']/title
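With Dewey ids, ancestor and parent tests reduce to prefix checks on the id string. The sketch below is one plausible reading, not the slides' own solution: the ids 1 for bibliography, 1.1 for book, and 1.1.6 for year are assumed, and the helper names are hypothetical.

```python
# (dewey_id, tag) pairs for the running example; some ids are assumed.
elements = [
    ("1", "bibliography"),
    ("1.1", "book"),
    ("1.1.1", "title"),
    ("1.1.2", "author"),
    ("1.1.3", "author"),
    ("1.1.4", "author"),
    ("1.1.5", "publisher"),
    ("1.1.6", "year"),
]


def is_descendant(node_id, ancestor_id):
    """True if node_id lies strictly below ancestor_id."""
    return node_id.startswith(ancestor_id + ".")


def is_child(node_id, parent_id):
    """True if node_id is exactly one component longer than parent_id."""
    return is_descendant(node_id, parent_id) and \
        node_id.count(".") == parent_id.count(".") + 1


def doc_order(node_id):
    """Sort key: compare components numerically so 1.10 follows 1.9."""
    return tuple(int(part) for part in node_id.split("."))


def path_query(ancestor_tag, descendant_tag, child_only):
    """Evaluate //a/d (child_only) or //a//d over the element list."""
    test = is_child if child_only else is_descendant
    hits = {
        d_id
        for a_id, a_tag in elements if a_tag == ancestor_tag
        for d_id, d_tag in elements
        if d_tag == descendant_tag and test(d_id, a_id)
    }
    return sorted(hits, key=doc_order)


all_titles = [i for i, tag in elements if tag == "title"]          # //title
book_title = path_query("book", "title", child_only=True)          # //book/title
bib_authors = path_query("bibliography", "author", child_only=False)
print(all_titles, book_title, bib_authors)
```

In a relational setting the same prefix test could be expressed with a LIKE pattern on the id column, again with no recursion.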

Schema-aware mapping

Idea: use DTD to design a better schema

Basic approach: elements of the same type go into one table

Tag name → table name
Attributes → columns

If one exists, ID attribute → key column; otherwise, need to “invent” a key
IDREF attribute → foreign key column

Children of the element → foreign key columns

Ordering of columns encodes ordering of children

… ]>

book ( ISBN , price , title_id , …)
title ( id , PCDATA_id )
PCDATA ( id , value )

Handling * and + in DTD

What if an element can have any number of children?

Example: Book can have multiple authors

book ( ISBN , price , title_id , author_id , publisher_id , year_id )?
BCNF?

Idea: create another table to track such relationships

book ( ISBN , price , title_id , publisher_id , year_id )
book_author ( ISBN , author_id )
BCNF decomposition in action!
A further optimization: merge book_author into author

Need to add position information if ordering is important

book_author ( ISBN , author_pos , author_id )
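A small sqlite3 sketch of the decomposed schema is given below. It simplifies the slides' design in ways that should be read as assumptions: title and author text are stored inline instead of through title_id and PCDATA tables, and the ISBN value and author ids are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (ISBN TEXT PRIMARY KEY, price REAL, title TEXT);
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book_author (
    ISBN       TEXT    REFERENCES book (ISBN),
    author_pos INTEGER,
    author_id  INTEGER REFERENCES author (id),
    PRIMARY KEY (ISBN, author_pos)
);
""")

# Hypothetical key values; ids deliberately do not follow document order.
conn.execute("INSERT INTO book VALUES ('isbn-1', 80, 'Foundations of Databases')")
conn.executemany("INSERT INTO author VALUES (?, ?)", [
    (10, "Vianu"), (11, "Abiteboul"), (12, "Hull"),
])
conn.executemany("INSERT INTO book_author VALUES (?, ?, ?)", [
    ("isbn-1", 1, 11), ("isbn-1", 2, 12), ("isbn-1", 3, 10),
])

# Authors of one book, in document order thanks to author_pos.
authors = [row[0] for row in conn.execute("""
    SELECT a.name
    FROM book_author ba JOIN author a ON a.id = ba.author_id
    WHERE ba.ISBN = 'isbn-1'
    ORDER BY ba.author_pos
""")]
print(authors)
```

Sorting by author id instead would return Vianu first, which is why the position column matters when order is significant.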

Pros and cons of inlining

Not always applicable

Result restructuring

Simple results are fine

Each tuple returned by SQL gets converted to an element

Simple grouping is fine (e.g., books with multiple authors)

Tuples can be returned by SQL in sorted order; adjacent tuples are grouped into an element

Complex results are problematic: one SQL query returns only a single table; columns cannot contain sets or structures

E.g., books with multiple authors and multiple references

Option 1: one table with all combo of authors/references → bad
Option 2: two tables, one w/ authors and the other w/ references → join is done as post processing
Option 3: sorted “union” of NULL-padded authors and references
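Option 3 can be sketched with sqlite3 and itertools.groupby. Each branch of the union pads the other branch's column with NULL, and sorting by book keeps one book's rows adjacent so that a single pass can rebuild the nested result. The table layouts, the kind discriminator column, and the sample data are all assumptions made for this illustration.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book_author (ISBN TEXT, author_pos INTEGER, name TEXT);
CREATE TABLE book_ref (ISBN TEXT, ref_pos INTEGER, ref TEXT);
""")
conn.executemany("INSERT INTO book_author VALUES (?, ?, ?)", [
    ("b1", 1, "Abiteboul"), ("b1", 2, "Hull"), ("b1", 3, "Vianu"),
    ("b2", 1, "Author X"),
])
conn.executemany("INSERT INTO book_ref VALUES (?, ?, ?)", [
    ("b1", 1, "ref-1"), ("b1", 2, "ref-2"),
])

# Sorted union of NULL-padded author rows (kind 0) and reference rows (kind 1).
rows = conn.execute("""
    SELECT ISBN, 0 AS kind, author_pos AS pos, name AS author, NULL AS reference
    FROM book_author
    UNION ALL
    SELECT ISBN, 1, ref_pos, NULL, ref
    FROM book_ref
    ORDER BY 1, 2, 3
""").fetchall()

# One pass over the sorted rows rebuilds one nested entry per book.
books = {}
for isbn, group in groupby(rows, key=lambda r: r[0]):
    entry = {"authors": [], "references": []}
    for _isbn, kind, _pos, author, reference in group:
        if kind == 0:
            entry["authors"].append(author)
        else:
            entry["references"].append(reference)
    books[isbn] = entry

print(len(rows), books)
```

For book b1 the union returns 3 + 2 = 5 rows, whereas the single-table combination of Option 1 would return 3 × 2 = 6, and the gap widens as the lists grow.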

Comparison of approaches

Schema-oblivious

Flexible and adaptable; no DTD needed
Queries are easy to formulate
Translation from XPath/XQuery can be easily automated
Queries involve lots of joins and are expensive

Schema-aware

Less flexible and adaptable
Need to know DTD to design the relational schema
Query formulation requires knowing DTD and schema
Queries are more efficient
XQuery is tougher to formulate because of result restructuring
