XML-Relational Mapping: Techniques for XML Data in Databases, Slides of Database Management Systems (DBMS)

An overview of xml-relational mapping, a technique used to process xml data in relational databases. Various approaches to xml processing, including text files, specialized xml dbms, object-oriented dbms, and relational dbms. It also discusses mapping xml to relational databases using node/edge-based and interval-based schema, and the advantages and disadvantages of each approach. The document also touches upon xquery and its evaluation in relational databases.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
XML-Relational Mapping
CPS 216
Advanced Database Systems
2
Announcements (March 18)
Midterm sample solution available outside my office
Course project milestone 2 due March 30
Homework #3 due April 6
Talk by Amol Deshpande
Adaptive Query Processing to Handle Estimation Errors
Monday, 11:30am-12:30pm, D106
Reading assignment due next Monday
Two VLDB papers on native XML databases
3
Approaches to XML processing
Text files (!)
Specialized XML DBMS
Lore (Stanford), Strudel (AT&T), Tamino/QuiP
(Software AG), X-Hive, Timber (Michigan), etc.
Still a long way to go
Object-oriented DBMS
eXcelon (ObjectStore), ozone, etc.
Not as mature as relational DBMS
Relational (and object-relational) DBMS
Middleware and/or object-relational extensions
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download XML-Relational Mapping: Techniques for XML Data in Databases and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

XML-Relational Mapping

CPS 216

Advanced Database Systems

2

Announcements (March 18)

™ Midterm sample solution available outside my office

™ Course project milestone 2 due March 30

™ Homework #3 due April 6

™ Talk by Amol Deshpande

ƒ Adaptive Query Processing to Handle Estimation Errors

ƒ Monday, 11:30am-12:30pm, D

™ Reading assignment due next Monday

ƒ Two VLDB papers on native XML databases

3

Approaches to XML processing

™ Text files (!)

™ Specialized XML DBMS

ƒ Lore (Stanford), Strudel (AT&T), Tamino/QuiP

(Software AG), X-Hive, Timber (Michigan), etc.

ƒ Still a long way to go

™ Object-oriented DBMS

ƒ eXcelon (ObjectStore), ozone, etc.

ƒ Not as mature as relational DBMS

™ Relational (and object-relational) DBMS

ƒ Middleware and/or object-relational extensions

Mapping XML to relational

™ Store XML in a CLOB (Character Large OBject) column

ƒ Simple, compact ƒ Full-text indexing can help (often provided by DBMS vendors as object-relational “extensions”)

™ Alternatives?

ƒ Schema-oblivious mapping: well-formed XML → generic relational schema

  • Node/edge-based mapping for graphs
  • Interval-based mapping for trees
  • Path-based mapping for trees ƒ Schema-aware mapping: valid XML → special relational schema based on DTD

5

Node/edge-based: schema

™ Element ( eid , tag )

™ Attribute ( eid , attrName , attrValue )

ƒ Attribute order does not matter

™ ElementChild ( eid , pos , child )

ƒ pos specifies the ordering of children ƒ child references either Element ( eid ) or Text ( tid )

™ Text ( tid , value )

ƒ tid cannot be the same as any eid

) Need to “invent” lots of id ’s

) Need indexes for efficiency, e.g., Element ( tag ), Text ( value )

6

Node/edge-based: example

Foundations of Databases Abiteboul Hull Vianu Addison Wesley 1995

eid tag e0 bibliography e1 book e2 title e3 author e4 author e5 author e6 publisher e7 year

Element

eid pos child e0 1 e e1 1 e e1 2 e e1 3 e e1 4 e e1 5 e e1 6 e e2 1 t e3 1 t e4 1 t e5 1 t e6 1 t e7 1 t

ElementChild

eid attrName attrValue e1 ISBN ISBN- e1 price 80

Attribute

tid value t0 Foundations of Databases t1 Abiteboul t2 Hull t3 Vianu t4 Addison Wesley t5 1995

Text

Interval-based: schema

™ Element ( left , right , level , tag )

ƒ left is the start position of the element ƒ right is the end position of the element ƒ level is the nesting depth of the element (strictly speaking, unnecessary) ƒ Key is left

™ Attribute ( left , attrName , attrValue )

™ Text ( left , level , value )

) Where did ElementChild go?

11

Interval-based: example

1 2 34Foundations of Databases 67Abiteboul 910Hull 1213Vianu 1516Addison Wesley 18191995 21… 999 (^) bibliography

book

title author author author publisher year

1,999,

2,21,

3,5,3 6,8,3 9,11,3 12,14,3 15,17,3 18,20,

12

Interval-based: queries

™ //section/title

ƒ SELECT e2.left FROM Element e1, Element e WHERE e1.tag = ‘section’ AND e2.tag = ‘title’ AND e1.left < e2.left AND e2.right < e1.right AND e1.level = e2.level-1;

)Path expression becomes “containment” joins!

  • Number of joins is proportional to path expression length

™ //book//title

ƒ SELECT e2.left FROM Element e1, Element e WHERE e1.tag = ‘book’ AND e2.tag = ‘section’ AND e1.left < e2.left AND e2.right < e1.right;

)No recursion!

How about XQuery?

DeHaan et al. SIGMOD 2003

™ Evaluating an XQuery expression results in a sequence of

environments

ƒ An environment E maps each query variable v to its value: a forest of XML trees (a node-set) fv

™ Encode using tables with “dynamic intervals”

ƒ Table I : increasing sequence of integers, one per environment ƒ For each query variable v , create a table Tv ( s (tring) , l (eft) , r (ight) ) representing the value of v in all environments

  • Sorted on l to support efficient processing
  • Different environments form non-overlapping regions

14

Example T v

15

Translating /

™ Given Tv for values of v , compute v /name

A path-based mapping

Label-path encoding

™ Element ( pathid , left , right , value ), Path ( pathid , path )

ƒ path is a label path starting from the root

ƒ Why are left and right still needed?

pathid left right … 1 1 999 … 2 2 21 … 3 3 5 … 4 6 8 … 4 9 11 … 4 12 14 … … … … …

Element

pathid path 1 /bibliography 2 /bibliography/book 3 /bibliography/book/title 4 /bibliography/book/author … …

Path

20

Label-path encoding: queries

™ Simple path expressions with no conditions

//book//title

ƒ Perform string matching on Path ƒ Join qualified pathid ’s with Element

™ Path expression with attached conditions need to be broken

down, processed separately, and joined back

//book[publisher=‘Prentice Hall’]/title

ƒ Evaluate //book ƒ Evaluate //book/title ƒ Evaluate //book/publisher[text()=‘Prentice Hall’] ƒ Join to ensure title and publisher belong to the same book

21

Another path-based mapping

Dewey-order encoding

™ Each component of the id represents the order of the

child within its parent

ƒ Unlike label-path, this encoding is “lossless”

bibliography

book

title author author author publisher year

1

1.1.1 1.1.2 1.1.3 1.1.4 1.1.5 1.1.

Dewey-order encoding: queries

™ Examples:

//title

//section/title

//book//title

//book[publisher=‘Prentice Hall’]/title

23

Schema-aware mapping

™ Idea: use DTD to design a better schema

™ Basic approach: elements of the same type go into one table

ƒ Tag name → table name ƒ Attributes → columns

  • If one exists, ID attribute → key column; otherwise, need to “invent” a key
  • IDREF attribute → foreign key column ƒ Children of the element → foreign key columns
  • Ordering of columns encodes ordering of children
… ]>

book ( ISBN , price , title_id , …) title ( id , PCDATA_id ) PCDATA ( id , value )

24

Handling * and + in DTD

™ What if an element can have any number of children?

™ Example: Book can have multiple authors

ƒ book ( ISBN , price , title_id , author_id , publisher_id , year_id )? )BCNF?

™ Idea: create another table to track such relationships

ƒ book ( ISBN , price , title_id , publisher_id , year_id ) ƒ book_author ( ISBN , author_id ) )BCNF decomposition in action! )A further optimization: merge book_author into author

™ Need to add position information if ordering is important

ƒ book_author ( ISBN , author_pos , author_id )

Pros and cons of inlining

™ Not always applicable

29

Result restructuring

™ Simple results are fine

ƒ Each tuple returned by SQL gets converted to an element

™ Simple grouping is fine (e.g., books with multiple authors)

ƒ Tuples can be returned by SQL in sorted order; adjacent tuples are grouped into an element

™ Complex results are problematic: one SQL query only

returns a single table; columns cannot contains sets or

structures

ƒ E.g., books with multiple authors and multiple references

  • Option 1: one table with all combo of authors/references → bad
  • Option 2: two tables, one w/ authors and the other w/ references → join is done as post processing
  • Option 3: sorted “union” of NULL-padded authors and references

30

Comparison of approaches

™ Schema-oblivious

ƒ Flexible and adaptable; no DTD needed ƒ Queries are easy to formulate

  • Translation from Xpath/XQuery can be easily automated ƒ Queries involve lots of join and are expensive

™ Schema-aware

ƒ Less flexible and adaptable ƒ Need to know DTD to design the relational schema ƒ Query formulation requires knowing DTD and schema ƒ Queries are more efficient ƒ XQuery is tougher to formulate because of result restructuring