XML Query Processing: Navigational and Structural Approaches for Querying XML Data, Slides of Database Management Systems (DBMS)

These slides cover the processing of XML queries using both navigational and structural approaches, including the Lore data model, navigational plans, and the Niagara unnest algorithm. They also explore stack-based algorithms and the concept of twig joins, and offer insights into the advantages and disadvantages of each approach and the importance of choosing the optimal join order.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
XML Query Processing

CPS 216
Advanced Database Systems

Announcements (March 31)

• Course project milestone 2 due today
  • Hardcopy in class or otherwise email please
• I will be out of town next week
  • No class on Tuesday (April 5); will make up during reading period
  • Badrish Chandramouli will give the lecture on Thursday (April 7)
• Homework #3 in less than two weeks (April 12)
• Reading assignment for next week will be assigned through email

Overview

• Recall that XML queries based on path expressions can be expressed by joins
• Node/edge-based representation (graphs)
  • Equi-join on id’s
  • Chasing pointers ≈ index nested-loop joins
  ⇒ “Navigational” approach
• Interval-based representation (trees)
  • “Containment” joins involving left and right
  • Sort-merge joins, zig-zag joins with indexes
  ⇒ “Structural” approach

Navigational processing in Lore

VLDB 1999

• Lore data model peculiarity: labels on edges instead of labels on nodes
• Access paths in Lore
  • Base representation: (parent, label) → child
  • Label index: (child, label) → parent
  • Edge index: label → (parent, child)
  • Value index: (value, label) → node
  • Path index: path expression → node
• Correspond to the following in a label-on-node model
  • label/value → node
  • (parent, label) → child
  • child → parent
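
The access paths listed above can be pictured as ordinary lookup tables. The following is a minimal sketch in Python, assuming a tiny made-up edge-labeled graph; the node names, the `EDGES` and `VALUES` data, and the dictionary names are illustrative and are not taken from Lore itself. The path index is left out to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical edge-labeled graph: (parent, label, child) triples.
EDGES = [
    ("root", "A", "a1"),
    ("a1", "B", "b1"),
    ("a1", "B", "b2"),
    ("b1", "C", "c1"),
    ("b2", "C", "c2"),
]
# Atomic values stored at leaf nodes (illustrative data only).
VALUES = {"c1": 5, "c2": 7}

base = defaultdict(list)         # (parent, label) -> children
label_index = defaultdict(list)  # (child, label)  -> parents
edge_index = defaultdict(list)   # label           -> (parent, child) pairs
value_index = defaultdict(list)  # (value, label)  -> nodes

for parent, label, child in EDGES:
    base[(parent, label)].append(child)
    label_index[(child, label)].append(parent)
    edge_index[label].append((parent, child))
    if child in VALUES:
        value_index[(VALUES[child], label)].append(child)

print(base[("a1", "B")])        # forward navigation from a1 along B edges
print(label_index[("c1", "C")]) # reverse navigation from c1
print(value_index[(5, "C")])    # nodes reached by a C edge whose value is 5
```

Each dictionary answers one direction of navigation, which is why the choice of access path determines whether a plan can run top down, bottom up, or both.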


Navigational plans in Lore

//A/B/C[.=5]

• Top down: pointer chasing
  • Start with //A, navigate down to //A/B and then to //A/B/C, and then check values of C
• Bottom up: reverse pointer chasing
  • Start with //C[.=5], navigate up to //B[/C[.=5]] and then to //A[/B/C[.=5]]
• Hybrid: top down and bottom up, meet in middle
  • Start with //A, navigate down to //A/B
  • Start with //C[.=5], navigate up to //B[/C[.=5]]
  • Intersect B nodes
⇒ In general, hybrid can combine multiple top-down and bottom-up plans starting from anywhere in the path expression
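
The three plans can be compared on a toy document. This is a rough sketch, assuming a label-on-node tree stored in a Python dictionary; `NODES` and the helper names are hypothetical, and the linear scans stand in for the index lookups a real system would use. `hybrid` returns the B nodes where the two halves meet; collecting the C results from there is omitted.

```python
# Hypothetical label-on-node tree: node -> (label, parent, value).
NODES = {
    "r":  ("R", None, None),
    "a1": ("A", "r", None),
    "b1": ("B", "a1", None),
    "b2": ("B", "a1", None),
    "c1": ("C", "b1", 5),
    "c2": ("C", "b2", 7),
    "b3": ("B", "r", None),
    "c3": ("C", "b3", 5),
}

def by_label(label):
    # Stand-in for a label -> node index.
    return [n for n, (lab, _, _) in NODES.items() if lab == label]

def children(node, label):
    # Stand-in for the (parent, label) -> child access path.
    return [n for n, (lab, par, _) in NODES.items() if par == node and lab == label]

def top_down():
    # //A, then /B, then /C, then check the value of C.
    return [c for a in by_label("A")
              for b in children(a, "B")
              for c in children(b, "C")
              if NODES[c][2] == 5]

def bottom_up():
    # //C[.=5], then follow child -> parent pointers to B and then A.
    result = []
    for c in by_label("C"):
        if NODES[c][2] != 5:
            continue
        b = NODES[c][1]
        if b is not None and NODES[b][0] == "B":
            a = NODES[b][1]
            if a is not None and NODES[a][0] == "A":
                result.append(c)
    return result

def hybrid():
    # Navigate down from A and up from C[.=5], then intersect the B nodes.
    down = {b for a in by_label("A") for b in children(a, "B")}
    up = set()
    for c in by_label("C"):
        _, parent, value = NODES[c]
        if value == 5 and parent is not None and NODES[parent][0] == "B":
            up.add(parent)
    return sorted(down & up)

print(top_down(), bottom_up(), hybrid())
```

On this data the top-down plan visits both B children of `a1`, while the bottom-up plan visits `c3` only to discard it at the A step, which hints at why intermediate result sizes decide which plan wins.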


Comparison of Lore navigational plans

• Which plan is best depends on the size of the intermediate results it generates
  • Choose the optimal join order!
• Top down and bottom up are essentially index nested-loop joins (“pure” navigation)
• Hybrid can use any join strategy to combine subplans

Structural approach

• Binary containment joins (Al-Khalifa et al., ICDE 2002)
  • Given Alist and Dlist, two lists of elements encoded with (left, right), with each list sorted by left
  • Find all pairs of (a, e), where a ∈ Alist and e ∈ Dlist, such that a is a parent (or ancestor) of e
• Example query processing scenario: //book/author
  • Using an inverted-list index, retrieve the list of book elements sorted by left, and the list of author elements sorted by left
  • Find pairs that actually form parent-child relationships
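
To make the (left, right) encoding concrete, here is a small sketch that numbers a made-up document by a depth-first walk and then tests containment. The `Elem` type, the `encode` helper, and the sample document are assumptions for illustration. The `level` field is also an assumption: the slide only mentions left and right, but some extra information such as depth is needed to tell a parent from a more distant ancestor.

```python
from typing import NamedTuple

class Elem(NamedTuple):
    tag: str
    left: int
    right: int
    level: int  # assumption: depth, used only for the parent-child test

def encode(tree):
    """tree = (tag, [children]); returns elements sorted by left."""
    out = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        tag, kids = node
        counter += 1          # number assigned on entering the element
        left = counter
        for kid in kids:
            visit(kid, level + 1)
        counter += 1          # number assigned on leaving the element
        out.append(Elem(tag, left, counter, level))

    visit(tree, 0)
    return sorted(out, key=lambda e: e.left)

def is_ancestor(a, d):
    # d's interval lies strictly inside a's interval.
    return a.left < d.left and d.right < a.right

def is_parent(a, d):
    return is_ancestor(a, d) and d.level == a.level + 1

# Hypothetical document: the second author sits under an editor, not a book.
doc = ("lib", [("book", [("title", []), ("author", [])]),
               ("book", [("editor", [("author", [])])])])
elems = encode(doc)
books = [e for e in elems if e.tag == "book"]
authors = [e for e in elems if e.tag == "author"]

ancestor_pairs = [(b, a) for b in books for a in authors if is_ancestor(b, a)]
parent_pairs = [(b, a) for b in books for a in authors if is_parent(b, a)]
print(len(ancestor_pairs), len(parent_pairs))
```

The nested loops here are only a correctness reference; the merge and stack algorithms that follow aim to find the same pairs without comparing every book with every author.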


Tree-based algorithms

Algorithm Tree-Merge-Anc

BeginJoinable = 0;
For each a in Alist:
    Start from BeginJoinable and skip Dlist until the first element with left > a.left; update BeginJoinable;
    Start from BeginJoinable and join each d from Dlist with a; stop at the first d with left > a.right;

• An alternative algorithm, Tree-Merge-Desc, uses Dlist as the outer table instead of Alist, and requires minor tweaks to conditions
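
The pseudocode for Tree-Merge-Anc translates fairly directly into Python. The sketch below handles the ancestor-descendant (//) case only and assumes elements are plain `(left, right)` tuples with distinct positions; the function name and sample lists are illustrative.

```python
def tree_merge_anc(alist, dlist):
    """Ancestor-descendant join; both lists hold (left, right), sorted by left."""
    out = []
    begin = 0  # BeginJoinable: first position in dlist that may still join
    for a in alist:
        a_left, a_right = a
        # Skip descendants starting before a; later ancestors start even
        # later, so these can never join again.
        while begin < len(dlist) and dlist[begin][0] < a_left:
            begin += 1
        # Join every d that starts inside a; stop at the first d past a.right.
        j = begin
        while j < len(dlist) and dlist[j][0] < a_right:
            out.append((a, dlist[j]))
            j += 1
    return out

# Made-up intervals: (4, 9) is nested inside (1, 12).
alist = [(1, 12), (4, 9), (13, 18)]
dlist = [(2, 3), (5, 6), (7, 8), (10, 11), (14, 15), (20, 21)]
pairs = tree_merge_anc(alist, dlist)
print(len(pairs))  # 7 pairs, grouped by ancestor
```

Note that `(5, 6)` and `(7, 8)` are scanned once for `(1, 12)` and again for `(4, 9)`. That is the rescanning caused by nested ancestors.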


Tree-Merge-Anc example

• a1: BeginJoinable = d1; stops at d4
• a2: BeginJoinable = d2; stops at d4
• a3: BeginJoinable = d4; stops at d6
• a4: BeginJoinable = d6
⇒ Further optimization is possible to avoid unnecessary rescanning, though in general rescanning cannot be avoided

[Figure: ancestor elements a1 to a4 and descendant elements d1 to d6]

Worst case of Tree-Merge-Anc

• Optimal (up to a constant factor) for //
• Not optimal for /


Worst case of Tree-Merge-Desc

• Not even optimal for //
⇒ Problem: linear access to Alist forces unnecessary scanning
⇒ Idea: create another representation that corresponds more closely to a tree traversal


Stack-based algorithms

Algorithm Stack-Tree-Desc

Start with an empty stack Astack;
While Astack or Alist or Dlist is not empty:
    If heads of both Alist and Dlist come after the top of Astack, pop Astack;
    Else if the head of Alist comes before the head of Dlist (so it is contained by the top of Astack, if any), push it onto Astack and advance Alist;
    Else join the head of Dlist with everything on Astack and advance Dlist;

⇒ Output is ordered by Dlist

• An alternative algorithm, Stack-Tree-Anc, orders output by Alist but requires more bookkeeping
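
A possible Python rendering of Stack-Tree-Desc for the ancestor-descendant case is sketched below. It assumes `(left, right)` tuples with distinct positions, pushes an ancestor only when it starts before the current head of Dlist, and stops as soon as Dlist is exhausted, since no further output is possible after that. The function name and sample lists are illustrative.

```python
INF = float("inf")

def stack_tree_desc(alist, dlist):
    """Ancestor-descendant join; both lists hold (left, right), sorted by left."""
    out, stack = [], []   # stack holds nested ancestors, outermost first
    i = j = 0
    while j < len(dlist):
        a_left = alist[i][0] if i < len(alist) else INF
        d_left = dlist[j][0]
        if stack and a_left > stack[-1][1] and d_left > stack[-1][1]:
            # Both heads start after the top ends: the top cannot join anymore.
            stack.pop()
        elif a_left < d_left:
            # The next ancestor starts first, so it nests inside the current top.
            stack.append(alist[i])
            i += 1
        else:
            # Everything on the stack contains the head of dlist.
            for a in stack:
                out.append((a, dlist[j]))
            j += 1
    return out

alist = [(1, 12), (4, 9), (13, 18)]
dlist = [(2, 3), (5, 6), (7, 8), (10, 11), (14, 15), (20, 21)]
pairs = stack_tree_desc(alist, dlist)
print(len(pairs))  # 7 pairs, grouped by descendant
```

Each element of either list is read once, and each ancestor is pushed and popped at most once, so the work beyond producing output is linear in the input sizes.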

Compact encoding using stacks

• One stack for each node in the query twig
  • Elements in a stack form a containment chain
• Each stack element points to one in the parent stack
  • Specifically, the top one that contains it


PathStack

• Handles twigs with no branches: q1//q2//…//qn
• Input lists T_q1, T_q2, …, T_qn and stacks S_q1, S_q2, …, S_qn
• While T_qn is not empty:
    Let T_qmin be the list whose head has the smallest left;
    Clean all stacks: pop while top’s right < head(T_qmin).left;
    Push head(T_qmin) onto S_qmin, with a pointer to top(S_parent(qmin)), and advance T_qmin;
    If qmin is the leaf (qn), output results and pop S_qmin;
• Check properties
  • Elements in a stack form a containment chain
  • Each stack element points to the top one in the parent stack that contains it
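
The PathStack loop above can be sketched in Python as follows. This is an illustrative version under stated assumptions: elements are `(left, right)` tuples with distinct positions, `streams[i]` plays the role of the input list for the i-th query node, pointers are stored as indexes into the parent stack, and the `expand` helper (a hypothetical name) enumerates results by following those pointers.

```python
def path_stack(streams):
    """streams[i]: (left, right) elements for query node i, sorted by left.
    Returns tuples (e1, ..., en) such that e1 // e2 // ... // en."""
    n = len(streams)
    pos = [0] * n                    # cursor into each input list
    stacks = [[] for _ in range(n)]  # entries: (element, index into parent stack)
    results = []

    def expand(level, idx, suffix):
        # Every entry 0..idx of stacks[level] contains the first element of suffix.
        for k in range(idx + 1):
            elem, ptr = stacks[level][k]
            if level == 0:
                results.append((elem,) + suffix)
            else:
                expand(level - 1, ptr, (elem,) + suffix)

    while pos[n - 1] < len(streams[n - 1]):
        live = [i for i in range(n) if pos[i] < len(streams[i])]
        qmin = min(live, key=lambda i: streams[i][pos[i]][0])
        head = streams[qmin][pos[qmin]]
        # Clean all stacks: drop entries that end before head starts.
        for s in stacks:
            while s and s[-1][0][1] < head[0]:
                s.pop()
        # Push head with a pointer to the current top of the parent stack.
        ptr = len(stacks[qmin - 1]) - 1 if qmin > 0 else -1
        stacks[qmin].append((head, ptr))
        pos[qmin] += 1
        if qmin == n - 1:
            # Leaf: output all results ending at this element, then pop it.
            leaf, p = stacks[qmin].pop()
            if n == 1:
                results.append((leaf,))
            else:
                expand(n - 2, p, (leaf,))
    return results

# Made-up data for A//B//C; the second A is nested inside the first.
A = [(1, 20), (3, 10)]
B = [(4, 9), (12, 15)]
C = [(5, 6), (13, 14), (22, 23)]
solutions = path_stack([A, B, C])
print(len(solutions))  # 3
```

The stacks encode all three results compactly: when `(5, 6)` arrives, the single B entry points at the top of the A stack, and both A entries at or below that pointer yield a result.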


Extending PathStack to TwigStack

• A first cut
  • Decompose a twig into root-to-leaf paths
  • Process each path using PathStack
  • Merge solutions for all paths
• Problem: intermediate results may be big
  • All authors will be returned by PathStack, though only the last one should be in the final result
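
The merge step of this first cut, and the reason it can be wasteful, can be illustrated with a short sketch. It assumes the per-path solutions have already been computed (for example by PathStack) and are given as tuples that share a common prefix of query nodes; the function name, the twig A[//B]//C, and the element names are all made up for illustration.

```python
def merge_path_solutions(sol1, sol2, shared):
    """Join two lists of path solutions that agree on their first `shared` nodes."""
    index = {}
    for s in sol2:
        index.setdefault(s[:shared], []).append(s[shared:])
    return [s + rest for s in sol1 for rest in index.get(s[:shared], [])]

# Hypothetical twig A[//B]//C, decomposed into paths A//B and A//C.
path_ab = [("a1", "b1")]
path_ac = [("a1", "c1"), ("a2", "c2"), ("a3", "c3")]
merged = merge_path_solutions(path_ab, path_ac, shared=1)
print(merged)  # [('a1', 'b1', 'c1')]
```

Two of the three A//C solutions never reach the final result, yet they were generated and stored before the merge discarded them.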

TwigStack

• Generate solutions for each root-to-leaf path
  • Do not use PathStack, which generates all solutions
  • Modify PathStack to generate only solutions that are parts of the final result (possible if the twig contains only //)
    Specifically, when pushing h_q onto stack S_q, ensure that
    • h_q has a descendant h_q’ in each input list T_q’, where q’ is a child of q
    • Each h_q’ recursively satisfies the above property
• Merge solutions for all paths


TwigStack still suboptimal for /

• Example
• Desired result: (A1, B2, C2), (A2, B1, C1)
• Initial state: all three stacks empty; ready to push one of A1, B1, C1 onto a stack
• If we want to ensure that non-contributing nodes are never pushed onto the stack, then
  • Cannot decide on A1 unless we see B2 and C2
  • Cannot decide on B1 or C1 unless we see A2

[Figure: data elements A1, A2, B1, C1, B2, C2 and the query twig over A, B, C]


Optimization using an index

• Idea: if there are indexes on input lists ordered by left, use these indexes to skip lists more efficiently
• Example: Niagara’s ZigZag join on A//B
  • After advancing to the second A, use the index on B list to go directly to the first joining B, instead of scanning B list linearly
  • When processing a B, use the index on A list to skip