





These slides cover the processing of XML queries using both navigational and structural approaches. Topics include the Lore data model, navigational plans, and the Niagara unnest algorithm, along with stack-based algorithms and twig joins. The slides also discuss the advantages and disadvantages of each approach and the importance of choosing the optimal join order.
Typology: Slides






Hardcopy in class or otherwise email please
No class on Tuesday (April 5); will make up during reading period
Badrish Chandramouli will give the lecture on Thursday (April 7)
Equi-join on ids
Chasing pointers ≈ index nested-loop joins
⇒ “Navigational” approach
“Containment” joins involving left and right
Sort-merge joins, zig-zag joins with indexes
⇒ “Structural” approach
Lore data model peculiarity: labels on edges instead of labels on nodes
Access paths in Lore
  Base representation: (parent, label) → child
  Label index: (child, label) → parent
  Edge index: label → (parent, child)
  Value index: (value, label) → node
  Path index: path expression → node
Correspond to the following in a label-on-node model
  label/value → node
  (parent, label) → child
  child → parent
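The access paths listed above can be pictured as plain dictionaries built over a set of labeled edges. The following is a minimal sketch, not Lore's actual storage layout: the edge triples, node names, and the helper `path_lookup` are all hypothetical, and the path index is simulated by walking the base representation rather than being materialized.

```python
from collections import defaultdict

# Hypothetical edge-labeled graph: (parent, label, child) triples.
EDGES = [
    ("root", "book", "b1"),
    ("b1", "title", "t1"),
    ("b1", "author", "a1"),
    ("root", "book", "b2"),
    ("b2", "author", "a2"),
]
# Hypothetical atomic values stored at some nodes.
VALUES = {"t1": "XML", "a1": "Smith", "a2": "Jones"}

base = defaultdict(list)         # (parent, label) -> children
label_index = defaultdict(list)  # (child, label)  -> parents
edge_index = defaultdict(list)   # label           -> (parent, child) pairs
value_index = defaultdict(list)  # (value, label)  -> nodes

for parent, label, child in EDGES:
    base[(parent, label)].append(child)
    label_index[(child, label)].append(parent)
    edge_index[label].append((parent, child))
    if child in VALUES:
        value_index[(VALUES[child], label)].append(child)


def path_lookup(labels, start="root"):
    """Stand-in for a path index: follow a sequence of edge labels from `start`."""
    frontier = [start]
    for label in labels:
        frontier = [c for p in frontier for c in base.get((p, label), [])]
    return frontier


print(path_lookup(["book", "author"]))
```

Each dictionary answers one kind of question in a single lookup, which is why the choice of navigation direction depends on which of these indexes are available.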
Top down: pointer chasing
  Start with //A, navigate down to //A/B and then to //A/B/C, and then check values of C
Bottom up: reverse pointer chasing
  Start with //C[.=5], navigate up to //B[/C[.=5]] and then to //A[/B/C[.=5]]
Hybrid: top down and bottom up, meet in middle
  Start with //A, navigate down to //A/B
  Start with //C[.=5], navigate up to //B[/C[.=5]]
  Intersect B nodes
⇒ In general, hybrid can combine multiple top-down and bottom-up plans starting from anywhere in the path expression
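The three plans can be compared on a toy tree. This is a sketch under stated assumptions: the node table `NODES`, the three in-memory indexes, and the function names are hypothetical, and each plan returns (A, B, C) bindings for the path A/B/C with C equal to 5 so that the results are directly comparable.

```python
from collections import defaultdict

# Hypothetical node-labeled tree: id -> (label, value, parent id).
NODES = {
    1: ("A", None, None),
    2: ("B", None, 1),
    3: ("C", 5, 2),
    4: ("C", 7, 2),
    5: ("B", None, 1),
    6: ("C", 5, 5),
    7: ("A", None, 1),
    8: ("B", None, 7),
    9: ("C", 3, 8),
    10: ("C", 5, 7),   # a C with value 5 whose parent is not a B
}

CHILDREN = defaultdict(list)  # (parent, label) -> children
BY_LABEL = defaultdict(list)  # label -> nodes
BY_VALUE = defaultdict(list)  # (label, value) -> nodes
for nid, (label, value, parent) in NODES.items():
    BY_LABEL[label].append(nid)
    if parent is not None:
        CHILDREN[(parent, label)].append(nid)
    if value is not None:
        BY_VALUE[(label, value)].append(nid)


def top_down():
    """Start at every A, chase child pointers, check C values last."""
    return [(a, b, c)
            for a in BY_LABEL["A"]
            for b in CHILDREN.get((a, "B"), [])
            for c in CHILDREN.get((b, "C"), [])
            if NODES[c][1] == 5]


def bottom_up():
    """Start at C nodes with value 5 and chase parent pointers upward."""
    out = []
    for c in BY_VALUE.get(("C", 5), []):
        b = NODES[c][2]
        if b is not None and NODES[b][0] == "B":
            a = NODES[b][2]
            if a is not None and NODES[a][0] == "A":
                out.append((a, b, c))
    return out


def hybrid():
    """Navigate down from A to B, up from C to its parent, and intersect on B."""
    b_to_a = {b: a for a in BY_LABEL["A"] for b in CHILDREN.get((a, "B"), [])}
    return [(b_to_a[NODES[c][2]], NODES[c][2], c)
            for c in BY_VALUE.get(("C", 5), [])
            if NODES[c][2] in b_to_a]


print(top_down(), bottom_up(), hybrid())
```

All three plans agree on the answer; they differ only in how many intermediate nodes they touch, which is what makes plan choice data dependent.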
Which plan is best depends on the size of the intermediate results it generates
  Choose the optimal join order!
Top down and bottom up are essentially index nested-loop joins (“pure” navigation)
Hybrid can use any join strategy to combine subplans
Binary containment joins (Al-Khalifa et al., ICDE 2002)
Given Alist and Dlist, two lists of elements encoded with (left, right), with each list sorted by left
Find all pairs of (a, e), where a ∈ Alist and e ∈ Dlist, such that a is a parent (or ancestor) of e
Example query processing scenario: //book/author
  Using an inverted-list index, retrieve the list of book elements sorted by left, and the list of author elements sorted by left
  Find pairs that actually form parent-child relationships
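The //book/author scenario above can be sketched end to end on a small document. The XML string, the function names, and the use of a level number to tell parents from other ancestors are assumptions of this sketch; the join itself is a deliberately naive nested loop that only illustrates the containment test.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document used only for illustration.
DOC = ("<bib><book><title/><author/><author/></book>"
       "<article><author/></article><book><author/></book></bib>")


def encode(root):
    """Return tag -> list of (left, right, level), each list sorted by left."""
    lists = defaultdict(list)
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        left = counter                 # position of the start tag
        for child in elem:
            visit(child, level + 1)
        counter += 1                   # position of the end tag
        lists[elem.tag].append((left, counter, level))

    visit(root, 0)
    for tag in lists:
        lists[tag].sort()              # acts as an inverted list sorted by left
    return lists


def contains(a, d):
    """Ancestor test: a's interval strictly encloses d's interval."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a, d):
    """Parent test: containment plus a level difference of exactly one."""
    return contains(a, d) and d[2] == a[2] + 1


def naive_join(alist, dlist, pred):
    """Quadratic baseline: test every (a, d) pair."""
    return [(a, d) for a in alist for d in dlist if pred(a, d)]


lists = encode(ET.fromstring(DOC))
books, authors = lists["book"], lists["author"]
pairs = naive_join(books, authors, is_parent)
print(pairs)
```

The author under the article element is retrieved by the inverted list but rejected by the containment test, which is the filtering step the scenario describes.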
a1: BeginJoinable = d1; stops at d4
a2: BeginJoinable = d2; stops at d4
a3: BeginJoinable = d4; stops at d6
a4: BeginJoinable = d6
⇒ Further optimization is possible to avoid unnecessary rescanning, though in general rescanning cannot be avoided
[Figure: example Alist elements a1 to a4 and Dlist elements d1 to d6]
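A merge-style containment join that keeps a BeginJoinable pointer might look like the sketch below. The function name `tree_merge_anc` and the interval values are assumptions; the intervals were chosen so that the recorded BeginJoinable positions match the trace listed above (d1, d2, d4, d6).

```python
# Hypothetical (name, left, right) intervals, each list sorted by left.
ALIST = [("a1", 1, 12), ("a2", 4, 9), ("a3", 13, 20), ("a4", 21, 24)]
DLIST = [("d1", 2, 3), ("d2", 5, 6), ("d3", 7, 8),
         ("d4", 14, 15), ("d5", 16, 17), ("d6", 22, 23)]


def tree_merge_anc(alist, dlist):
    """Return (pairs, trace): joined (a, d) names and BeginJoinable per a."""
    pairs, trace = [], []
    begin = 0  # BeginJoinable: first Dlist position that can still join
    for a_name, a_left, a_right in alist:
        # Entries starting before a cannot join with a or with any later a.
        while begin < len(dlist) and dlist[begin][1] < a_left:
            begin += 1
        trace.append(dlist[begin][0] if begin < len(dlist) else None)
        # Scan forward from BeginJoinable until an entry starts after a ends.
        i = begin
        while i < len(dlist) and dlist[i][1] < a_right:
            d_name, _d_left, d_right = dlist[i]
            if d_right < a_right:
                pairs.append((a_name, d_name))
            i += 1
    return pairs, trace


pairs, trace = tree_merge_anc(ALIST, DLIST)
print(trace)
print(pairs)
```

Because a2 is nested inside a1 in this data, d2 and d3 are scanned once for a1 and again for a2, which is the rescanning the note above refers to.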
Algorithm Stack-Tree-Desc
Start with an empty stack Astack
While Astack or Alist or Dlist is not empty:
  If heads of both Alist and Dlist come after the top of Astack, pop Astack;
  Else if the head of Alist is contained by the top of Astack (or Astack is empty) and comes before the head of Dlist, push it onto Astack and advance Alist;
  Else join the head of Dlist with everything on Astack and advance Dlist;
⇒ Output is ordered by Dlist
An alternative algorithm, Stack-Tree-Anc, orders output by Alist but requires more bookkeeping
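The Stack-Tree-Desc steps above can be sketched as follows. Two choices here are assumptions of the sketch rather than statements from the slides: the push decision is made by comparing left positions (push when the head of Alist starts before the head of Dlist), and the loop stops once Dlist is exhausted, since no further output is possible after that point. The interval data is hypothetical.

```python
INF = float("inf")


def stack_tree_desc(alist, dlist):
    """alist, dlist: (left, right) tuples sorted by left.

    Returns (ancestor, descendant) pairs, ordered by Dlist.
    """
    out, stack = [], []
    i = j = 0
    while j < len(dlist):
        # An exhausted Alist behaves like a head that starts at infinity.
        a = alist[i] if i < len(alist) else (INF, INF)
        d = dlist[j]
        if stack and a[0] > stack[-1][1] and d[0] > stack[-1][1]:
            stack.pop()                  # both heads lie after the top: it is finished
        elif a[0] < d[0]:
            stack.append(a)              # a starts first, so it nests inside the top (if any)
            i += 1
        else:
            out.extend((anc, d) for anc in stack)  # every stacked element contains d
            j += 1
    return out


alist = [(1, 12), (4, 9), (13, 20), (21, 24)]
dlist = [(2, 3), (5, 6), (7, 8), (14, 15), (16, 17), (22, 23)]
result = stack_tree_desc(alist, dlist)
print(result)
```

Each input element is pushed at most once and popped at most once, so unlike a merge join with rescanning, the work outside of producing output is linear in the input sizes.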
Elements in a stack form a containment chain
Each stack element points to an element in the parent stack; specifically, the top one that contains it
Handles twigs with no branches: q1//q2//…//qn
Input lists T_q1, T_q2, …, T_qn and stacks S_q1, S_q2, …, S_qn
While T_qn is not empty:
  Let T_qmin be the list whose head has the smallest left;
  Clean all stacks: pop while top’s right < head(T_qmin).left;
  Push head(T_qmin) on S_qmin, with pointer to top(S_parent(qmin)), and advance T_qmin;
  If qmin is the leaf (qn), output results and pop S_qmin;
Check properties
  Elements in a stack form a containment chain
  Each stack element points to the top one in the parent stack that contains it
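The loop above can be sketched as a small function over one stream per query node. This is an illustrative sketch, assuming that each stream holds (left, right) tuples sorted by left, that distinct query nodes draw from distinct element sets, and that every edge is an ancestor-descendant (//) edge; the names `path_stack` and `expand` and the interval data are hypothetical.

```python
def path_stack(streams):
    """streams[i] is the input list for the (i+1)-th query node.

    Returns tuples (e1, ..., en) in which each element contains the next.
    """
    n = len(streams)
    pos = [0] * n                     # cursor into each stream
    stacks = [[] for _ in range(n)]   # entries: (element, index of parent-stack top at push time)
    results = []

    def expand(level, idx):
        """Yield partial solutions ending at stacks[level][k] for every k <= idx."""
        for k in range(idx + 1):
            elem, ptr = stacks[level][k]
            if level == 0:
                yield (elem,)
            else:
                for prefix in expand(level - 1, ptr):
                    yield prefix + (elem,)

    while pos[-1] < len(streams[-1]):             # until the leaf stream is empty
        # q_min: the non-empty stream whose head has the smallest left.
        qmin = min((i for i in range(n) if pos[i] < len(streams[i])),
                   key=lambda i: streams[i][pos[i]][0])
        head = streams[qmin][pos[qmin]]
        for s in stacks:                          # clean all stacks
            while s and s[-1][0][1] < head[0]:
                s.pop()
        ptr = len(stacks[qmin - 1]) - 1 if qmin > 0 else -1
        stacks[qmin].append((head, ptr))          # push, then advance the stream
        pos[qmin] += 1
        if qmin == n - 1:                         # leaf: emit solutions, then pop
            results.extend(expand(qmin, len(stacks[qmin]) - 1))
            stacks[qmin].pop()
    return results


A = [(1, 20), (2, 11)]
B = [(3, 8), (12, 15)]
C = [(4, 5), (6, 7), (13, 14), (16, 17), (21, 22)]
solutions = path_stack([A, B, C])
print(solutions)
```

The stacks encode many solutions compactly: a single pointer from a leaf element into its parent stack stands for every element at or below that position, which is why the two nested A elements both appear in the output for the first two C elements.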
A first cut
  Decompose a twig into root-to-leaf paths
  Process each path using PathStack
  Merge solutions for all paths
Problem: intermediate results may be big
All authors will be returned by PathStack, though only the last one should be in the final result
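The decompose-and-merge approach, and the blow-up in intermediate results it can cause, can be seen in a small sketch. Everything named here is an assumption for illustration: the twig A with branches B and C, the interval data, and in particular `path_solutions`, which is a brute-force stand-in for a path algorithm such as PathStack rather than an implementation of it.

```python
from itertools import product

# Hypothetical twig: A with two descendant branches, A//B and A//C.
TWIG = {"A": ["B", "C"], "B": [], "C": []}

# Hypothetical streams of (left, right) intervals per query node.
STREAMS = {
    "A": [(1, 10), (11, 20), (21, 30)],
    "B": [(2, 3), (22, 23)],
    "C": [(12, 13), (24, 25)],
}


def root_to_leaf_paths(twig, root):
    """Decompose a twig into its root-to-leaf paths."""
    if not twig[root]:
        return [[root]]
    return [[root] + p for child in twig[root]
            for p in root_to_leaf_paths(twig, child)]


def contains(a, d):
    return a[0] < d[0] and d[1] < a[1]


def path_solutions(path, streams):
    """Brute-force stand-in for a path algorithm: all chains of containment."""
    return [dict(zip(path, combo))
            for combo in product(*(streams[q] for q in path))
            if all(contains(x, y) for x, y in zip(combo, combo[1:]))]


def merge(sols1, sols2):
    """Join two solution lists on the query nodes they share."""
    return [{**s1, **s2} for s1 in sols1 for s2 in sols2
            if all(s1[q] == s2[q] for q in s1.keys() & s2.keys())]


paths = root_to_leaf_paths(TWIG, "A")
per_path = [path_solutions(p, STREAMS) for p in paths]
final = per_path[0]
for sols in per_path[1:]:
    final = merge(final, sols)
print(sum(len(s) for s in per_path), len(final))
```

In this data each path produces two solutions, but only one A element has both a B and a C below it, so three quarters of the intermediate work is discarded at merge time.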
Do not use PathStack, which generates all solutions
Modify PathStack to generate only solutions that are parts of the final result (possible if twig contains only //)
Specifically, when pushing h_q onto stack S_q, ensure that
Example
Desired result: (A1, B2, C2), (A2, B1, C1)
Initial state: all three stacks empty; ready to push one of A1, B1, C1 onto a stack
If we want to ensure that non-contributing nodes are never pushed onto the stack, then
  Cannot decide on A1 unless we see B2 and C2
  Cannot decide on B1 or C1 unless we see A2
[Figure: example document with nodes A1, A2, B1, C1, B2, C2 and a twig query over A, B, C]
Idea: if there are indexes on input lists ordered by left, use these indexes to skip lists more efficiently
Example: Niagara’s ZigZag join on A//B
  After advancing to the second A, use the index on B list to go directly to the first joining B, instead of scanning B list linearly
  When processing a B, use the index on A list to skip
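The skipping idea can be sketched with binary search standing in for an index on left. This is not Niagara's implementation: the function name `zigzag_join` and the data are hypothetical, Python's `bisect` over a sorted list of left values plays the role of the index, and the sketch assumes that A elements do not nest inside one another. How far the A side skips is also an assumption here, since the last line above is cut off in the source.

```python
from bisect import bisect_right


def zigzag_join(alist, blist):
    """A//B join over (left, right) lists sorted by left, skipping via binary search.

    Assumes A elements are pairwise disjoint (no A nested inside another A).
    """
    a_lefts = [a[0] for a in alist]   # sorted keys act as the index on A list
    b_lefts = [b[0] for b in blist]   # sorted keys act as the index on B list
    out = []
    i = j = 0
    while i < len(alist) and j < len(blist):
        a, b = alist[i], blist[j]
        if b[0] < a[0]:
            # b starts before a: jump to the first B that starts after a.left.
            j = bisect_right(b_lefts, a[0], lo=j)
        elif b[0] < a[1]:
            out.append((a, b))        # b starts inside a, so a contains b
            j += 1
        else:
            # b starts after a ends: jump to the last A starting before b.left,
            # always moving forward by at least one position.
            i = max(i + 1, bisect_right(a_lefts, b[0], lo=i) - 1)
    return out


alist = [(1, 4), (10, 13), (20, 29), (40, 43)]
blist = [(2, 3), (5, 6), (7, 8), (14, 15), (21, 22), (23, 24), (30, 31), (41, 42)]
joined = zigzag_join(alist, blist)
print(joined)
```

Each branch either emits a pair or moves one cursor forward by a binary-search jump, so long runs of non-joining elements on either side are skipped in logarithmic rather than linear time.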