





Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Query Processing, Notation, Table Scan, Nested-loop Join, External Merge Sort, Performance of External Merge Sort, Tricks for Sorting, Double Buffering, Blocked I/O, Internal Sort Algorithm, Quicksort, Replacement Selection, Sort-merge Join, Optimization of SMJ, Performance of two-pass SMJ, Hash join, Other Sort-based Algorithms, Partitioning Phase, Probing phase, Hash join Tricks, Hybrid Hash Join, Hash Join versus SMJ, Duality of Sort and Hash, I/O Patterns
Typology: Slides
1 / 9
This page cannot be seen from the preview
Don't miss anything!






2
Variant indexes (due next Monday)
3
Scan? Sort? Hash? Use an index? All with different performance characteristics
Implement all alternatives Let the query optimizer choose at run-time
Number of I/O’s Memory requirement
5
Selection over R Projection of R without duplicate elimination
Trick for selection: stop early if it is a lookup by key
Same for any algorithm! Maybe not needed—results may be pipelined directly into another operator
6
R p S For each block of R , and for each r in the block: For each block of S , and for each s in the block: Output rs if p evaluates to true over r and s R is called the outer table; S is called the inner table I/O’s: B ( R ) + | R | ⋅ B ( S ) Memory requirement: 4 (double buffering)
Improvement: block-based nested-loop join For each block of R , and for each block of S : For each r in the R block, and for each s in the S block: … I/O’s: B ( R ) + B ( R ) ⋅ B ( S ) Memory requirement: same as before
Multiply by 2 ⋅ B ( R ): each pass reads the entire relation once and writes it once Subtract B ( R ) for the final pass Roughly, this is O ( B ( R ) ⋅ log (^) M B ( R ) )
11
Allocate an additional block for each run
Instead of reading/writing one disk block at time, read/write a bunch (“cluster”)
12
)Fast
One block for input, one for output, rest for a heap Fill the heap with input records Find the smallest record in the heap that is no less than the largest record in the current run
In most cases (e.g., join of key and foreign key) Worst case is B ( R ) ⋅ B ( S ): everything joins
14
15
Idea: combine join with the merge phase of merge sort
Sort: produce sorted runs of size M for R and S
Merge and join: merge the runs of R , merge the runs of S , and merge-join the result streams as they are generated!
Merge
Merge Sorted runs
R
S
Disk Memory
Join
M – 1 partitions of R
Memory Disk
R
Same for S
… …
20
Typically build a hash table for the partition of R
R partitions
S partitions
…
…
load …
stream For each S tuple, probe and join
21
In the probing phase, we should have enough memory to fit one partition of R : M – 1 ≥ B ( R ) / ( M – 1) M > sqrt( B ( R )) We can always pick R to be the smaller relation, so: M > sqrt(min( B ( R ), B ( S ))
Read it back in and partition it further!
23
Use it to avoid writing/re-reading partitions
R … …
A generalization of the idea is described in the survey paper by Graefe
24
(Assuming two-pass)
I/O’s: same
Memory requirement: hash join is lower sqrt(min( B ( R ), B ( S )) < sqrt( B ( R ) + B ( S ))
Other factors Hash join performance depends on the quality of the hash