







Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Notes on query optimization in database systems, covering topics such as plan generation and comparison, estimating ios, join algorithms, and index usage. It also discusses factors affecting performance and various optimization techniques.
Typology: Slides
1 / 13
This page cannot be seen from the preview
Don't miss anything!








CS 245 Notes 7 1
Hector Garcia-Molina
CS 245 Notes 7 2
--> Generating and comparing plans Query Generate Plans Pruning x x Estimate Cost Cost Select
Pick Min
CS 245 Notes 7 3
CS 245 Notes 7 4
CS 245 Notes 7 5
CS 245 Notes 7 6
B(R) = # of blocks containing R tuples f(R) = max # of tuples of R per block M = # memory blocks available
2
CS 245 Notes 7 7
B(R) = # of blocks containing R tuples f(R) = max # of tuples of R per block M = # memory blocks available
HT(i) = # levels in index i LB(i) = # of leaf blocks in index i
CS 245 Notes 7 8
Index that allows tuples to be read in an order that corresponds to physical order A
indexA
10 15 17 19 35 37
CS 245 Notes 7 9
R1 R2 S1 S2 R3 R4 S3 S
R1 R2 R3 R4 R5 R5 R7 R
CS 245 Notes 7 10
S(R1) = S(R2) = 1/10 block Memory available = 101 blocks
CS 245 Notes 7 11
S(R1) = S(R2) = 1/10 block Memory available = 101 blocks
Metric: # of IOs (ignoring writing of result) CS 245 Notes 7 12
This may not be the best way to compare
4
CS 245 Notes 7 19
CS 245 Notes 7 20
Algorithm (1) Hash R1 tuples into G buckets (2) Hash R2 tuples into H buckets (3) For i = 0 to k do match tuples in Gi, Hi buckets
CS 245 Notes 7 21
R1 R2 Buckets 2 5 Even 4 4 R1 R 3 12 Odd: 5 3 8 13 9 8 11 14
2 4 8 4 12 8 14
3 5 9 5 3 13 11
CS 245 Notes 7 22
(1) Tuples of relation stored physically together?
(2) Relations sorted by join attribute?
(3) Indexes exist?
CS 245 Notes 7 23
CS 245 Notes 7 24
CS 245 Notes 7 25
CS 245 Notes 7 26
Use our memory (1) Read 100 blocks of R (2) Read all of R2 (using 1 block) + join (3) Repeat until done
CS 245 Notes 7 27
Cost: for each R1 chunk: Read chunk: 1000 IOs Read R2: 5000 IOs 6000
CS 245 Notes 7 28
Cost: for each R1 chunk: Read chunk: 1000 IOs Read R2: 5000 IOs 6000
Total = 10,000 x 6000 = 60,000 IOs 1,
CS 245 Notes 7 29
CS 245 Notes 7 30
Total = 5000 x (1000 + 10,000) = 1000 5 x 11,000 = 55,000 IOs
7
CS 245 Notes 7 37
(ii) Read all chunks + merge + write out
Sorted file Memory Sorted Chunks ... ...
CS 245 Notes 7 38
Cost: Sort Each tuple is read,written, read, written so... Sort cost R1: 4 x 1,000 = 4, Sort cost R2: 4 x 500 = 2,
CS 245 Notes 7 39
R1,R2 contiguous, but unordered
Total cost = sort cost + join cost = 6,000 + 1,500 = 7,500 IOs
CS 245 Notes 7 40
R1,R2 contiguous, but unordered
Total cost = sort cost + join cost = 6,000 + 1,500 = 7,500 IOs
But: Iteration cost = 5, so merge joint does not pay off!
CS 245 Notes 7 41
But say R1 = 10,000 blocks contiguous R2 = 5,000 blocks not ordered
Iterate: 5000 x (100+10,000) = 50 x 10, 100 = 505,000 IOs
Merge join: 5(10,000+5,000) = 75,000 IOs
Merge Join (with sort) WINS! CS 245 Notes 7 42
E.g: Say I have 10 memory blocks 10
...
100 chunks to merge, need R1 100 blocks!
8
CS 245 Notes 7 43
Say k blocks in memory x blocks for relation sort
CS 245 Notes 7 44
Say k blocks in memory x blocks for relation sort
CS 245 Notes 7 45
Say k blocks in memory x blocks for relation sort
so... (x/k) k or k^2 x or k x
CS 245 Notes 7 46
R1 is 1000 blocks, k 31. R2 is 500 blocks, k 22.
Need at least 32 buffers
CS 245 Notes 7 47
Hint: do we really need the fully sorted files? R
R
Join?
sorted runs
CS 245 Notes 7 48
C = Read R1 + write R1 into runs
--> Memory requirement?
10
CS 245 Notes 7 55
Total cost (including probes)
= 500+5000 [Probe + get records] = 500+5000 [0.5+2] uniform assumption = 500+12,500 = 13,000 (case b)
CS 245 Notes 7 56
Total cost (including probes)
= 500+5000 [Probe + get records] = 500+5000 [0.5+2] uniform assumption = 500+12,500 = 13,000 (case b)
For case (c): = 500+5000[0.5 1 + (1/100) 1] = 500+2500+50 = 3050 IOs
CS 245 Notes 7 57
Iterate R2 R1 55,000 (best) Merge Join _______ Sort+ Merge Join _______ R1.C Index _______ R2.C Index _______ Iterate R2 R1 5500 Merge join 1500 Sort+Merge Join 7500 4500 R1.C Index 5500 3050 550 contiguousR2.C Index ________
not contiguous
CS 245 Notes 7 58
... ... 10 blocks
100
CS 245 Notes 7 59
-> Same for R -> Read one R1 bucket; build memory hash table -> Read corresponding R2 bucket + hash probe
R
R
R ... ... memory
Then repeat for all buckets CS 245 Notes 7 60
“Bucketize:” Read R1 + write Read R2 + write Join: Read R1, R
Total cost = 3 x [1000+500] = 4500
11
CS 245 Notes 7 61
“Bucketize:” Read R1 + write Read R2 + write Join: Read R1, R
Total cost = 3 x [1000+500] = 4500 Note: this is an approximation since buckets will vary in size and we have to round up to blocks CS 245 Notes 7 62
Size of R1 bucket = (x/k) k = number of memory buffers x = number of R1 blocks So... (x/k) < k
k > x need: k+1 total memory buffers
CS 245 Notes 7 63
E.g., k’=33 R1 buckets = 31 blocks keep 2 in memory memory G 0 G 1
in
...
31
33-2=
R
called hybrid hash-join CS 245 Notes 7 64
E.g., k’=33 R1 buckets = 31 blocks keep 2 in memory memory G 0 G 1
in
...
31
33-2=
R
Memory use: G0 31 buffers G1 31 buffers Output 33-2 buffers R1 input 1 Total 94 buffers 6 buffers to spare!!
called hybrid hash-join
CS 245 Notes 7 65
Next: Bucketize R
in
...
16
33-2=
R
...
31
33-2=
R2 buckets (^) R1 buckets
CS 245 Notes 7 66
Finally: Join remaining buckets
Gi
out
...
16
33-2=
ans
...
31
33-2=
one full R2bucket R2 buckets (^) R1 buckets
one R1buffer
13
CS 245 Notes 7 73
Iterate 5500 Merge join 1500 Sort+merge joint 7500 R1.C index 5500 550 R2.C index _____ Build R.C index _____ Build S.C index _____ Hash join 4500+ with trick,R1 first 4414 with trick,R2 first _____ Hash join, pointers 1600
contiguous
CS 245 Notes 7 74
CS 245 Notes 7 75
CS 245 Notes 7 76
Later on….
CS 245 Notes 7 77