





Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A part of the advanced database systems course (cps 216) and covers query optimization techniques, specifically cost estimation and the use of histograms. Topics include the estimation of costs for different operators, the importance of statistics, and the use of histograms for range queries. The document also discusses the construction and maintenance of histograms, as well as the use of equi-width and equi-height histograms.
Typology: Slides
1 / 9
This page cannot be seen from the preview
Don't miss anything!






2
Please email me to sign up for a 30-minute slot
3
Rewrite logical plan to combine “blocks” as much as possible Each block will then be optimized separately Fewer blocks → larger plan space
Example: SORT( CID ) takes 2 × B ( input )
PROJECT ( title ) MERGE-JOIN ( CID ) SORT ( CID )SCAN (Course) MERGE-JOIN ( SID )
SCAN ( Enroll )
SORT ( SID ) SCAN ( Student )
FILTER ( name = “Bart”)
Input to SORT( CID ):
5
Size of R : | R | For each column A in R , the number of distinct A values: |π A R | Assumption: R. A values are uniformly distributed over π A R (i.e., all values have the same count in R )
6
1 ⁄ |π A R | is the selectivity factor of predicate ( A = v ) )This predicate reduces the size of input table by the selectivity factor
Every row in the “smaller” table (one with fewer distinct values for the join column) joins with some row in the other table That is, if |π B R | · |π B S | then π B R ⊆ π B S Certainly not true in general
Selectivity factor of R. B = S. B is 1 ⁄ max(|π B R |, |π B S |)
11
A non-join attribute does not lose values from its set of possible values That is, if A is in R but not S , then π A ( R S ) = π A R Certainly not true in general
12
R. B = S. B : 1 ⁄ max(|π B R |, |π B S |) S. C = T. C : 1 ⁄ max(|π C S |, |π C T |) | Q | ≈ d (| R | · | S | · | T |) ⁄ (max(|π B R |, |π B S |) · max(|π C S |, |π C T |)) e
Using similar ideas, we can estimate the size of projection, duplicate elimination, union, difference, aggregation (with grouping) Lots of assumptions and very rough estimation Accurate estimate is not needed Maybe okay if we overestimate or underestimate consistently May lead to very nasty optimizer “hints” SELECT * FROM Student WHERE GPA > 3.9; SELECT * FROM Student WHERE GPA > 3.9 AND GPA > 3.9; Next: better estimation using more information (histograms)
14
| R |, |π A R |, high( R. A ), low( R. A )
Partition the domain of R. A into buckets Store a small summary of the distribution within each bucket Number of buckets is the “knob” that controls the resolution
15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 2 2 0 1
(^43)
(^8 8 ) (^6 ) (^533 )
19
27
13
Sort all R. A values, and then take equally spaced splits
Incremental maintenance
20
5 is in bucket [1, 7] (16) Assume uniform distribution within the bucket | Q | ≈ 16 ⁄ 7 ≈ 2 (| Q | = 1, actually)
[7, 16] covers [8, 9], [10, 11], [12, 16] (all with 16) [7, 16] partially covers [1, 7] (16) | Q | ≈ 16 ⁄ 7 + 16 + 16 + 16 ≈ 50 (| Q | = 52, actually)
21
Store the number of distinct values in each bucket To remove the effects of the values with 0 frequency
) More in Poosala et al., SIGMOD 1996
V-optimal( V , F ) histogram Avoid putting very different frequencies into the same bucket Partition in a way to minimize ∑ i VAR (^) i overall, where VARi is the frequency variance within bucket i MaxDiff( V , A ) histogram Define area to be the product of the frequency of a value and its spread Insert bucket boundaries where two adjacent areas differ by large amounts A bit easier to construct than V-optimal; comparable performance
23
Mathematical tool for hierarchical decomposition of functions and signals Haar wavelets: recursive pair-wise averaging and differencing at different resolutions Simplest wavelet basis, easy to implement Resolution Averages Detail coefficients 3 [2, 2, 0, 2, 3, 5, 4, 4] 2 [2, 1, 4, 4] [0, –1, –1, 0] 1 [1.5, 4] [0.5, 0] 0 [2.75] [–1.25] Haar wavelet decomposition: [2.75, –1.25, 0.5, 0, 0, –1, –1, 0]
24
–1.
0.5 0
0 –1 –1 0
+ –
+
+ + + +
**- –
+
2 2 0 2 3 5 4 4 Original data