Estimation Rules and Heuristics for Query Optimization in Database Systems, Slides of Database Management Systems (DBMS)

The process of converting sql queries into parse trees, logical query plans, and physical query plans, with a focus on estimating relation sizes and applying algebraic transformations to improve the logical query plan. It also discusses the use of heuristics to reduce the cost of logical query plans and enumerating physical plans, as well as choosing the order for joins and deciding when to materialize or pipeline intermediate results.

Typology: Slides

2012/2013

Uploaded on 04/26/2013

duurga
duurga 🇮🇳

4.6

(25)

121 documents

1 / 41

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Query Compilation
Evaluating Logical Query Plan
Physical Query Plan
1
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29

Partial preview of the text

Download Estimation Rules and Heuristics for Query Optimization in Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Query Compilation

Evaluating Logical Query Plan

Physical Query Plan

Outline

  • Convert SQL query to a parse tree
    • Semantic checking: attributes, relation names, types
  • Convert to a logical query plan (relational algebra expression) - deal with subqueries
  • Improve the logical query plan
    • use algebraic transformations
    • group together certain operators
    • evaluate logical plan based on estimated size of relations
  • Convert to a physical query plan
    • search the space of physical plans
    • choose order of operations
    • complete the physical query plan

Desiderata for Estimation Rules

  1. Give accurate estimates
  2. Are easy (fast) to compute
  3. Are logically consistent: estimated size should not depend on how the relation is computed

Here describe some simple heuristics.

All we really need is a scheme that properly

ranks competing plans.

Estimating Size of Projection

  • This can be exactly computed
  • Every tuple changes size by a known amount.

Estimating Size of Selection (cont'd)

  • If condition is A < c:
    • a good estimate is T(R)/3; intuition is that usually you ask about something that is true of less than half the tuples
  • If condition is A ≠ c:
    • a good estimate is T(R )
  • If condition is the AND of several equalities and

inequalities, estimate in series.

Example

  • Consider relation R(a,b,c) with 10,000 tuples

and 50 different values for attribute a.

  • Consider selecting all tuples from R with a =

10 and b < 20.

  • Estimate of number of resulting tuples is

Example

  • Consider relation R(a,b) with 10,000 tuples and 50 different values for a.
  • Consider selecting all tuples from R with a = 10 or b <
  • Estimate for a = 10 is 10,000/50 = 200
  • Estimate for b < 20 is 10,000/3 = 3333
  • Estimate for combined condition is
    • 200 + 3333 = 3533 or
    • 10,000(1 − (1 − 1/50)(1 − 1/3)) = 3466

Estimating Size of Natural Join

  • Assume join is on a single attribute Y.
  • Some possibilities:
    1. R and S have disjoint sets of Y values, so size of join is 0
    2. Y is the key of S and a foreign key of R, so size of join is T(R)
    3. All the tuples of R and S have the same Y value, so size of join is T(R)*T(S)
  • We need some assumptions…

Join Estimation Rule

  • Expected number of tuples in result is
    • T(R)*T(S) / max(V(R,Y),V(S,Y))
  • Why? Suppose V(R,Y) ≤ V(S,Y).
    • There are T(R) tuples in R.
    • Each of them has a 1/V(S,Y) chance of joining with a given tuple of S, creating T(S)/V(S,Y) new tuples

Example

  • Suppose we have
    • R(a,b) with T(R) = 1000 and V(R,b) = 20
    • S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100
    • U(c,d) with T(U) = 5000 and V(U,c) = 500
  • What is the estimated size of R S U?
    • First join R and S (on attribute b):
      • estimated size of result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,
      • by containment of value sets, number of values of c in X is the same as in S, namely 100
    • Then join X with U (on attribute c):
      • estimated size of result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,

More About Natural Join

  • If there are mutiple join attributes, the previous

rule generalizes:

  • T(R)*T(S) divided by the larger of V(R,y) and V(S,y) for each join attribute y
  • Consider the natural join of a series of relations:
  • containment and preservation of value sets assumptions ensure that the same estimated size is achieved no matter what order the joins are done in

Summary of Estimation Rules

  • Projection: exactly computable
  • Product: exactly computable
  • Selection: reasonable heuristics
  • Join: reasonable heuristics
  • The other operators are harder to estimate…

Estimating Size Parameters

  • Estimating the size of a relation depended on knowing T(R) and V(R,a)'s
  • Estimating cost of a physical algorithm depends on also knowing B(R).
  • How can the query compiler learn them?
    • Scan relation to learn T, V's, and then calculate B
    • Can also keep a histogram of the values of attributes. Makes estimating join results more accurate
    • Recomputed periodically, after some time or some number of updates, or if DB administrator thinks optimizer isn't choosing good plans

Heuristics to Reduce Cost of LQP

  • For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation
  • At this point, "cost" only refers to sizes of intermediate relations (we don't yet know about number of disk I/O's)
  • Sum of sizes of all intermediate relations is the heuristic: if this sum is smaller after the transformation, then incorporate it