Advanced Database Systems-Lecture 23 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Query Optimization, Advanced Database Systems, Search Space, Left-deep Plans, Heuristic, Greedy Algorithm, Query Optimization in System R, Bottom-up Plan Generation, The Need for “Interesting Order”, Comparing Interesting Orders, System-R Algorithm, Reasoning About Predicates, Bushy Plans, Optimizer “Blow-up”, Transformations, Iterative Improvement, Simulated Annealing, Two-phase Optimization

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Query Optimization
Part III
CPS 216
Advanced Database Systems
2
Announcements (April 21)
Homework #4 due next Thursday
Classes on both Tuesday and Thursday next week
Project demo period: April 28 – May 1
Remember to email me to sign up for a 30-minute slot
Final exam on Monday, May 2, 2-5pm
3 hours—no time pressure!
Open book, open notes
Comprehensive, but with emphasis on the second half of
the course and materials exercised in homework
3
Review of the bigger picture
Query optimization
Consider a space of possible plans
Estimate costs of plans in the search space
Search through the space for the “best” plan (today)
)Focus on select-project-join query blocks
Join ordering is the most important subproblem
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download Advanced Database Systems-Lecture 23 Slides-Computer Science and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Query Optimization

Part III

CPS 216

Advanced Database Systems

2

Announcements (April 21)

™ Homework #4 due next Thursday

™ Classes on both Tuesday and Thursday next week

™ Project demo period: April 28 – May 1

ƒ Remember to email me to sign up for a 30-minute slot

™ Final exam on Monday, May 2, 2-5pm

ƒ 3 hours—no time pressure! ƒ Open book, open notes ƒ Comprehensive, but with emphasis on the second half of the course and materials exercised in homework

3

Review of the bigger picture

Query optimization

™ Consider a space of possible plans

™ Estimate costs of plans in the search space

™ Search through the space for the “best” plan (today)

) Focus on select-project-join query blocks

ƒ Join ordering is the most important subproblem

Search space

™ “Bushy” plan example:

™ Search space is huge: 30240 bushy plans for a six-

table join

™ More if we consider:

ƒ Multiway joins ƒ Different join methods ƒ Placement of selection and projection operators

R 2 R 1 R 3 R (^4) R 5

5

Left-deep plans

™ Heuristic: consider only “left-deep” plans, in which only the left child can be a join

™ How many left-deep plans are there for R 1  L  R (^) n?

R 2 R 1

R 3

R 4

 R 5

6

A greedy algorithm

™ S 1 , …, Sn ƒ Say selections have been pushed down; i.e., Si = σ p Ri ™ Start with the pair Si , Sj with the smallest estimated size for Si  Sj

™ Repeat until no table is left: Pick Sk from the remaining tables such that the join of Sk and the current result yields an intermediate result of the smallest size

Current subplan

…, Sk , Sl , Sm , …

Remaining tables to be joined

Pick most efficient join method

Sk

Minimize expected size

Complexity?

Dealing with interesting orders

™ When picking the best plan

ƒ Comparing their costs is not enough

  • Plans are not totally ordered by cost anymore ƒ Comparing interesting orders is also needed
  • Plans are now partially ordered
  • Plan X is better than plan Y if
  • Cost of X is lower than Y
  • Interesting orders produced by X subsume those produced by Y

™ Need to keep a set of optimal plans for joining every

combination of k tables

ƒ At most one for each interesting order

11

System-R algorithm

™ Pass 1: Find the best single-table plans ™ Pass 2: Find the best two-table plans by considering each single-table plan (from Pass 1) as the outer input and every other table as the inner input … ™ Pass k : Find the best k -table plans by considering each ( k –1)-table plan (from Pass k –1) as the outer input and every other table as the inner input …

™ Heuristics ƒ Push selections and projections down ƒ Process cross products at the end

12

Reasoning about predicates

™ SELECT * FROM R , S , T

WHERE R. A = S. A AND S. A = T. A ;

™ Looks like a cross product between R and T

ƒ No join condition

™

™ A good optimizer should be able to detect this case

and consider the possibility of joining R with T first

System-R algorithm example

™ SELECT SID, CID

FROM Student, Enroll, Course WHERE Student.age < 10 AND Student.SID = Enroll.SID AND Enroll.CID = Course.CID AND Course.title LIKE ‘%data%’;

™ Primary keys/indexes ƒ Student(SID), Enroll(CID, SID), Course(CID) ™ Ordered, secondary indexes ƒ Student(age), Course(title)

14

Example: pass 1

™ Plans for { Student } ƒ S1: Table scan, then filter ( age < 10); cost 100; result ordered by SID ƒ S2: Index scan using condition ( age < 10); cost 5; result ordered by age ™ Plans for { Enroll } ƒ E1: Table scan; cost 1000; result ordered by CID , SID

™ Plans for { Course } ƒ C1: Table scan, then filter ( title LIKE ’%data%’); cost 40; result ordered by CID ƒ C2: Index scan with filter ( title LIKE ’%data%’); cost 60; result ordered by title

SELECT SID, CID FROM Student, Enroll, Course WHERE Student.age < 10 AND Student.SID = Enroll.SID AND Enroll.CID = Course.CID AND Course.title LIKE ‘%data%’;

15

Example: pass 2

™ Plans for { Student , Enroll }

ƒ Extending best plans for { Student }

  • From S1 (table scan, then filter ( age < 10))
    • Block-based nested loop join with Enroll ; cost 1100
    • Sort Enroll by SID , and merge join; cost 3100; ordered by SID ← no longer an interesting order
  • … …
  • From S2 (index scan using condition ( age < 10))
    • Block-based nested loop join with Enroll ; cost 1005
    • … … ƒ Extending best plans for { Enroll } … …

SELECT SID, CID FROM Student, Enroll, Course WHERE Student.age < 10 AND Student.SID = Enroll.SID AND Enroll.CID = Course.CID AND Course.title LIKE ‘%data%’;

Optimizer “blow-up”

™ A 20-way join will easily choke an optimizer using

the System-R algorithm

™ Solutions

ƒ Heuristics-based query optimization ƒ Randomized query optimization (Ioannidis & Kang, SIGMOD 1990) ƒ Genetic programming (PostgreSQL)

20

Search space revisited

Cost

Space of plans

Plan Transformations

Global optimum

Local optimum

21

Transformations

Relational algebra equivalences (or query rewrite rules in general):

™ Join method choice: R method1 SR method2 S

™ Join commutativity: R  SS  R

™ Join associativity: ( R  S )  TR  ( S  T )

™ Left join exchange: ( R  S )  TR  ( T  S )

™ Right join exchange: R  ( S  T ) → S  ( R  T )

) Why the last two redundant rules?

Iterative improvement

™ Repeat until some stopping condition (e.g., time

runs out):

ƒ Start with a random plan ƒ Repeatedly go downhill (i.e., pick a neighbor with a lower cost randomly) to get to a local optimum

™ Return the smallest local optimum found

23

Simulated annealing

™ Start with a plan and an initial temperature

™ Repeat until temperature is 0:

ƒ Repeat until some equilibrium (e.g., a fixed number of iterations):

  • Move to a random neighbor of the plan (an uphill move is allowed with probability e –^ ∆cost^ ⁄^ temperature) - Larger → smaller probability - Lower temperature → smaller probability ƒ Reduce temperature

™ Return the plan visited with the lowest cost

24

Two-phase optimization

™ Phase I: run iterative improvement for a while to

find a good local optimum

™ Phase II: run simulated annealing with a low initial

temperature to get more improvements

™ Why does this heuristic tend to work better than

both iterative improvement and simulated

annealing?