Advanced Database Systems-Lecture 23 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

Query Optimization, Advanced Database Systems, Search Space, Left-deep Plans, Heuristic, Greedy Algorithm, Query Optimization in System R, Bottom-up Plan Generation, The Need for “Interesting Order”, Comparing Interesting Orders, System-R Algorithm, Reasoning About Predicates, Bushy Plans, Optimizer “Blow-up”, Transformations, Iterative Improvement, Simulated Annealing, Two-phase Optimization

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸


Query Optimization

Part III

CPS 216

Advanced Database Systems

Announcements (April 21)

Homework #4 due next Thursday

Classes on both Tuesday and Thursday next week

Project demo period: April 28 – May 1

Remember to email me to sign up for a 30-minute slot

Final exam on Monday, May 2, 2-5pm

3 hours—no time pressure!

Open book, open notes

Comprehensive, but with emphasis on the second half of the course and materials exercised in homework

Review of the bigger picture

Query optimization

Consider a space of possible plans

Estimate costs of plans in the search space

Search through the space for the “best” plan (today)

Focus on select-project-join query blocks

Join ordering is the most important subproblem


Search space

“Bushy” plan example: (figure: a bushy join tree over $R_1, R_2, R_3, R_4, R_5$)

Search space is huge: 30240 bushy plans for a six-table join

More if we consider:

- Multiway joins
- Different join methods
- Placement of selection and projection operators
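The 30240 figure can be checked with a standard count. This assumes that every binary tree shape with $n$ leaves is allowed (there are $C_{n-1}$ such shapes, where $C$ is the Catalan number), that the $n$ tables can be assigned to the leaves in any order (so swapped join inputs count as different plans), that cross products are allowed, and that only one join method is used:

```latex
\#\text{bushy plans}(n) = n! \cdot C_{n-1}
  = n! \cdot \frac{1}{n}\binom{2n-2}{n-1}
  = \frac{(2n-2)!}{(n-1)!},
\qquad
\#\text{bushy plans}(6) = 720 \cdot 42 = \frac{10!}{5!} = 30240
```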

Left-deep plans

Heuristic: consider only “left-deep” plans, in which only the left child can be a join

How many left-deep plans are there for $R_1 \bowtie \cdots \bowtie R_n$?

(figure: a left-deep join tree over $R_1, \ldots, R_5$)
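Here is a small Python sketch that enumerates left-deep plans so the question above can be checked for small $n$. It makes these assumptions: a plan is a nested (outer, inner) pair, both input orders of the bottom join count as distinct plans, cross products are allowed, and join methods are ignored. `left_deep_plans` is a hypothetical helper and does not come from any system.

```python
from itertools import permutations

def left_deep_plans(tables):
    """Yield every left-deep join tree over `tables` as nested (outer, inner) pairs."""
    for perm in permutations(tables):
        plan = perm[0]
        for t in perm[1:]:
            # The new table is always the right (inner) child,
            # so only the left child can be a join.
            plan = (plan, t)
        yield plan

plans = list(left_deep_plans(["R1", "R2", "R3", "R4", "R5", "R6"]))
print(len(plans))  # 720
```

Under these assumptions each permutation of the tables gives exactly one left-deep tree. For six tables that is 720 plans, far fewer than the 30240 bushy ones.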

A greedy algorithm

Input: $S_1, \ldots, S_n$. Say selections have been pushed down, i.e., $S_i = \sigma_p R_i$

Start with the pair $S_i, S_j$ with the smallest estimated size for $S_i \bowtie S_j$

Repeat until no table is left: pick $S_k$ from the remaining tables such that the join of $S_k$ and the current result yields an intermediate result of the smallest size

(figure: the current subplan is joined with a table picked from the remaining tables to be joined, $\ldots, S_k, S_l, S_m, \ldots$; pick the most efficient join method and minimize the expected size)

Complexity?
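Below is a minimal Python sketch of the greedy algorithm above. The size estimator (`make_size_estimator`) is a hypothetical stand-in. It multiplies the input sizes and the selectivities of all predicates among the chosen tables, assuming independence. The cardinalities and selectivities are made-up illustration values, and join-method selection is omitted.

```python
import math
from itertools import combinations

def make_size_estimator(card, sel):
    """Hypothetical estimator: product of input sizes times the selectivity of every
    join predicate whose two tables are both in the subset (independence assumption)."""
    def est(subset):
        size = math.prod(card[t] for t in subset)
        for pair, s in sel.items():
            if pair <= subset:
                size *= s
        return size
    return est

def greedy_join_order(tables, est_size):
    """Greedy join ordering: cheapest pair first, then repeatedly add the table
    that yields the smallest intermediate result."""
    remaining = set(tables)
    # Start with the pair S_i, S_j whose join has the smallest estimated size
    first = min(combinations(sorted(remaining), 2),
                key=lambda p: est_size(frozenset(p)))
    order = list(first)
    remaining -= set(first)
    while remaining:
        current = frozenset(order)
        # sorted() makes tie-breaking deterministic
        nxt = min(sorted(remaining), key=lambda t: est_size(current | {t}))
        order.append(nxt)
        remaining.remove(nxt)
    return order

card = {"R": 1000, "S": 10, "T": 50}  # sizes after pushed-down selections
sel = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.1}
print(greedy_join_order(list(card), make_size_estimator(card, sel)))  # ['S', 'T', 'R']
```

The initial pair search alone evaluates n(n-1)/2 candidate pairs.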

Dealing with interesting orders

When picking the best plan
- Comparing their costs is not enough
  - Plans are not totally ordered by cost anymore
- Comparing interesting orders is also needed
  - Plans are now partially ordered
  - Plan X is better than plan Y if
    - Cost of X is lower than that of Y, and
    - Interesting orders produced by X subsume those produced by Y

Need to keep a set of optimal plans for joining every combination of k tables
- At most one for each interesting order
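One way to encode the partial order above is sketched below in Python. It is hypothetical: an interesting order is modeled as a tuple of sort attributes, and an order on (A, B) is assumed to subsume an order on (A). The plan names and costs are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Plan:
    name: str
    cost: float
    order: Tuple[str, ...]  # sort order of the output; () if unordered or not interesting

def order_subsumes(x: Tuple[str, ...], y: Tuple[str, ...]) -> bool:
    """Output sorted on x also satisfies a requirement to be sorted on y (y is a prefix of x)."""
    return x[:len(y)] == y

def better(x: Plan, y: Plan) -> bool:
    """Plan x is better than plan y: lower cost AND its order subsumes y's order."""
    return x.cost < y.cost and order_subsumes(x.order, y.order)

def prune(plans):
    """Keep only plans that no other plan is better than."""
    return [p for p in plans if not any(better(q, p) for q in plans)]

candidates = [
    Plan("A", 10, ("SID",)),
    Plan("B", 5, ()),
    Plan("C", 12, ("SID",)),        # dominated by A: costlier, same order
    Plan("D", 20, ("SID", "CID")),  # kept: no cheaper plan produces an order that subsumes (SID, CID)
]
print([p.name for p in prune(candidates)])  # ['A', 'B', 'D']
```

Because the order is only partial, several plans for the same set of tables can survive pruning.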

System-R algorithm

Pass 1: Find the best single-table plans

Pass 2: Find the best two-table plans by considering each single-table plan (from Pass 1) as the outer input and every other table as the inner input

…

Pass k: Find the best k-table plans by considering each (k–1)-table plan (from Pass k–1) as the outer input and every other table as the inner input

…

Heuristics:
- Push selections and projections down
- Process cross products at the end
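The pass structure above can be sketched compactly in Python for left-deep plans. This is not System R's implementation. It ignores interesting orders and join methods, and it uses a toy cost (the sum of intermediate result sizes) with hypothetical cardinalities and selectivities. "Process cross products at the end" is approximated here: a subplan is extended with an unconnected table only when no connected table remains.

```python
def system_r_left_deep(card, sel):
    """Left-deep, System-R-style dynamic programming (a sketch).

    card: table name -> estimated row count
    sel:  frozenset({t1, t2}) -> selectivity of the join predicate between t1 and t2
    Returns (cost, join order), where cost = sum of intermediate result sizes.
    """
    tables = frozenset(card)
    # Pass 1: best single-table plans; only join results are charged here, so cost 0
    best = {frozenset([t]): (0.0, (t,), float(card[t])) for t in tables}
    for k in range(2, len(tables) + 1):
        # Pass k: extend each (k-1)-table plan from pass k-1 with one inner table
        outers = [s for s in best if len(s) == k - 1]
        for outer in outers:
            cost_o, order_o, size_o = best[outer]
            rest = tables - outer
            # Heuristic: defer cross products by preferring inner tables that
            # share a join predicate with the outer plan
            linked = {t for t in rest if any(frozenset((t, o)) in sel for o in outer)}
            for inner in (linked or rest):
                size = size_o * card[inner]
                for o in outer:
                    size *= sel.get(frozenset((inner, o)), 1.0)
                cand = (cost_o + size, order_o + (inner,), size)
                key = outer | {inner}
                if key not in best or cand[0] < best[key][0]:
                    best[key] = cand
    cost, order, _ = best[tables]
    return cost, order

card = {"R": 1000, "S": 10, "T": 50}
sel = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.1}
cost, order = system_r_left_deep(card, sel)
print(cost, order)  # joins S and T first, then R (total cost 550 under this toy model)
```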

Reasoning about predicates

SELECT * FROM R, S, T
WHERE R.A = S.A AND S.A = T.A;

Looks like a cross product between R and T
- No explicit join condition between them
- But R.A = T.A is implied by transitivity

A good optimizer should be able to detect this case and consider the possibility of joining R with T first
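To detect the implied R.A = T.A, an optimizer can compute the transitive closure of the equality predicates. The following hypothetical Python sketch does this with union-find. The function name and the "table.column" string encoding are illustrative assumptions.

```python
from itertools import combinations

def implied_equalities(predicates):
    """Return equality predicates implied by transitivity but not written explicitly.

    predicates: iterable of (column, column) pairs, each meaning column = column.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    predicates = list(predicates)
    for a, b in predicates:
        parent[find(a)] = find(b)  # merge the two equivalence classes

    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)

    explicit = {frozenset(p) for p in predicates}
    implied = {frozenset(pair)
               for cls in classes.values()
               for pair in combinations(sorted(cls), 2)}
    return implied - explicit

implied = implied_equalities([("R.A", "S.A"), ("S.A", "T.A")])
print(sorted(tuple(sorted(p)) for p in implied))  # [('R.A', 'T.A')]
```

Once R.A = T.A is known, R and T are connected by a join predicate, so a plan joining them first is no longer treated as a cross product.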

System-R algorithm example

SELECT SID, CID
FROM Student, Enroll, Course
WHERE Student.age < 10 AND Student.SID = Enroll.SID
  AND Enroll.CID = Course.CID AND Course.title LIKE '%data%';

Primary keys/indexes: Student(SID), Enroll(CID, SID), Course(CID)

Ordered, secondary indexes: Student(age), Course(title)

Example: pass 1

Plans for {Student}
- S1: Table scan, then filter (age < 10); cost 100; result ordered by SID
- S2: Index scan using condition (age < 10); cost 5; result ordered by age

Plans for {Enroll}
- E1: Table scan; cost 1000; result ordered by CID, SID

Plans for {Course}
- C1: Table scan, then filter (title LIKE '%data%'); cost 40; result ordered by CID
- C2: Index scan with filter (title LIKE '%data%'); cost 60; result ordered by title
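The pass-1 plans above can be pruned mechanically. This Python sketch uses the costs from the example and assumes that only orders on join attributes (SID, CID) are interesting. It keeps the cheapest plan overall plus the cheapest plan for each interesting order. As a simplification, it tracks only the leading sort attribute of each plan's output.

```python
# Pass-1 plans from the example: (name, cost, attribute the output is ordered on)
PASS1 = {
    "Student": [("S1", 100, "SID"), ("S2", 5, "age")],
    "Enroll": [("E1", 1000, "CID")],  # ordered by (CID, SID); only the leading attribute is tracked
    "Course": [("C1", 40, "CID"), ("C2", 60, "title")],
}

# Assumption: an order is interesting if it is on a join attribute of this query
INTERESTING = {"SID", "CID"}

def keep_plans(plans):
    """Keep the cheapest plan overall plus the cheapest plan for each interesting order."""
    kept = {min(plans, key=lambda p: p[1])}
    for attr in INTERESTING:
        ordered = [p for p in plans if p[2] == attr]
        if ordered:
            kept.add(min(ordered, key=lambda p: p[1]))
    return sorted(kept, key=lambda p: p[1])

for table, plans in PASS1.items():
    print(table, [p[0] for p in keep_plans(plans)])
# Student ['S2', 'S1']
# Enroll ['E1']
# Course ['C1']
```

Under these assumptions, S1 survives even though it costs 20 times more than S2, because its SID order can help the join with Enroll. C2 is dropped because an order on title is not interesting.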


Example: pass 2

Plans for {Student, Enroll}

Extending best plans for {Student}
- From S1 (table scan, then filter (age < 10))
  - Block-based nested loop join with Enroll; cost 1100
  - Sort Enroll by SID, and merge join; cost 3100; ordered by SID ← no longer an interesting order
  - …
- From S2 (index scan using condition (age < 10))
  - Block-based nested loop join with Enroll; cost 1005
  - …

Extending best plans for {Enroll}
- …


Optimizer “blow-up”

A 20-way join will easily choke an optimizer using the System-R algorithm

Solutions
- Heuristics-based query optimization
- Randomized query optimization (Ioannidis & Kang, SIGMOD 1990)
- Genetic algorithms (PostgreSQL)
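Here is a rough count of why 20 tables is too many. It assumes a left-deep System-R search that keeps one best plan per subset of tables and forms each k-table subset from a (k–1)-table plan plus one inner table. Interesting orders and join methods would only multiply these numbers.

```latex
\sum_{k=1}^{n}\binom{n}{k} = 2^{n}-1 = 1{,}048{,}575 \quad (n = 20)
\qquad
\sum_{k=1}^{n} k\binom{n}{k} = n\,2^{n-1} = 20 \cdot 2^{19} = 10{,}485{,}760
```

The first sum is the number of table subsets that need a best plan. The second is the number of (subset, last inner table) combinations to cost.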

Search space revisited

(figure: cost plotted over the space of plans; plan transformations move between neighboring plans, and the cost curve has local optima as well as a global optimum)

Transformations

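Relational algebra equivalences (or query rewrite rules in general):
- Join method choice: $R \bowtie_{\text{method}_1} S \to R \bowtie_{\text{method}_2} S$
- Join commutativity: $R \bowtie S \to S \bowtie R$
- Join associativity: $(R \bowtie S) \bowtie T \to R \bowtie (S \bowtie T)$
- Left join exchange: $(R \bowtie S) \bowtie T \to R \bowtie (T \bowtie S)$
- Right join exchange: $R \bowtie (S \bowtie T) \to S \bowtie (R \bowtie T)$

Why the last two redundant rules?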
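The rules above can be written as a neighbor function over join trees. This is the move set that randomized optimizers explore. The Python sketch is hypothetical: the `Join` class and the `METHODS` tuple are illustrative, the left join exchange follows the form listed above, and rewritten inner joins simply keep their existing join method.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Union

METHODS = ("nested_loop", "sort_merge", "hash")  # hypothetical set of join methods

@dataclass(frozen=True)
class Join:
    method: str
    left: Plan
    right: Plan

Plan = Union[str, Join]  # a leaf is a base-table name

def root_moves(p: Join) -> list[Join]:
    """Plans obtained by applying one rule at the root of p."""
    moves = [replace(p, method=m) for m in METHODS if m != p.method]  # join method choice
    moves.append(Join(p.method, p.right, p.left))                    # join commutativity
    if isinstance(p.left, Join):
        r, s, t = p.left.left, p.left.right, p.right
        moves.append(Join(p.method, r, Join(p.left.method, s, t)))   # join associativity
        moves.append(Join(p.method, r, Join(p.left.method, t, s)))   # left join exchange (as listed)
    if isinstance(p.right, Join):
        r, s, t = p.left, p.right.left, p.right.right
        moves.append(Join(p.method, s, Join(p.right.method, r, t)))  # right join exchange
    return moves

def neighbors(p: Plan) -> list[Plan]:
    """All plans reachable by applying one rule at any join node of p."""
    if isinstance(p, str):
        return []
    result: list[Plan] = root_moves(p)
    result += [Join(p.method, left, p.right) for left in neighbors(p.left)]
    result += [Join(p.method, p.left, right) for right in neighbors(p.right)]
    return result

plan = Join("hash", Join("hash", "R", "S"), "T")
print(len(neighbors(plan)))  # 8
```

Since both exchange rules can be reached by composing commutativity and associativity, including them mainly shortens the path between plans. That matters for a search that takes one random step at a time.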

Iterative improvement

Repeat until some stopping condition (e.g., time runs out):
- Start with a random plan
- Repeatedly go downhill (i.e., randomly pick a neighbor with a lower cost) to get to a local optimum

Return the lowest-cost local optimum found
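The following generic Python sketch of iterative improvement takes hypothetical `random_plan`, `neighbors`, and `cost` callables, so it can run on a toy one-dimensional cost landscape. For join plans, a transformation-based neighbor function would be plugged in instead. A fixed number of restarts stands in for the time limit.

```python
import random

def iterative_improvement(random_plan, neighbors, cost, restarts, rng):
    """Random restarts, each descending to a local optimum; return the cheapest one.

    Stopping condition (an assumption): a fixed number of restarts.
    """
    best = None
    for _ in range(restarts):
        plan = random_plan(rng)
        while True:
            downhill = [q for q in neighbors(plan) if cost(q) < cost(plan)]
            if not downhill:
                break  # local optimum: no neighbor is cheaper
            plan = rng.choice(downhill)  # go downhill via a randomly chosen cheaper neighbor
        if best is None or cost(plan) < cost(best):
            best = plan
    return best

# Toy landscape: plans are positions 0..10, neighbors are adjacent positions
COSTS = [5, 4, 3, 4, 5, 6, 2, 6, 7, 1, 8]

def cost(x):
    return COSTS[x]

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(COSTS)]

def random_plan(rng):
    return rng.randrange(len(COSTS))

print(iterative_improvement(random_plan, neighbors, cost, 50, random.Random(0)))
```

The landscape has local optima at positions 2, 6, and 9. Which one a single descent reaches depends on where it starts, which is why the restarts matter.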

Simulated annealing

Start with a plan and an initial temperature

Repeat until temperature is 0:
- Repeat until some equilibrium (e.g., a fixed number of iterations):
  - Move to a random neighbor of the plan (an uphill move is allowed with probability $e^{-\Delta\text{cost}/\text{temperature}}$)
    - Larger $\Delta\text{cost}$ → smaller probability
    - Lower temperature → smaller probability
- Reduce temperature

Return the plan visited with the lowest cost
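A Python sketch of the annealing loop on a toy landscape follows. Several choices are assumptions rather than anything the slides prescribe: geometric cooling, a fixed number of moves per temperature as "equilibrium", and a small `min_temp` threshold standing in for "temperature is 0".

```python
import math
import random

def simulated_annealing(plan, neighbors, cost, temperature, rng,
                        cooling=0.9, steps_per_temp=50, min_temp=1e-3):
    """Simulated annealing; returns the lowest-cost plan visited."""
    best = plan
    while temperature > min_temp:  # stands in for "until temperature is 0"
        for _ in range(steps_per_temp):  # "equilibrium": a fixed number of moves
            candidate = rng.choice(neighbors(plan))
            delta = cost(candidate) - cost(plan)
            # Downhill moves are always taken; uphill ones with probability e^(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                plan = candidate
            if cost(plan) < cost(best):
                best = plan
        temperature *= cooling  # reduce temperature
    return best

# Toy landscape: plans are positions 0..10, neighbors are adjacent positions
COSTS = [5, 4, 3, 4, 5, 6, 2, 6, 7, 1, 8]

def cost(x):
    return COSTS[x]

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(COSTS)]

print(simulated_annealing(0, neighbors, cost, 5.0, random.Random(0)))
```

Unlike pure downhill search, early high-temperature moves can climb out of the basin around position 2. Tracking `best` separately guarantees the returned plan is never worse than any plan visited.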

Two-phase optimization

Phase I: run iterative improvement for a while to find a good local optimum

Phase II: run simulated annealing with a low initial temperature to get more improvements

Why does this heuristic tend to work better than both iterative improvement and simulated annealing?
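Here is a self-contained Python sketch of two-phase optimization on a toy landscape. Phase I runs iterative improvement, and Phase II runs simulated annealing starting from Phase I's result. All parameter values are hypothetical. The "low" initial temperature `t0` only means something relative to typical cost differences between neighboring plans.

```python
import math
import random

def two_phase(random_plan, neighbors, cost, rng, restarts=10,
              t0=0.5, cooling=0.9, steps=20, t_min=1e-3):
    # Phase I: iterative improvement (random restarts + random downhill moves)
    best = None
    for _ in range(restarts):
        plan = random_plan(rng)
        while True:
            downhill = [q for q in neighbors(plan) if cost(q) < cost(plan)]
            if not downhill:
                break  # local optimum
            plan = rng.choice(downhill)
        if best is None or cost(plan) < cost(best):
            best = plan
    # Phase II: simulated annealing from the best local optimum, low initial temperature
    plan, temp = best, t0
    while temp > t_min:
        for _ in range(steps):
            cand = rng.choice(neighbors(plan))
            delta = cost(cand) - cost(plan)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                plan = cand
            if cost(plan) < cost(best):
                best = plan
        temp *= cooling
    return best

# Toy landscape: plans are positions 0..10, neighbors are adjacent positions
COSTS = [5, 4, 3, 4, 5, 6, 2, 6, 7, 1, 8]

def cost(x):
    return COSTS[x]

def neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(COSTS)]

print(two_phase(lambda r: r.randrange(len(COSTS)), neighbors, cost, random.Random(0)))
```

Starting the annealing phase at a good local optimum with a low temperature keeps it in a promising region. It can still make small uphill moves there, which Phase I alone cannot.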
