Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Query Optimization: From Logical Plan to Physical Execution, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

An in-depth exploration of query optimization in advanced database systems. It covers the process of a sql query's journey through the dbms, including parsing, validation, query optimization, and execution. The text also discusses various techniques for query optimization, such as relational algebra equivalences, query rewrite, and heuristics-based optimization.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸

4.7

(24)

372 documents

1 / 9

This page cannot be seen from the preview

Don't miss anything!

Query Optimization

Part I

CPS 216

Advanced Database Systems

A query’s trip through the DBMS

Parser

Validator

Query optimizer

Query execution engine

Result

SQL query SELECT title, SID

FROM Enroll, Course

WHERE Enroll.CID =

Course.CID;

Parse tree

hSFWi

hselect-listi

hfrom-listi

hwhere-listi

htableihtablei

hQueryi

AND

Enroll Course …

…

Logical plan

πtitle, SID

σEnroll.CID = Course.CID

Enroll Course

Physical plan

PROJECT (title, SID)

MERGE-JOIN (CID)

SCAN (Enroll)

SCAN (Course)

SORT (CID)

Parsing & validation

Parser: SQL →parse tree

Good old lex & yacc

Detect and reject syntax errors

Validator: parse tree →logical plan

Detect and reject semantic errors

• Nonexistent tables/views/columns

• Type mismatches (e.g., AVG(name), name + GPA, Student

UNION Enroll)

Wildcard (SELECT *) and view expansion

Use information stored in system catalog tables (contains

all metadata/schema information)

Discover Slides of Database Management Systems (DBMS) Duke University

Partial preview of the text

Download Query Optimization: From Logical Plan to Physical Execution and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Query Optimization

Part I

CPS 216

Advanced Database Systems

A query’s trip through the DBMS

Parser

Validator

Query optimizer

Query execution engine

Result

SQL query SELECT title, SID

FROM Enroll, Course WHERE Enroll.CID = Course.CID;

Parse tree

h SFW i h select-list i h from-list i

h where-list i

h table i h table i

h Query i

AND

Enroll Course

Logical plan

π title , SID

σ Enroll. CID = Course. CID

×

Enroll Course

Physical plan

PROJECT ( title , SID )

MERGE-JOIN ( CID )

SCAN ( Enroll )

SCAN ( Course )

SORT ( CID )

Parsing & validation

Parser: SQL → parse tree

Good old lex & yacc

Detect and reject syntax errors

Validator: parse tree → logical plan

Detect and reject semantic errors

Nonexistent tables/views/columns
Type mismatches (e.g., AVG(name), name + GPA, Student

UNION Enroll)

Wildcard (SELECT *) and view expansion

Use information stored in system catalog tables (contains

all metadata/schema information)

Logical plan

A tree whose nodes are logical operators

Often a tree of relational algebra operators

DB2 uses QGM (Query Graph Model)

There are many equivalent logical plans

π title

σ Student. name =“Bart” ∧ Student. SID = Enroll. SID ∧ Enroll. CID = Course.CID

×

Enroll

× Course

Student

An equivalent plan:

π title

Enroll. CID = Course. CID

Enroll

Course

Student

Student. SID = Enroll. SID

σ name = “Bart”

SELECT Course.title FROM Student, Enroll, Course WHERE Student.name = ‘Bart’ AND Student.SID = Enroll.SID AND Enroll.CID = Course.CID;

Query optimization and execution

Recall that a physical plan tells the DBMS query execution

engine how to execute the query

One logical plan can have many possible physical plans (with

equivalent results, but different costs and assumptions)

Query optimizer: one logical plan → “best” physical plan

Query execution engine: physical plan → results

PROJECT ( title )

INDEX-NESTED-LOOP-JOIN ( CID )

Index on Enroll( SID )

Index on Course ( CID )

Index on Student ( name )

INDEX-SCAN ( name = “Bart”)

INDEX-NESTED-LOOP-JOIN ( SID )

PROJECT ( title )

MERGE-JOIN ( CID )

SORT ( CID ) SCAN (Course) MERGE-JOIN ( SID )

SCAN ( Enroll )

SORT ( SID )

SCAN ( Student )

FILTER ( name = “Bart”)

Query optimization

Conceptually

Consider a space of possible plans (next)

Estimate costs of plans in the search space (next Tuesday)

Search through the space for the “best” plan (next Thursday)

Often the goal is not picking the absolute optimum, but

instead avoiding the horrible ones

1 second 1 minute 1 hour

Any of these will do

Heuristics-based query optimization

Start with a logical plan

Push selections/projections down as much as possible

Why?

Why not?

Join smaller relations first, and avoid cross product

Why?

Why not?

Convert the transformed logical plan to a physical

plan (by choosing appropriate physical operators)

SQL query rewrite

More complicated—subqueries and views divide a

query into nested “blocks”

Processing each block separately forces particular join

methods and join order

Even if the plan is optimal for each block, it may not be

optimal for the entire query

Unnest query: convert subqueries/views to joins

) Then we just deal with select-project-join queries

Where the clean rules of relational algebra apply

DB2’s QGM

Leung et al. “Query Rewrite Optimization Rules in IBM DB2 Universal

Database.”

Query Graph Model: DB2’s logical plan language

More high-level than relational algebra

A graph of boxes

Leaf boxes are tables

The standard box is the SELECT box (actually a select-

project-join query block with optional duplicate

elimination)

Other types include GROUPBY (aggregation), UNION,

INTERSECT, EXCEPT

Can always add new types (e.g., OUTERJOIN)

Head: declarative description of the output

Schema: list of output columns

Property: Are output tuples DISTINCT?

Body: how to compute the output

Quantifiers: tuple variables that range over other boxes

F: regular tuple variable, e.g., FROM R AS r
E: existential quantifier, e.g., IN ( subquery ), or = ANY( subquery )
A: universal quantifier, e.g., > ALL( subquery )
S: scalar subquery, e.g., = ( subquery )

Quantifiers are connected a hypergraph

Hyperedges are predicates

Enforce DISTINCT, preserve duplicates, or permit duplicates?

For the output of this box, and for each quantifier

QGM example

SELECT DISTINCT q1.partno, q1.descr, q2.suppno FROM inventory q1, quotations q WHERE q1.partno = q2.partno AND q1.descr = ‘engine’ AND q2.price <= ALL (SELECT q3.price FROM quotations q WHERE q2.partno = q3.partno);

Query rewrite in DB

Goal: make the logical plan as general as possible,

i.e., merge boxes

Rule-based transformations on QGM

Merge subqueries in FROM

Convert E to F (e.g., IN/ANY subqueries to joins)

Convert intersection to join

Convert S to F (i.e., scalar subqueries to joins)

Convert outerjoin to join

Magic (i.e., correlated subqueries to joins)

Another E to F trick

Sometimes an ANY subquery can be turned into an

aggregate subquery without ANY, to improve performance

further

SELECT * FROM Student s

WHERE GPA > ANY

(SELECT GPA FROM Student s

WHERE s2.name = ‘Bart’);

SELECT * FROM Student s

WHERE GPA >

(SELECT MIN(GPA) FROM Student s

WHERE s2.name = ‘Bart’);

Does the same trick apply to ALL?

SELECT * FROM Student s

WHERE GPA > ALL

(SELECT GPA FROM Student s

WHERE s2.name = ‘Bart’);

SELECT * FROM Student s

WHERE GPA >

(SELECT MAX(GPA) FROM Student s

WHERE s2.name = ‘Bart’);

Correlated subqueries

SELECT CID FROM Course

WHERE title LIKE ’CPS%’

AND min_enroll >

(SELECT COUNT(*) FROM Enroll

WHERE Enroll.CID = Course.CID);

Executing correlated subquery is expensive

The subquery is evaluated once for every CPS course

) Decorrelate!

COUNT bug

SELECT CID FROM Course

WHERE title LIKE ’CPS%’

AND min_enroll > (SELECT COUNT(*) FROM Enroll

WHERE Enroll.CID = Course.CID);

SELECT CID

FROM Course, (SELECT CID, COUNT(*) AS cnt

FROM Enroll GROUP BY CID) t

WHERE t.CID = Course.CID AND min_enroll > t.cnt

AND title LIKE ’CPS%’;

First compute the enrollment for all(?) courses

Magic decorrelation

Simple idea

Process the outer query using other predicates

To collect bindings for correlated variables in the subquery

Evaluate the subquery using the bindings collected

It is a join
Once for the entire set of bindings
- Compared to once per binding in the naïve approach

Use the result of the subquery to refine the outer query

Another join

Name “magic” comes from a technique in recursive

processing of Datalog queries

Magic decorrelation example

SELECT CID FROM Course WHERE title LIKE ’CPS%’ AND min_enroll > (SELECT COUNT(*) FROM Enroll WHERE Enroll.CID = Course.CID); CREATE VIEW Supp_Course AS SELECT * FROM Course WHERE title LIKE ’CPS%’;

CREATE VIEW Magic AS SELECT DISTINCT CID FROM Supp_Course;

CREATE VIEW DS AS (SELECT Enroll.CID, COUNT(*) AS cnt FROM Magic, Enroll WHERE Magic.CID = Enroll.CID GROUP BY Enroll.CID) UNION (SELECT Magic.CID, 0 AS cnt FROM Magic WHERE Magic.CID NOT IN (SELECT CID FROM Enroll);

SELECT Supp_Course.CID FROM Supp_Course, DS WHERE Supp_Course.CID = DS.CID AND min_enroll > DS.cnt;

Process the outer query without the subquery

Collect bindings

Evaluate the subquery with bindings

Finally, refine the outer query

Query Optimization: From Logical Plan to Physical Execution, Slides of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Query Optimization: From Logical Plan to Physical Execution and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Query Optimization

Part I

CPS 216

Advanced Database Systems

A query’s trip through the DBMS

Parser

Validator

Query optimizer

Query execution engine

Result

SQL query SELECT title, SID

Parse tree

AND

Logical plan

π title , SID

σ Enroll. CID = Course. CID

×

Enroll Course

Physical plan

SORT ( CID )

Parsing & validation

 Parser: SQL → parse tree

 Good old lex & yacc

 Detect and reject syntax errors

 Validator: parse tree → logical plan

 Detect and reject semantic errors

UNION Enroll)

 Wildcard (SELECT *) and view expansion

 Use information stored in system catalog tables (contains

all metadata/schema information)

Logical plan

 A tree whose nodes are logical operators

 Often a tree of relational algebra operators

 DB2 uses QGM (Query Graph Model)

 There are many equivalent logical plans

π title

σ Student. name =“Bart” ∧ Student. SID = Enroll. SID ∧ Enroll. CID = Course.CID

×

× Course

An equivalent plan:

π title

Enroll. CID = Course. CID

Student. SID = Enroll. SID

σ name = “Bart”

Query optimization and execution

 Recall that a physical plan tells the DBMS query execution

engine how to execute the query

 One logical plan can have many possible physical plans (with

equivalent results, but different costs and assumptions)

 Query optimizer: one logical plan → “best” physical plan

 Query execution engine: physical plan → results

INDEX-NESTED-LOOP-JOIN ( SID )

SORT ( SID )

Query optimization

 Conceptually

 Consider a space of possible plans (next)

 Estimate costs of plans in the search space (next Tuesday)

 Search through the space for the “best” plan (next Thursday)

 Often the goal is not picking the absolute optimum, but

instead avoiding the horrible ones

1 second 1 minute 1 hour

Any of these will do

Heuristics-based query optimization

 Start with a logical plan

 Push selections/projections down as much as possible

 Why?

 Why not?

 Join smaller relations first, and avoid cross product

 Why?

 Why not?

 Convert the transformed logical plan to a physical

plan (by choosing appropriate physical operators)

SQL query rewrite

 More complicated—subqueries and views divide a

query into nested “blocks”

 Processing each block separately forces particular join

Parser: SQL → parse tree

Good old lex & yacc

Detect and reject syntax errors

Validator: parse tree → logical plan

Detect and reject semantic errors

Wildcard (SELECT *) and view expansion

Use information stored in system catalog tables (contains

A tree whose nodes are logical operators

Often a tree of relational algebra operators

DB2 uses QGM (Query Graph Model)

There are many equivalent logical plans

Recall that a physical plan tells the DBMS query execution

One logical plan can have many possible physical plans (with

Query optimizer: one logical plan → “best” physical plan

Query execution engine: physical plan → results

Conceptually

Consider a space of possible plans (next)

Estimate costs of plans in the search space (next Tuesday)

Search through the space for the “best” plan (next Thursday)

Often the goal is not picking the absolute optimum, but

Start with a logical plan

Push selections/projections down as much as possible

Why?

Why not?

Join smaller relations first, and avoid cross product

Why?

Why not?

Convert the transformed logical plan to a physical

More complicated—subqueries and views divide a

Processing each block separately forces particular join

Even if the plan is optimal for each block, it may not be

Unnest query: convert subqueries/views to joins

Where the clean rules of relational algebra apply

Query Graph Model: DB2’s logical plan language

More high-level than relational algebra

A graph of boxes

Leaf boxes are tables

The standard box is the SELECT box (actually a select-

Other types include GROUPBY (aggregation), UNION,

Can always add new types (e.g., OUTERJOIN)

Head: declarative description of the output

Schema: list of output columns

Property: Are output tuples DISTINCT?

Body: how to compute the output

Quantifiers: tuple variables that range over other boxes

Quantifiers are connected a hypergraph

Enforce DISTINCT, preserve duplicates, or permit duplicates?

Goal: make the logical plan as general as possible,

Rule-based transformations on QGM

Merge subqueries in FROM

Convert E to F (e.g., IN/ANY subqueries to joins)

Convert intersection to join

Convert S to F (i.e., scalar subqueries to joins)

Convert outerjoin to join

Magic (i.e., correlated subqueries to joins)

Sometimes an ANY subquery can be turned into an

SELECT * FROM Student s

SELECT * FROM Student s

SELECT * FROM Student s

SELECT * FROM Student s

SELECT CID FROM Course

Executing correlated subquery is expensive

The subquery is evaluated once for every CPS course

SELECT CID FROM Course