Query Optimization in Database Management Systems - Prof. Kristen R. Lefevre, Study Guides, Projects, Research of Database Management Systems (DBMS)

A chapter from the database management systems textbook by kristen lefevre, focusing on query optimization. It covers topics such as query execution life-cycle, query optimization plans, alternative plans, ra equivalence, extended ra, query optimization issues, and system r optimizer. The document also includes examples and announcements.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 09/02/2009

koofers-user-kjt
koofers-user-kjt 🇺🇸

10 documents

1 / 7

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
3/17/09 EECS 484: Database Man agement Systems, Kristen L eFevre 1
Query Optimization
Chapter 15
3/17/09 EECS 484: Database Man agement Systems, Kristen L eFevre 2
Query Execution Life-Cycle
Query
Query Result
Database Server
Select R.text from
Report R, Weather W
where W.image.rain()
and W.city = R.city
and W.date = R.date
and
R.text.
matches(“insurance claims”)
Query
Syntax Tree
Parser
Query Plan
Optimizer
Segments
Dispatch
Query Plan |…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
|…|……|………..|………..|
Query
Result
Execute
Operators
3/17/09 EECS 484: Database Man agement Systems, Kristen L eFevre 3
Query Optimization
Given a SQL query, how do we evaluate it
efficiently?
Query Optimizer
– Important component of
a DBMS
Convert SQL query
blocks
to extended
relational algebra expressions
Enumerate alternative evaluation plans
Choose a plan based on estimated cost
pf3
pf4
pf5

Partial preview of the text

Download Query Optimization in Database Management Systems - Prof. Kristen R. Lefevre and more Study Guides, Projects, Research Database Management Systems (DBMS) in PDF only on Docsity!

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 1

Query Optimization

Chapter 15

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 2

Query Execution Life-Cycle

Query Query Result

Database Server

Report R, Weather W where W.image.rain()^ Select R.text from and W.city = R.city and W.date = R.date matches(“ insurance claims^ and^ R.text.” ) Query Syntax Tree Parser Query Plan Optimizer Segments Dispatch Query Plan |…^ |… |… |… |… |… |…^ |^ | | | | | |^ … …^ … … … … … … … … … … … …|^ ………|^ ………| ………| ………| ………| ………| ………..|..|..|..|..|..|..|………………………………………………………. .|. .|. .|. .|. .|. .|. .| |… |… |… |… | | | | … … … … … … … …| ………| ………| ………| ………..|..|..|..|………………………………. .|. .|. .|. .| Query Result Execute Operators 3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 3

Query Optimization

 Given a SQL query, how do we evaluate it

efficiently?

 Query Optimizer – Important component of

a DBMS

 Convert SQL query blocks to extended

relational algebra expressions

 Enumerate alternative evaluation plans

 Choose a plan based on estimated cost

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 4

Query Evaluation Plan

 Annotated, extended RA tree

 Operator Interface: open(), getNext(),

close()

 Intermediate Results (multiple ops):

 Pipelined: Tuples resulting from one operator

fed directly into the next

 Materialized: Create a temporary table to store

intermediate results

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 5

Example

SELECT DISTINCT E.ename

FROM Emp E, Dept D

WHERE D.dname = ‘Toy’

AND D.did = E.did

EMP (ssn, ename, addr, sal, did)

DEPT (did, dname, floor, mgr)

ename DEPT EMP dname=‘Toy’ (Clustered Index Scan) (File Scan) (Sort-Merge Join, Dept pre-sorted) (hash-projection, pipelined) (pipelined)

Annotated RA Tree

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 6

Alternative Plans

 Can be many extended RA trees that

produce the same result!

ename EMP DEPT dname=‘Toy’ ename DEPT EMP dname=‘Toy’

Also, different algorithms for each operator

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 10

System R Optimizer

 Most widely used currently; works well for < 10 joins

 Cost estimation: Approximate art at best

 Catalog statistics

 cost of operation  result size

 Combination of CPU and I/O costs

 Plan Space:

 Only left-deep plans

 Avoid Cartesian products

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 11

Cost Estimation

 Estimate cost of each operation in plan tree

 Depends on input cardinalities

 Algorithm cost (see previous lecture)

 Estimate size of result

 Use information about the input relations

 For selections and joins, assume independence of

predicates

 We’ll discuss the System R cost estimation approach

 Very inexact, but works OK in practice

 More sophisticated techniques known now

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 12

Pricing Plans: Statistics

 Statistics stored in the catalogs

 Relation

 Cardinality (# rows)  Size in pages

 Index

 Cardinality (# distinct keys)  Size in pages  Height  Range

 Catalogs update periodically

 Can be slightly inconsistent

 Commercial systems use histograms

 More accurate estimates

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 13

Size Estimation and Reduction Factors

Question: What is the cardinality of the result set?

 Max # tuples: product of input relation cardinalities

 Each term “filters” out some tuples: Reduction factor

 Result cardinality = Max # tuples * product of all RF’s.

 Assumption: terms are independent!

 Term col=value RF: 1/NKeys(I), given index I on col

 Term col1=col2 RF: 1/MAX(NKeys(I1), NKeys(I2))

 Term col>value RF: (High(I)-value)/(High(I)-Low(I))

SELECT attribute list FROM relation list WHERE term1 AND ... AND termk 3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 14

Plan Enumeration

 Two main cases:

 Single-relation plans  Multiple-relation plans

 Single-relation plan (no joins). Access Plans:

 file scan  index scan(s): Clustered, Unclustered  More than one index may “match” predicates  e.g. Clustered index I matching one or more selects: Cost: (NPages(I)+NPages(R)) * product of RF’s of matching selects.  Choose the one with the least estimated cost.  Merge/pipeline selection and projection (and aggregation)  RID intersection techniques  Index aggregate evaluation 3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 15

Example

 Index on did:

 Tuples Retrieved: (1/10) * 10,  Clustered index: (1/10) * (100+1,000) pages  Unclustered index: (1/10) * (100+10,000) pages

 Index on sal:

 Clustered index: (200-40)/(200-10) * (100+1,000) pages  Unclustered index: …

 File scan: 1,000 pages

SELECT E.ename

FROM Emp E

WHERE E.did=

AND E.sal > 40K

1,000 data pages, 10K tuples 100 pages in B+-tree

depts: 10

Salary Range: 10K – 200K

EMP (ssn, ename, addr, sal, did)

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 19

Enumeration of Plans (Contd.)

 ORDER BY, GROUP BY handled as a final step,

 Only “join” relations if there is a connecting join condition

i.e., avoid Cartesian products if possible.

 This approach is still exponential in the # of tables.

3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 20

Summary

 Query optimization critical to the DBMS performance  Helps understand performance impact of database design  Two parts to optimizing a query:  Enumerate alternative plans. (Typically only consider left-deep plans)  Estimate cost of each plan: size of result and cost of algorithm  Key issues: Statistics, indexes, operator implementations.  Single-relation queries: Pick cheapest access plan + interesting order  Multiple-relation queries:  All single-relation plans are first enumerated. Selections/projections considered as early as possible.  For each 1-relation plan, consider all ways of joining another relation (as inner)  Keep adding 1-relation plan until done  At each level, retain cheapest plan, and best plan for each interesting order 3/17/09 EECS 484: Database Management Systems, Kristen LeFevre 21

Announcements

Minirel Project 1, due Monday

Quiz 2 (next Wednesday, last 30 minutes)

Optional Exercises: 12.1 (all parts), 15.1,