Database Query Optimization: Techniques and Rules, Study notes of Computer Science

An overview of database query optimization, focusing on techniques and rules such as algebraic rewriting, transformation, heuristics-based optimization, and cost-based optimization. Concepts like cascade of selections, commutativity of binary operators, and commuting selection with binary operations.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-pfv
koofers-user-pfv 🇺🇸

10 documents

1 / 22

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
QueryOpt
© Ling Liu
Query Processing & Optimization
Ling Liu
2
QueryOpt
© Ling Liu
Query Access
application
interfaces
application
programs
database
schema
query
DML
preprocessor Query Processor
DDL
Preprocessor
Database
Manager
object
code of APs
DBMS
File Manager
System Catalog
Data Files
disk
storage
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16

Partial preview of the text

Download Database Query Optimization: Techniques and Rules and more Study notes Computer Science in PDF only on Docsity!

1 © Ling Liu

Query Processing & Optimization

Ling Liu

QueryOpt

Query Access

application interfaces

application programs

database query schema

DML preprocessor Query Processor

DDL Preprocessor

Database Manager object code of APs

DBMS

File Manager

System Catalog

Data Files disk storage

3 © Ling Liu

The System Catalog

„ Store the meta information that describes each

database, including a description of

z conceptual database schema (logical data model) Æ Relations, Attributes, Keys, Indexes, Views z internal schema z external schema

„ Store information needed by specific DBMS

modules

z query optimization module z security and authorization

QueryOpt

Relational DBMS Catalog

Relation_Keys Rel_name Key_number Member_attr

Relation_Indexes Rel_name Index_name Member_attr Index_type Attr_no Asc_Desc

View_queries View_name Query

View_attrs View_name Attr_name Attr_num

„ All meta information is stored as relations

„ Example

7 © Ling Liu

Database Access

œ User program A sends to DBMS an invoke command to retrieve a (set of) record  DBMS analyzes the external schema of the user program A and finds the database description of the record. ž DBMS checks with the schema to get the data types and location information of record Ÿ DBMS checks with the physical schema to find out which device the record is in and what access methods can be used. According to 4, DBMS sends OS a read command to execute the search.

QueryOpt

Database Access

¡ OS issues the page invoke command to the correspond device, and then puts the page fetched into the system buffer. ¢ DBMS uses the schema and the external schema to infer the logical structure of the retrieving record. £ DBMS places the relevant data to the UWA, and ¤ provides the status information at the program invocation exit.

9 © Ling Liu

Query Processing Methodology

High-level Calculus-based Query

Query Preprocessing

Query Optimization

Algebraic Query (a tree structure) LOGICAL SCHEMA

INTERNAL SCHEMA Execution Schedule (file access plan)

EXTERNAL SCHEMA

QueryOpt

Conceptual Schema

„ Describes the meaning of data in the universe of

discourse

z emphasizes on general, conceptually relevant, and often time-invariant structural aspects of the universe of discourse

„ Excludes the physical organization and access aspects

of the data

NAME ADDR SEX AGE

CUSTOMER

13 © Ling Liu

Input: Calculus query on base relations

„ Normalization

z manipulate query quantifiers and qualification

„ Analysis

z detect and reject incorrect queries z possible for only a subset of relational calculus

„ Simplification

z eliminate redundant predicates

„ Restructuring

z calculus query ⇒ algebraic query z more than one translation is possible z use transformation rules

Query Preprocessing

QueryOpt

„ Lexical and syntactic analysis (similar to

compilers)

z check validity z check for attributes and relations z type checking on the qualification

„ Put into normal form

z Conjunctive normal form ( p 11 ∨ p 12 ∨…∨ p 1 n ) ∧ … ∧ ( pm 1 ∨ pm 2 ∨…∨ p (^) mn ) z Disjunctive normal form ( p 11 ∧ p 12 ∧…∧ p 1 n ) ∨ … ∨ ( pm 1 ∧ pm 2 ∧…∧ pmn ) z OR's mapped into union AND's mapped into join or selection

Normalization

15 © Ling Liu

„ Refute incorrect queries

„ Incorrect

z disjoint components are useless z multiple relations, missing joins – may not be incorrect, but may indicate Cartesian product

„ Contradictory

z qualification can not be satisfied by any tuple DUR > 27 AND DUR < 25

Refute incorrect queries

QueryOpt

„ Why simplify?

z The simpler the query, the less work there is and the better the performance

„ How? Use transformation rules

z elimination of redundancy Æ idempotency rules p 1 ∧ ¬( p 1 ) = false ¬( p 1 ∨ p 2 ) = ¬( p 1 ) ∧ ¬( p 2 ) p 1 ∨ false = p 1 … Æ application of transitivity Æ use of integrity rules

„ Example

z x > a and x > b

Simplification

19 © Ling Liu

SELECT ENAME FROM E,W WHERE E.ENO = W.ENO AND W.RESP = "Manager“

„ Algebraic Optimization z Two steps: Æ Rewriting Æ Transformation z May produce multiple optimal plans „ Cost-based Optimization z Decide which of the alternatives is the cheapest to execute

Query Optimization: An Example

QueryOpt

Algebraic Rewriting

πENAME

σ(E.ENO = W.ENO AND W.RESP = "Manager“

ENO

E

Project

Select

Join

W

πENAME

σ(E.ENO = W.ENO AND W.RESP = "Manager“

ENO

W

Project

Select

Join

E

21 © Ling Liu

Algebraic Transformation

πENAME

σW.RESP = "Manager“

E.ENO = W.ENO

E

Moving selection downward (using Cascade of selections, Commutativity of selections, and Commuting selection with binary operations)

W

QueryOpt

How to decide which is the best plan?

πENAME

σW.RESP = "Manager“

E.ENO = W.ENO

E

W

πENAME

σW.RESP = "Manager“

E.ENO = W.ENO

E

W

Plan 1 Plan 2

25 © Ling Liu

„ Assume :

z card (E) = 4,000; card(W)=10, z 10% of tuples in W satisfy RESP="Manager" (selection generates 1,000 tuples)

„ execution time proportional to the sum of the

cardinalities of the temporary relations

„ searching is done by sequential scanning

Strategy 1 Strategy 2 Cartesian prod. = 40,000,000 Selection over W = 10, Search over all = 40,000,000 Join(4000*1000) = 4,000, 80,000,000 4,010,

Cost of Alternatives

QueryOpt

„ Assume : z card (E) = 4,000; card(W)=10, z 10% of tuples in W satisfy RESP="Manager" (selection generates 1,000 tuples) „ execution time proportional to the sum of the cardinalities of the temporary relations „ searching is done by sequential scanning

Strategy 1 Strategy 2 Strategy 3 Cartesian p=40,000,000 Selection on W=10,000 Selection on E=4, Search= 40,000,000 Join(40001000)= 4,000,000 Join(40001000) 80,000,000 =4,010,000 = 4,004,

Cost of Alternatives

27 © Ling Liu

Relational Query Optimization

„ High level (Transformation-based) Optimization (Logical Query Plans) z Algebraic Transformation or Query rewritting z Goal: Reduce the search space of logical query plans z Mechanisms: Heuristics for good transformations Æ Push Selection and Projection closer to the base relations in a SQL query tree. Æ Reduce intermediate result size as much as possible „ Low Level (cost-based) Optimization (Executable Query Plans) z Mechanism: Æ Cost estimation using intermediate result size estimation and I/O cost estimation Æ Consider various indexes and their performance characteristics when estimating costs of alternative query plans z Goal : Pick the most economical physical query plan for query execution

QueryOpt

Heuristics Query Optimization

„ Optimization Heuristics for the relational algebra

z Performing selection and projection before join z Combining several selections over a single relation into one selection z Find common subexpressions z Algebraic rewriting/transformation rules

„ General transformation rules for relational algebra

(also called equivalence-preserving algebraic

rewriting rules)

31 © Ling Liu

Algebraic

„ Commuting selection with binary operations

z σp(A)(R × S) ≡ (σp(A)(R)) × S where A is defined on R only z σp(Ai)(R (^) q(Ai,Bk) S) ≡ (σp(Ai)(R)) (^) q(Ai,Bk) S where Ai is defined on R and Bk is defined on S z σp(Ai)(R ∪ T) ≡ σp(Ai)(R) ∪ σp(Ai)(T) where Ai is defined on both R and T z σp(Ai)(R ∩ T) ≡ σp(Ai)(R) ∩ σp(Ai)(T) where Ai is defined on both R and T

„ Commuting projection with binary operations

z πC (R × S) ≡ πA’(R) × πB’ (S) z πC (R (^) q(Aj,Bk) S) ≡ πA’ (R) (^) q(Aj,Bk) πB’(S) z πC (R ∪ S) ≡ πA’ (R) ∪ πB’ (S) where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B’ ⊆ B

QueryOpt

Heuristics-based Algebraic Optimization:

Outline

„ Using cascade of selections rule, break up any

selections with conjunctive conditions into a

cascade of selections.

z it allows more freedom in moving selections down different branches of the tree

„ Using commutativity of selections with other

operations rules, move each selection down the

query tree as far as possible.

33 © Ling Liu

„ Using associativity of binary operations, rearrange

the leaf nodes so that the most restrictive

selections are executed first.

z the fewer tuples the resulting relation contains, the more restrictive the selection is z reducing the size of intermediate results improves performance

„ If possible, combine a Cartesian product with a

selection into a join

Heuristics-based Algebraic Optimization: Outline

QueryOpt

„ Using cascade of projections and commutativity of

projections with other operations, break down and

move lists of projections down the tree as far as

possible.

„ Identify subtrees that represent groups of

operations that can be executed by a single

algorithm.

Heuristics-based Algebraic Optimization: Outline

37 © Ling Liu

E

σENAME ° "J. Doe"

JNO

ENO

Heuristic Optimization– Example

P W

πENAME

σDUR=12 OR DUR=

σJNAME=“CAD/CAM” (^) Push selection down using commutativity of selection over join

QueryOpt

P

JNO

σJNAME = (^) "CAD/CAM"

Heuristic Optimization–Example

E

σENAME ° "J. Doe"

ENO

W

πENAME

σDUR=12 OR DUR=24 Push selection down using commutativity of selection over join

39 © Ling Liu

E

πENAME

σENAME ° "J. Doe"

P W

JNO

ENO

σJNAME = "CAD/CAM" σDUR =12 ∧ DUR=

Heuristic Optimization–Example

Push selection down

QueryOpt

E

πENAME

σENAME ° "J. Doe"

P W

JNO

πJNO,ENAME

ENO

σJNAME = "CAD/CAM"

πJNO

σDUR =12 ∧ DUR=

πJNO,ENO

πJNO,ENAME

Heuristic Optimization–Example

Do early projection