














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An overview of database query optimization, focusing on techniques and rules such as algebraic rewriting, transformation, heuristics-based optimization, and cost-based optimization. Concepts like cascade of selections, commutativity of binary operators, and commuting selection with binary operations.
Typology: Study notes
1 / 22
This page cannot be seen from the preview
Don't miss anything!















1 © Ling Liu
QueryOpt
application interfaces
application programs
database query schema
DML preprocessor Query Processor
DDL Preprocessor
Database Manager object code of APs
DBMS
File Manager
System Catalog
Data Files disk storage
3 © Ling Liu
z conceptual database schema (logical data model) Æ Relations, Attributes, Keys, Indexes, Views z internal schema z external schema
z query optimization module z security and authorization
QueryOpt
Relation_Keys Rel_name Key_number Member_attr
Relation_Indexes Rel_name Index_name Member_attr Index_type Attr_no Asc_Desc
View_queries View_name Query
View_attrs View_name Attr_name Attr_num
7 © Ling Liu
User program A sends to DBMS an invoke command to retrieve a (set of) record DBMS analyzes the external schema of the user program A and finds the database description of the record. DBMS checks with the schema to get the data types and location information of record DBMS checks with the physical schema to find out which device the record is in and what access methods can be used. According to 4, DBMS sends OS a read command to execute the search.
QueryOpt
¡ OS issues the page invoke command to the correspond device, and then puts the page fetched into the system buffer. ¢ DBMS uses the schema and the external schema to infer the logical structure of the retrieving record. £ DBMS places the relevant data to the UWA, and ¤ provides the status information at the program invocation exit.
9 © Ling Liu
High-level Calculus-based Query
Query Preprocessing
Query Optimization
Algebraic Query (a tree structure) LOGICAL SCHEMA
INTERNAL SCHEMA Execution Schedule (file access plan)
EXTERNAL SCHEMA
QueryOpt
z emphasizes on general, conceptually relevant, and often time-invariant structural aspects of the universe of discourse
NAME ADDR SEX AGE
CUSTOMER
13 © Ling Liu
z manipulate query quantifiers and qualification
z detect and reject incorrect queries z possible for only a subset of relational calculus
z eliminate redundant predicates
z calculus query ⇒ algebraic query z more than one translation is possible z use transformation rules
QueryOpt
z check validity z check for attributes and relations z type checking on the qualification
z Conjunctive normal form ( p 11 ∨ p 12 ∨…∨ p 1 n ) ∧ … ∧ ( pm 1 ∨ pm 2 ∨…∨ p (^) mn ) z Disjunctive normal form ( p 11 ∧ p 12 ∧…∧ p 1 n ) ∨ … ∨ ( pm 1 ∧ pm 2 ∧…∧ pmn ) z OR's mapped into union AND's mapped into join or selection
15 © Ling Liu
z disjoint components are useless z multiple relations, missing joins – may not be incorrect, but may indicate Cartesian product
z qualification can not be satisfied by any tuple DUR > 27 AND DUR < 25
QueryOpt
z The simpler the query, the less work there is and the better the performance
z elimination of redundancy Æ idempotency rules p 1 ∧ ¬( p 1 ) = false ¬( p 1 ∨ p 2 ) = ¬( p 1 ) ∧ ¬( p 2 ) p 1 ∨ false = p 1 … Æ application of transitivity Æ use of integrity rules
z x > a and x > b
19 © Ling Liu
SELECT ENAME FROM E,W WHERE E.ENO = W.ENO AND W.RESP = "Manager“
Algebraic Optimization z Two steps: Æ Rewriting Æ Transformation z May produce multiple optimal plans Cost-based Optimization z Decide which of the alternatives is the cheapest to execute
QueryOpt
πENAME
σ(E.ENO = W.ENO AND W.RESP = "Manager“
ENO
E
Project
Select
Join
W
πENAME
σ(E.ENO = W.ENO AND W.RESP = "Manager“
ENO
W
Project
Select
Join
E
21 © Ling Liu
πENAME
σW.RESP = "Manager“
E.ENO = W.ENO
E
Moving selection downward (using Cascade of selections, Commutativity of selections, and Commuting selection with binary operations)
W
QueryOpt
πENAME
σW.RESP = "Manager“
E.ENO = W.ENO
E
W
πENAME
σW.RESP = "Manager“
E.ENO = W.ENO
E
W
Plan 1 Plan 2
25 © Ling Liu
z card (E) = 4,000; card(W)=10, z 10% of tuples in W satisfy RESP="Manager" (selection generates 1,000 tuples)
Strategy 1 Strategy 2 Cartesian prod. = 40,000,000 Selection over W = 10, Search over all = 40,000,000 Join(4000*1000) = 4,000, 80,000,000 4,010,
QueryOpt
Assume : z card (E) = 4,000; card(W)=10, z 10% of tuples in W satisfy RESP="Manager" (selection generates 1,000 tuples) execution time proportional to the sum of the cardinalities of the temporary relations searching is done by sequential scanning
Strategy 1 Strategy 2 Strategy 3 Cartesian p=40,000,000 Selection on W=10,000 Selection on E=4, Search= 40,000,000 Join(40001000)= 4,000,000 Join(40001000) 80,000,000 =4,010,000 = 4,004,
27 © Ling Liu
High level (Transformation-based) Optimization (Logical Query Plans) z Algebraic Transformation or Query rewritting z Goal: Reduce the search space of logical query plans z Mechanisms: Heuristics for good transformations Æ Push Selection and Projection closer to the base relations in a SQL query tree. Æ Reduce intermediate result size as much as possible Low Level (cost-based) Optimization (Executable Query Plans) z Mechanism: Æ Cost estimation using intermediate result size estimation and I/O cost estimation Æ Consider various indexes and their performance characteristics when estimating costs of alternative query plans z Goal : Pick the most economical physical query plan for query execution
QueryOpt
z Performing selection and projection before join z Combining several selections over a single relation into one selection z Find common subexpressions z Algebraic rewriting/transformation rules
31 © Ling Liu
z σp(A)(R × S) ≡ (σp(A)(R)) × S where A is defined on R only z σp(Ai)(R (^) q(Ai,Bk) S) ≡ (σp(Ai)(R)) (^) q(Ai,Bk) S where Ai is defined on R and Bk is defined on S z σp(Ai)(R ∪ T) ≡ σp(Ai)(R) ∪ σp(Ai)(T) where Ai is defined on both R and T z σp(Ai)(R ∩ T) ≡ σp(Ai)(R) ∩ σp(Ai)(T) where Ai is defined on both R and T
z πC (R × S) ≡ πA’(R) × πB’ (S) z πC (R (^) q(Aj,Bk) S) ≡ πA’ (R) (^) q(Aj,Bk) πB’(S) z πC (R ∪ S) ≡ πA’ (R) ∪ πB’ (S) where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B’ ⊆ B
QueryOpt
z it allows more freedom in moving selections down different branches of the tree
33 © Ling Liu
z the fewer tuples the resulting relation contains, the more restrictive the selection is z reducing the size of intermediate results improves performance
Heuristics-based Algebraic Optimization: Outline
QueryOpt
Heuristics-based Algebraic Optimization: Outline
37 © Ling Liu
E
σENAME ° "J. Doe"
JNO
ENO
Heuristic Optimization– Example
P W
πENAME
σDUR=12 OR DUR=
σJNAME=“CAD/CAM” (^) Push selection down using commutativity of selection over join
QueryOpt
P
JNO
σJNAME = (^) "CAD/CAM"
Heuristic Optimization–Example
E
σENAME ° "J. Doe"
ENO
W
πENAME
σDUR=12 OR DUR=24 Push selection down using commutativity of selection over join
39 © Ling Liu
E
πENAME
σENAME ° "J. Doe"
P W
JNO
ENO
σJNAME = "CAD/CAM" σDUR =12 ∧ DUR=
Heuristic Optimization–Example
Push selection down
QueryOpt
E
πENAME
σENAME ° "J. Doe"
P W
JNO
πJNO,ENAME
ENO
σJNAME = "CAD/CAM"
πJNO
σDUR =12 ∧ DUR=
πJNO,ENO
πJNO,ENAME
Heuristic Optimization–Example
Do early projection