SQL Server Query Optimizer-Database System Implementaion-Lecture Slides, Slides of Database Management Systems (DBMS)

This lecture was delivered by Chandrakiran Bonjani at Allahabad University. Its related to Database System Implementation and its main points are: SQL, Server, Optimizer, Critique, Empirical, Statistics, Cascades, Framework, Extensible, Logical, Algorithms

Typology: Slides

2011/2012

Uploaded on 07/15/2012

saighiridhar
saighiridhar 🇮🇳

4.5

(2)

37 documents

1 / 44

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Ravi Ramamurthy
DMX Group
Microsoft Research
1 5/25/2011 Stanford Talk
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c

Partial preview of the text

Download SQL Server Query Optimizer-Database System Implementaion-Lecture Slides and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Ravi Ramamurthy

DMX Group

Microsoft Research

5/25/2011 Stanford Talk 1

 SQL Server optimizer

 Enumeration architecture

 Search space: flexibility/extensibility

 Cost and statistics

 Critique of modern optimizers

 Empirical analysis of optimizers

 Conclusion

5/25/2011 Stanford Talk

SELECT l_orderkey, l_linenumber, o_orderstatus FROM lineitem JOIN orders ON l_orderkey = o_orderkey

WHERE l_suppkey < 2000 AND l_partkey < 2000

lineitem

l_suppkey<2000, l_partkey<

l_orderkey=o_orderkey

orders

l_orderkey, l_linenumber, o_orderstatus

Sample query: Obtain information about certain ordered

lineitems that are filtered by suppliers and parts.

5/25/2011 Stanford Talk

 Based on Cascades Framework

 Transformation-based, top-down approach

 Optimization = Tasks + Memo

( Programs = Algorithms + Data Structures )

 Fully cost-based

 Flexible and Extensible

 Search space easy to change

 New operators and rules easy to add

SQL

Query

Parsing

Validation

Normalization

(predicate unfolding,

join collapsing, view

substitution, etc.)

Cost-based

Optimization

Execution

Plan

5/25/2011 Stanford Talk

 Search Space Memory

 Compactly stores all explored alternatives (AND-OR graph)

 Groups together equivalent operator trees and their plans

 Provides memoization, duplicate detection, property and cost

management, etc.

  1. SELECT (b>20, )

b>

  1. SELECT (a<10, ) S

a< R

  1. JOIN (x=y, , )

a<10 b>201) SELECT (a<10, ) R S

R 1) GET(R) S 1) GET(S)

R b> S

  1. JOIN (x=y, , )

  2. ...

  3. ...

Groups

Expressions

5/25/2011 Stanford Talk

 Logical Properties

 Valid for whole group  E.g.: cardinality, output columns, column equivalences, distinctness

 Flow bottom-up

Root Group 12: 0) Join 7 8 11 Group 11: 0) Comp (=) 9 10 Group 10: 0) Identifier (l_orderkey) Group 9: 0) Identifier (o_orderkey) Group 8: 0) Get (orders) Group 7: 0) Select 5 6 Group 6: 0) Logical (and) 2 4 Group 5: 0) Get (lineitem) Group 4: 0) Comp (<) 3 1 Group 3: 0) Identifier (l_partkey) Group 2: 0) Comp (<) 0 1 Group 1: 0) Const (2000) Group 0: 0) Identifier (l_suppkey)

Root Group 12: 5 Apply(lineitem_0) 8 47 4 HashJoin 7.5 8.3 11.0 [Cost=70.75] 3 MergeJoin 8.1 7.3 11.0 [Cost=68.12] 2 MergeJoin 7.3 8.1 11.0 [Cost=72.14] 1 Join 8 7 11 0 Join 7 8 11

Group 7: 7 Filter 5.3 6 6 Filter 44 2 5 Apply (lineitem_0) 32.1 41.1 [Cost=45.43] 4 Apply (lineitem_0) 32.3 41.2 [Cost=50.33] 3 Sort (l_orderkey) 7.5 [Cost=46.22] 2 Filter 5.1 6.0 [Cost=94.71] 1 Filter 15.1 6.0 [Cost=94.71] 0 Select 5 6 ...

 Physical Properties  Valid for specific expression  E.g.: sort columns, halloween protection, cursors  Flow in both directions (derived vs. required) 5/25/2011 Stanford Talk

Root Group 12: 0) Join 7 8 11 Group 11: 0) Comp (=) 9 10 Group 10: 0) Identifier (l_orderkey) Group 9: 0) Identifier (o_orderkey) Group 8: 0) Get (orders) Group 7: 0) Select 5 6 Group 6: 0) Logical (and) 2 4 Group 5: 0) Get (lineitem) Group 4: 0) Comp (<) 3 1 Group 3: 0) Identifier (l_partkey) Group 2: 0) Comp (<) 0 1 Group 1: 0) Const (2000) Group 0: 0) Identifier (l_suppkey)

OptimizeGroup 12 : ExploreGroup 12, OptimizeGroup 12

ExploreGroup 12 : ExploreExpr 12.

ExploreExpr 12.0 : ExploreGroup 7-8, applyRule [Comm, Assoc, ViewMatch, …]

ExploreGroup 7 : ExploreExpr 7.

ExploreExpr 7.0: ExploreGroup 5, applyRule [Sel2->Sel, Sel->LASJ,…]

ExploreGroup 5: ExploreExpr 5.

ExploreExpr 5.0: applyRule [ViewMatch, …]

ExploreGroup 8: ExploreExpr 8.

ExploreExpr 8.0: applyRule [ViewMatch,…]

ApplyRule [Join-Commutativity]

Rule Result -> Group 12.1 and (ExploreExpr 12.1)

Stanford Talk docsity.com

Root Group 12: 0) Join 7 8 11 1) Join 8 7 11 Group 11: 0) Comp (=) 9 10 Group 10: 0) Identifier (l_orderkey) Group 9: 0) Identifier (o_orderkey) Group 8: 0) Get (orders) Group 7: 0) Select 5 6 Group 6: 0) Logical (and) 2 4 Group 5: 0) Get (lineitem) Group 4: 0) Comp (<) 3 1 Group 3: 0) Identifier (l_partkey) Group 2: 0) Comp (<) 0 1 Group 1: 0) Const (2000) Group 0: 0) Identifier (l_suppkey)

OptimizeGroup 12: applyRule[JN->INL, JN->SM,…]

ApplyRule (JN->INL)

Rule Result -> 12.2 and (optInputs 12.2)

INL (<7>, SelectIdx(

GetIdx orders_CL,

o_orderkey=l_orderkey)

5/25/2011 (^) Stanford Talk

**Group 15: 0) GetIdx lineitem

  1. Range lineitem_1 (cost=89.43)** Group 14: 0) SelectIdx 13 11 1) Range orders_1 11.0 (cost=30.98) Group 13: 0) GetIdx orders_ Root Group 12: 0) Join 7 8 11
  2. Join 8 7 11
  3. INLJoin 7 14 Group 11: 0) Comp (=) 9 10 Group 10: 0) Identifier (l_orderkey) Group 9: 0) Identifier (o_orderkey) Group 8: 0) Get (orders) Group 7: 0) Select 5 6 1) Filter 15 .1 6 .0 (cost=94.71) Group 6: 0) Logical (and) 2 4 Group 5: 0) Get (lineitem) ...

ApplyRule Jn->SM (fourth implementation rule to root group 12)

Rule Result 12.3 and (OptInputs 12.3)

OptimizeInputs 12.

OptGroup 7 (with sort required for column o_orderkey)

ApplyRule (enforceSort) result 7.

Opt_Inputs 7.

OptGroup 7 (without sort requirement)

Stanford Talk docsity.com

Group 13: 0) GetIdx orders_ Root Group 12: 0) Join 7 8 11

  1. Join 8 7 11
  2. INLJoin 7 14 3) MergeJoin 7.3 8.1 11.0 (cost=74.33) Group 11: 0) Comp (=) 9 10 Group 10: 0) Identifier (l_orderkey) Group 9: 0) Identifier (o_orderkey) Group 8: 0) Get (orders) 1) Range orders_1 (cost=18.425) Group 7: 0) Select 5 6
  3. Filter 15 .1 6 .0 (cost=94.71) **2) Sort 7.5 (cost=46.22)
  4. Apply 32.1 41.1 (cost=45.25)** Group 6: 0) Logical (and) 2 4 Group 5: 0) Get (lineitem) ...

ApplyRule Jn->SM (fourth implementation rule to root group 12)

Rule Result 12.3 and (OptInputs 12.3)

OptimizeInputs 12.

OptGroup 7 (with sort required for column o_orderkey)

ApplyRule (enforceSort) result 7.

Opt_Inputs 7.

OptGroup 7 (without sort requirement)

5/25/2011 Stanford Talk

docsity.com

SELECT l_tax, o_totalprice

FROM lineitem JOIN orders ON l_orderkey = o_orderkey WHERE l_suppkey < 2000 AND l_partkey < 2000

SELECT l_tax, o_totalprice

FROM lineitem JOIN orders ON l_orderkey = o_orderkey WHERE l_suppkey < 20000 AND l_partkey < 20000

Stanford Talk

docsity.com

 CREATE STATISTICS on Column where

 Filtered statistics provide opportunity to reduce two

sources of errors in cardinality estimation

 Independence assumption

 E.g., Salary | (Age <30) vs. Salary | (Age > 50)

 Interpolation within a histogram bucket

 Can be important for very large tables

 Workloads are complex

 DBAs need help in deciding which filtered statistics are

important

5/25/2011 Microsoft Confidential 17

select *

from customer

where 100,000 <

(select sum(o_totalprice)

from orders

where o_custkey = c_custkey )

 Subqueries are expensive to execute

 Represented as relational operator trees

 Not SQL-Block- focused

 Apply Operator Abstraction

 R Apply E( r )

 For each row r of R, execute function E on r

 Return union: {r 1 } X E(r 1 ) U {r 2 } X E(r 2 ) U …

 Abstracts “for each” and relational function invocation

 Subquery removal : Transform tree to remove

relational operators from under scalar operators

 The crux of efficient processing

 “Unnesting”, “Decorrelation”

 Get joins, outerjoins, semijoins … as a result