Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

CS 245: Database Systems - Query Optimization Notes, Slides of Principles of Database Management

Alliance University Principles of Database Management

Notes on query optimization in database systems, covering topics such as plan generation and comparison, estimating ios, join algorithms, and index usage. It also discusses factors affecting performance and various optimization techniques.

Typology: Slides

2011/2012

Uploaded on 07/15/2012

saibal 🇮🇳

32 documents

1 / 13

This page cannot be seen from the preview

Don't miss anything!

CS 245 Notes 7 1

CS 245: Database System

Principles

Notes 7: Query Optimization

Hector Garcia-Molina

CS 245 Notes 7 2

--> Generating and comparing plans

Query

Generate Plans

Pruning x x

Estimate Cost

Cost

Select

Query Optimization

Pick Min

CS 245 Notes 7 3

To generate plans consider:

• Transforming relational algebra expression

(e.g. order of joins)

• Use of existing indexes

• Building indexes or sorting on the fly

CS 245 Notes 7 4

• Implementation details:

e.g. - Join algorithm

- Memory management

- Parallel processing

CS 245 Notes 7 5

Estimating IOs:

• Count # of disk blocks that must be

read (or written) to execute query plan

CS 245 Notes 7 6

To estimate costs, we may have

additional parameters:

B(R) = # of blocks containing R tuples

f(R) = max # of tuples of R per block

M = # memory blocks available

docsity.com

Discover Slides of Principles of Database Management Alliance University

Partial preview of the text

Download CS 245: Database Systems - Query Optimization Notes and more Slides Principles of Database Management in PDF only on Docsity!

CS 245 Notes 7 1

CS 245: Database System

Principles

Notes 7: Query Optimization

Hector Garcia-Molina

CS 245 Notes 7 2

--> Generating and comparing plans Query Generate Plans Pruning x x Estimate Cost Cost Select

Query Optimization

Pick Min

CS 245 Notes 7 3

To generate plans consider:

Transforming relational algebra expression (e.g. order of joins)
Use of existing indexes
Building indexes or sorting on the fly

CS 245 Notes 7 4

Implementation details: e.g. - Join algorithm - Memory management - Parallel processing

CS 245 Notes 7 5

Estimating IOs:

Count # of disk blocks that must be read (or written) to execute query plan

CS 245 Notes 7 6

To estimate costs, we may have

additional parameters:

B(R) = # of blocks containing R tuples f(R) = max # of tuples of R per block M = # memory blocks available

CS 245 Notes 7 7

To estimate costs, we may have

additional parameters:

B(R) = # of blocks containing R tuples f(R) = max # of tuples of R per block M = # memory blocks available

HT(i) = # levels in index i LB(i) = # of leaf blocks in index i

CS 245 Notes 7 8

Clustering index

Index that allows tuples to be read in an order that corresponds to physical order A

indexA

10 15 17 19 35 37

CS 245 Notes 7 9

Notions of clustering

Clustered file organization …..
Clustered relation …..
Clustering index

R1 R2 S1 S2 R3 R4 S3 S

R1 R2 R3 R4 R5 R5 R7 R

CS 245 Notes 7 10

Example R1 R2 over common attribute C

T(R1) = 10,

T(R2) = 5,

S(R1) = S(R2) = 1/10 block Memory available = 101 blocks

CS 245 Notes 7 11

Example R1 R2 over common attribute C

T(R1) = 10,

T(R2) = 5,

S(R1) = S(R2) = 1/10 block Memory available = 101 blocks

 Metric: # of IOs (ignoring writing of result) CS 245 Notes 7 12

Caution!

This may not be the best way to compare

ignoring CPU costs
ignoring timing
ignoring double buffering requirements

CS 245 Notes 7 19

Hash join (conceptual)
- Hash function h, range 0  k
- Buckets for R1: G0, G1, ... Gk
- Buckets for R2: H0, H1, ... Hk

CS 245 Notes 7 20

Hash join (conceptual)
- Hash function h, range 0  k
- Buckets for R1: G0, G1, ... Gk
- Buckets for R2: H0, H1, ... Hk

Algorithm (1) Hash R1 tuples into G buckets (2) Hash R2 tuples into H buckets (3) For i = 0 to k do match tuples in Gi, Hi buckets

CS 245 Notes 7 21

Simple example hash: even/odd

R1 R2 Buckets 2 5 Even 4 4 R1 R 3 12 Odd: 5 3 8 13 9 8 11 14

2 4 8 4 12 8 14

3 5 9 5 3 13 11

CS 245 Notes 7 22

Factors that affect performance

(1) Tuples of relation stored physically together?

(2) Relations sorted by join attribute?

(3) Indexes exist?

CS 245 Notes 7 23

Example 1(a) Iteration Join R1 R

Relations not contiguous
Recall T(R1) = 10,000 T(R2) = 5, S(R1) = S(R2) =1/10 block MEM=101 blocks

CS 245 Notes 7 24

Example 1(a) Iteration Join R1 R

Relations not contiguous
Recall T(R1) = 10,000 T(R2) = 5, S(R1) = S(R2) =1/10 block MEM=101 blocks Cost: for each R1 tuple: [Read tuple + Read R2] Total =10,000 [1+5000]=50,010,000 IOs

CS 245 Notes 7 25

• Can we do better?

CS 245 Notes 7 26

• Can we do better?

Use our memory (1) Read 100 blocks of R (2) Read all of R2 (using 1 block) + join (3) Repeat until done

CS 245 Notes 7 27

Cost: for each R1 chunk: Read chunk: 1000 IOs Read R2: 5000 IOs 6000

CS 245 Notes 7 28

Cost: for each R1 chunk: Read chunk: 1000 IOs Read R2: 5000 IOs 6000

Total = 10,000 x 6000 = 60,000 IOs 1,

CS 245 Notes 7 29

• Can we do better?

CS 245 Notes 7 30

• Can we do better?

 Reverse join order: R2 R

Total = 5000 x (1000 + 10,000) = 1000 5 x 11,000 = 55,000 IOs

CS 245 Notes 7 37

(ii) Read all chunks + merge + write out

Sorted file Memory Sorted Chunks ... ...

CS 245 Notes 7 38

Cost: Sort Each tuple is read,written, read, written so... Sort cost R1: 4 x 1,000 = 4, Sort cost R2: 4 x 500 = 2,

CS 245 Notes 7 39

Example 1(d) Merge Join (continued)

R1,R2 contiguous, but unordered

Total cost = sort cost + join cost = 6,000 + 1,500 = 7,500 IOs

CS 245 Notes 7 40

Example 1(d) Merge Join (continued)

R1,R2 contiguous, but unordered

Total cost = sort cost + join cost = 6,000 + 1,500 = 7,500 IOs

But: Iteration cost = 5, so merge joint does not pay off!

CS 245 Notes 7 41

But say R1 = 10,000 blocks contiguous R2 = 5,000 blocks not ordered

Iterate: 5000 x (100+10,000) = 50 x 10, 100 = 505,000 IOs

Merge join: 5(10,000+5,000) = 75,000 IOs

Merge Join (with sort) WINS! CS 245 Notes 7 42

How much memory do we need for

merge sort?

E.g: Say I have 10 memory blocks 10

...

100 chunks  to merge, need R1 100 blocks!

CS 245 Notes 7 43

In general:

Say k blocks in memory x blocks for relation sort

chunks = (x/k) size of chunk = k

CS 245 Notes 7 44

In general:

Say k blocks in memory x blocks for relation sort

chunks = (x/k) size of chunk = k

chunks < buffers available for merge

CS 245 Notes 7 45

In general:

Say k blocks in memory x blocks for relation sort

chunks = (x/k) size of chunk = k

chunks < buffers available for merge

so... (x/k)  k or k^2  x or k  x

CS 245 Notes 7 46

In our example

R1 is 1000 blocks, k  31. R2 is 500 blocks, k  22.

Need at least 32 buffers

CS 245 Notes 7 47

Can we improve on merge join?

Hint: do we really need the fully sorted files? R

Join?

sorted runs

CS 245 Notes 7 48

Cost of improved merge join:

C = Read R1 + write R1 into runs

read R2 + write R2 into runs
join = 2000 + 1000 + 1500 = 4500

--> Memory requirement?

CS 245 Notes 7 55

Total cost (including probes)

= 500+5000 [Probe + get records] = 500+5000 [0.5+2] uniform assumption = 500+12,500 = 13,000 (case b)

CS 245 Notes 7 56

Total cost (including probes)

= 500+5000 [Probe + get records] = 500+5000 [0.5+2] uniform assumption = 500+12,500 = 13,000 (case b)

For case (c): = 500+5000[0.5  1 + (1/100)  1] = 500+2500+50 = 3050 IOs

CS 245 Notes 7 57

So far

Iterate R2 R1 55,000 (best) Merge Join _______ Sort+ Merge Join _______ R1.C Index _______ R2.C Index _______ Iterate R2 R1 5500 Merge join 1500 Sort+Merge Join 7500  4500 R1.C Index 5500  3050  550 contiguousR2.C Index ________

not contiguous

CS 245 Notes 7 58

R1, R2 contiguous (un-ordered)  Use 100 buckets  Read R1, hash, + write buckets

R1 

Example 1(f) Hash Join

... ... 10 blocks

100

CS 245 Notes 7 59

-> Same for R -> Read one R1 bucket; build memory hash table -> Read corresponding R2 bucket + hash probe

R ... ... memory

 Then repeat for all buckets CS 245 Notes 7 60

Cost:

“Bucketize:” Read R1 + write Read R2 + write Join: Read R1, R

Total cost = 3 x [1000+500] = 4500

CS 245 Notes 7 61

Cost:

“Bucketize:” Read R1 + write Read R2 + write Join: Read R1, R

Total cost = 3 x [1000+500] = 4500 Note: this is an approximation since buckets will vary in size and we have to round up to blocks CS 245 Notes 7 62

Minimum memory requirements:

Size of R1 bucket = (x/k) k = number of memory buffers x = number of R1 blocks So... (x/k) < k

k > x need: k+1 total memory buffers

CS 245 Notes 7 63

Trick: keep some buckets in memory

E.g., k’=33 R1 buckets = 31 blocks keep 2 in memory memory G 0 G 1

...

33-2=

called hybrid hash-join CS 245 Notes 7 64

Trick: keep some buckets in memory

E.g., k’=33 R1 buckets = 31 blocks keep 2 in memory memory G 0 G 1

...

33-2=

Memory use: G0 31 buffers G1 31 buffers Output 33-2 buffers R1 input 1 Total 94 buffers 6 buffers to spare!!

called hybrid hash-join

CS 245 Notes 7 65

Next: Bucketize R

R2 buckets =500/33= 16 blocks
Two of the R2 buckets joined immediately with G0,G memory G 0 G 1

...

33-2=

...

33-2=

R2 buckets (^) R1 buckets

CS 245 Notes 7 66

Finally: Join remaining buckets

for each bucket pair:
- read one of the buckets into memory
- join with second bucket memory

out

...

33-2=

ans

...

33-2=

one full R2bucket R2 buckets (^) R1 buckets

one R1buffer

CS 245 Notes 7 73

So far:

Iterate 5500 Merge join 1500 Sort+merge joint 7500 R1.C index 5500  550 R2.C index _____ Build R.C index _____ Build S.C index _____ Hash join 4500+ with trick,R1 first 4414 with trick,R2 first _____ Hash join, pointers 1600

contiguous

CS 245 Notes 7 74

Summary

Iteration ok for “small” relations (relative to memory size)
For equi-join, where relations not sorted and no indexes exist, hash join usually best

CS 245 Notes 7 75

Sort + merge join good for non-equi-join (e.g., R1.C > R2.C)
If relations already sorted, use merge join
If index exists, it could be useful (depends on expected result size)

CS 245 Notes 7 76

Join strategies for parallel processors

Later on….

CS 245 Notes 7 77

Chapter 16 [16] summary

Relational algebra level
Detailed query plan level
- Estimate costs
- Generate plans
  - Join algorithms
- Compare costs

CS 245: Database Systems - Query Optimization Notes, Slides of Principles of Database Management

Related documents

Partial preview of the text

Download CS 245: Database Systems - Query Optimization Notes and more Slides Principles of Database Management in PDF only on Docsity!

CS 245: Database System

Principles

Notes 7: Query Optimization

Query Optimization

To generate plans consider:

Estimating IOs:

To estimate costs, we may have

additional parameters:

To estimate costs, we may have

additional parameters:

Clustering index

Notions of clustering

Example R1 R2 over common attribute C

T(R1) = 10,

T(R2) = 5,

Example R1 R2 over common attribute C

T(R1) = 10,

T(R2) = 5,

Caution!

Simple example hash: even/odd

Factors that affect performance

Example 1(a) Iteration Join R1 R

Example 1(a) Iteration Join R1 R

• Can we do better?

• Can we do better?

• Can we do better?

• Can we do better?

 Reverse join order: R2 R

Example 1(d) Merge Join (continued)

Example 1(d) Merge Join (continued)

How much memory do we need for

merge sort?

In general:

chunks = (x/k) size of chunk = k

In general:

chunks = (x/k) size of chunk = k

chunks < buffers available for merge

In general:

chunks = (x/k) size of chunk = k

chunks < buffers available for merge

In our example

Can we improve on merge join?

Cost of improved merge join:

So far

R1 

Example 1(f) Hash Join

Cost:

Cost:

Minimum memory requirements:

Trick: keep some buckets in memory

Trick: keep some buckets in memory

So far:

Summary

Join strategies for parallel processors

Chapter 16 [16] summary