Advanced Database Systems-Lecture 11 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

Query Processing, Notation, Table Scan, Nested-loop Join, External Merge Sort, Performance of External Merge Sort, Tricks for Sorting, Double Buffering, Blocked I/O, Internal Sort Algorithm, Quicksort, Replacement Selection, Sort-merge Join, Optimization of SMJ, Performance of two-pass SMJ, Hash join, Other Sort-based Algorithms, Partitioning Phase, Probing phase, Hash join Tricks, Hybrid Hash Join, Hash Join versus SMJ, Duality of Sort and Hash, I/O Patterns

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold 🇺🇸

Query Processing

CPS 216

Advanced Database Systems

Announcements (February 22)

Reading assignment for this week

Variant indexes (due next Monday)

Homework #2 due in 1½ weeks (March 3)

Course project proposal due in 2 weeks

Midterm in 2½ weeks

Overview

Many different ways of processing the same query

Scan? Sort? Hash? Use an index?

All with different performance characteristics

Best choice depends on the situation

Implement all alternatives

Let the query optimizer choose at run-time

Notation

Relations: R, S

Tuples: r, s

Number of tuples: |R|, |S|

Number of disk blocks: B(R), B(S)

Number of memory blocks available: M

Cost metric

Number of I/O’s

Memory requirement

Table scan

Scan table R and process the query

Selection over R

Projection of R without duplicate elimination

I/O’s: B(R)

Trick for selection: stop early if it is a lookup by key

Memory requirement: 2 (double buffering)

Not counting the cost of writing the result out

Same for any algorithm!

Maybe not needed: results may be pipelined directly into another operator

Nested-loop join

$R \bowtie_p S$

For each block of R, and for each r in the block:
    For each block of S, and for each s in the block:
        Output rs if p evaluates to true over r and s

R is called the outer table; S is called the inner table

I/O’s: B(R) + |R| ⋅ B(S)

Memory requirement: 4 (double buffering)

Improvement: block-based nested-loop join

For each block of R, and for each block of S:
    For each r in the R block, and for each s in the S block: …

I/O’s: B(R) + B(R) ⋅ B(S)

Memory requirement: same as before
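A minimal Python sketch of the two nested-loop variants, assuming a table is modeled as a list of blocks and a block as a list of tuples. The names `tuple_nlj` and `block_nlj` and the I/O counter are illustrative assumptions, not part of the lecture. Each fetched block counts as one I/O, so the counts can be compared with the two cost formulas.

```python
from typing import Callable, List, Tuple

Block = List[tuple]   # assumption: one disk block is a list of tuples
Table = List[Block]   # assumption: a table is a list of blocks


def tuple_nlj(R: Table, S: Table, p: Callable[[tuple, tuple], bool]) -> Tuple[list, int]:
    """Tuple-based nested-loop join. Returns (result, number of block reads)."""
    out, ios = [], 0
    for r_block in R:
        ios += 1                      # read one block of the outer table R
        for r in r_block:
            for s_block in S:         # rescan all of S for every tuple r
                ios += 1
                for s in s_block:
                    if p(r, s):
                        out.append(r + s)
    return out, ios


def block_nlj(R: Table, S: Table, p: Callable[[tuple, tuple], bool]) -> Tuple[list, int]:
    """Block-based nested-loop join. Returns (result, number of block reads)."""
    out, ios = [], 0
    for r_block in R:
        ios += 1                      # read one block of R
        for s_block in S:             # rescan S only once per block of R
            ios += 1
            for r in r_block:
                for s in s_block:
                    if p(r, s):
                        out.append(r + s)
    return out, ios


# Hypothetical data: B(R) = 2, |R| = 4, B(S) = 3.
R = [[(1,), (3,)], [(3,), (5,)]]
S = [[(1,), (2,)], [(3,), (3,)], [(8,)]]
equal = lambda r, s: r[0] == s[0]
print(tuple_nlj(R, S, equal)[1])  # 2 + 4 * 3 = 14 block reads
print(block_nlj(R, S, equal)[1])  # 2 + 2 * 3 = 8 block reads
```

Both functions return the same join result; only the number of times S is rescanned differs.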

Performance of external merge sort

Number of passes: $\lceil \log_{M-1} \lceil B(R)/M \rceil \rceil + 1$

I/O’s

Multiply by 2 ⋅ B(R): each pass reads the entire relation once and writes it once

Subtract B(R) for the final pass

Roughly, this is $O(B(R) \cdot \log_M B(R))$

Memory requirement: M (as much as possible)
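The pass count and I/O cost can be evaluated with a short Python sketch. It assumes pass 0 produces runs of M blocks and every later pass merges with fan-in M – 1, as the formula implies. The function names are illustrative. Integer arithmetic is used instead of a floating-point logarithm to avoid rounding errors at exact powers.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def merge_sort_passes(B_R: int, M: int) -> int:
    """Number of passes: ceil(log_{M-1}(ceil(B(R) / M))) + 1."""
    if M < 3:
        raise ValueError("need M >= 3 so that the merge fan-in M - 1 is at least 2")
    runs = ceil_div(B_R, M)            # pass 0: sorted runs of M blocks each
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, M - 1)   # each merge pass combines M - 1 runs
        passes += 1
    return passes


def merge_sort_ios(B_R: int, M: int) -> int:
    """2 * B(R) per pass, minus B(R) because the final write is not counted."""
    return 2 * B_R * merge_sort_passes(B_R, M) - B_R


# Hypothetical sizes: B(R) = 1000 blocks, M = 11 memory blocks.
print(merge_sort_passes(1000, 11))  # 91 runs -> 10 -> 1, so 3 passes
print(merge_sort_ios(1000, 11))     # 2 * 1000 * 3 - 1000 = 5000
```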

Some tricks for sorting

Double buffering

Allocate an additional block for each run

Blocked I/O

Instead of reading/writing one disk block at a time, read/write a bunch (“cluster”)

Dealing with input whose size is not an exact power of fan-in

Internal sort algorithm

Quicksort

Fast

Replacement selection

One block for input, one for output, rest for a heap

Fill the heap with input records

Find the smallest record in the heap that is no less than the largest record in the current run

If that exists, move it to the output buffer, and move a new record from input buffer into the heap

If that does not exist, flush output and start a new run

Slower than quicksort, but produces longer runs (twice the size of memory if records are in random order)
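One common way to realize the steps above is to tag each heap entry with a run number, so that records too small for the current run sort after every eligible record. The Python sketch below takes that approach. It is an assumption about implementation, not the lecture's code, and it measures the heap in records rather than blocks for simplicity.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of the input


def replacement_selection(records: Iterable[int], heap_capacity: int) -> Iterator[List[int]]:
    """Yield sorted runs produced by replacement selection."""
    it = iter(records)
    # Fill the heap; every initial record belongs to run 0.
    heap = [(0, rec) for rec in islice(it, heap_capacity)]
    heapq.heapify(heap)
    current_run, run = 0, []
    while heap:
        run_no, rec = heapq.heappop(heap)
        if run_no != current_run:
            # No record in the heap can extend the current run: flush it.
            yield run
            run, current_run = [], run_no
        run.append(rec)  # rec is now the largest record in the current run
        nxt = next(it, _DONE)
        if nxt is not _DONE:
            # A record smaller than the last output must wait for the next run.
            target = current_run if nxt >= rec else current_run + 1
            heapq.heappush(heap, (target, nxt))
    if run:
        yield run


print(list(replacement_selection([5, 1, 9, 2, 8, 3, 7], heap_capacity=3)))
# [[1, 2, 5, 8, 9], [3, 7]]
```

With a heap of three records, the first run already holds five records, which illustrates why runs tend to be longer than memory.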

Sort-merge join

$R \bowtie_{R.A = S.B} S$

Sort R and S by their join attributes, and then merge

r, s = the first tuples in sorted R and S

Repeat until one of R and S is exhausted:

If r.A > s.B then s = next tuple in S

else if r.A < s.B then r = next tuple in R

else output all matching tuples, and

r, s = next in R and S

I/O’s: sorting + 2 B(R) + 2 B(S)

In most cases (e.g., join of key and foreign key)

Worst case is B(R) ⋅ B(S): everything joins

Example

R:          S:          $R \bowtie_{R.A = S.B} S$:
r1.A = 1    s1.B = 1    r1 s1
r2.A = 3    s2.B = 2    r2 s3
r3.A = 3    s3.B = 3    r2 s4
r4.A = 5    s4.B = 3    r3 s3
r5.A = 7    s5.B = 8    r3 s4
r6.A = 7                r7 s5
r7.A = 8
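The merge step can be sketched in Python and run on the example data. The sketch assumes both inputs are already sorted by the join key and fit in lists. Tuples are modeled as `(name, key)` pairs, which is an illustrative choice. On equal keys it finds the whole group of matching tuples on each side before advancing, which is what "output all matching tuples" requires.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # assumption: (tuple name, join attribute value)


def merge_join(R: List[Row], S: List[Row]) -> List[Tuple[str, str]]:
    """Merge phase of sort-merge join; R and S must be sorted by join key."""
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][1] > S[j][1]:
            j += 1                       # s = next tuple in S
        elif R[i][1] < S[j][1]:
            i += 1                       # r = next tuple in R
        else:
            key = R[i][1]
            i_end = i
            while i_end < len(R) and R[i_end][1] == key:
                i_end += 1               # end of the matching group in R
            j_end = j
            while j_end < len(S) and S[j_end][1] == key:
                j_end += 1               # end of the matching group in S
            for r in R[i:i_end]:
                for s in S[j:j_end]:
                    out.append((r[0], s[0]))
            i, j = i_end, j_end          # r, s = next in R and S
    return out


R = [("r1", 1), ("r2", 3), ("r3", 3), ("r4", 5), ("r5", 7), ("r6", 7), ("r7", 8)]
S = [("s1", 1), ("s2", 2), ("s3", 3), ("s4", 3), ("s5", 8)]
print(merge_join(R, S))
# [('r1', 's1'), ('r2', 's3'), ('r2', 's4'), ('r3', 's3'), ('r3', 's4'), ('r7', 's5')]
```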

Optimization of SMJ

Idea: combine join with the merge phase of merge sort

Sort: produce sorted runs of size M for R and S

Merge and join: merge the runs of R, merge the runs of S, and merge-join the result streams as they are generated!

[Figure: sorted runs on disk are merged in memory, one merge for R and one for S, and the two merged streams feed the join]
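A rough Python sketch of this idea, assuming the sorted runs are available as in-memory lists standing in for runs on disk. `heapq.merge` merges the runs lazily, so the join consumes the two merged streams as they are produced and no fully sorted copy of R or S is ever materialized. The function name and the `(name, key)` row layout are illustrative assumptions.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Row = Tuple[str, int]  # assumption: (tuple name, join attribute value)


def merge_join_runs(R_runs: List[List[Row]], S_runs: List[List[Row]]) -> List[Tuple[str, str]]:
    """Merge the sorted runs of R and of S, joining the merged streams on the fly."""
    key = itemgetter(1)
    # Each stream yields (join key, iterator over the tuples sharing that key).
    r_groups = groupby(heapq.merge(*R_runs, key=key), key=key)
    s_groups = groupby(heapq.merge(*S_runs, key=key), key=key)
    out = []
    r = next(r_groups, None)
    s = next(s_groups, None)
    while r is not None and s is not None:
        if r[0] > s[0]:
            s = next(s_groups, None)
        elif r[0] < s[0]:
            r = next(r_groups, None)
        else:
            # Materialize only the current pair of matching groups.
            r_rows, s_rows = list(r[1]), list(s[1])
            out.extend((r_row[0], s_row[0]) for r_row in r_rows for s_row in s_rows)
            r = next(r_groups, None)
            s = next(s_groups, None)
    return out


R_runs = [[("r1", 1), ("r2", 3), ("r5", 7), ("r7", 8)], [("r3", 3), ("r4", 5), ("r6", 7)]]
S_runs = [[("s1", 1), ("s3", 3), ("s5", 8)], [("s2", 2), ("s4", 3)]]
print(merge_join_runs(R_runs, S_runs))
```

The design point is that the last merge pass and the join share one read of the runs, which is where the saving over a separate sort followed by a merge comes from.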

Partitioning phase

Partition R and S according to the same hash function on their join attributes

[Figure: R is read from disk through memory and written back to disk as M – 1 partitions of R; same for S]

Probing phase

Read in each partition of R, stream in the corresponding partition of S, join

Typically build a hash table for the partition of R

Not the same hash function used for partition, of course!

[Figure: each R partition is loaded from disk into memory while the corresponding S partition is streamed in; for each S tuple, probe and join]
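The two phases can be sketched in Python as follows. This is an in-memory illustration under stated assumptions: partitions are plain lists rather than disk files, join keys are integers, `h1` is a hypothetical multiplicative hash chosen for the partitioning step, and a Python `dict` stands in for the probing-phase hash table (its internal bucket mapping differs from `h1`). `num_partitions` plays the role of M – 1.

```python
from collections import defaultdict
from typing import Callable, List


def h1(key: int, num_partitions: int) -> int:
    """Partitioning hash (hypothetical choice: multiplicative hashing)."""
    return (key * 2654435761 % 2**32) % num_partitions


def partition(table: List[tuple], key: Callable[[tuple], int], num_partitions: int) -> List[List[tuple]]:
    """Partitioning phase: route each tuple to a partition by h1 of its join key."""
    parts: List[List[tuple]] = [[] for _ in range(num_partitions)]
    for t in table:
        parts[h1(key(t), num_partitions)].append(t)
    return parts


def hash_join(R, S, r_key, s_key, num_partitions: int) -> List[tuple]:
    """Partition both inputs with the same function, then probe pairwise."""
    out = []
    R_parts = partition(R, r_key, num_partitions)
    S_parts = partition(S, s_key, num_partitions)
    for r_part, s_part in zip(R_parts, S_parts):
        # Probing phase: load one partition of R into a hash table ...
        table = defaultdict(list)
        for r in r_part:
            table[r_key(r)].append(r)
        # ... then stream the corresponding partition of S through it.
        for s in s_part:
            for r in table.get(s_key(s), ()):
                out.append((r, s))
    return out


R = [("r1", 1), ("r2", 3), ("r3", 3), ("r4", 5), ("r5", 7), ("r6", 7), ("r7", 8)]
S = [("s1", 1), ("s2", 2), ("s3", 3), ("s4", 3), ("s5", 8)]
second = lambda t: t[1]
print(sorted(hash_join(R, S, second, second, num_partitions=3)))
```

Tuples with equal join keys always land in partitions with the same index, which is why only corresponding partitions need to be joined.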

Performance of hash join

I/O’s: 3 ⋅ (B(R) + B(S))

Memory requirement:

In the probing phase, we should have enough memory to fit one partition of R: $M - 1 \ge B(R) / (M - 1)$, so $M > \sqrt{B(R)}$

We can always pick R to be the smaller relation, so: $M > \sqrt{\min(B(R), B(S))}$
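The condition and the cost can be checked numerically with a small Python sketch. It assumes the hash function splits the smaller relation into M – 1 partitions of equal size, which is the idealized case behind the inequality. The function names are illustrative.

```python
import math


def two_pass_hash_join_fits(B_R: int, B_S: int, M: int) -> bool:
    """True if one partition of the smaller relation fits in M - 1 blocks,
    i.e. M - 1 >= B / (M - 1), assuming perfectly even partitions."""
    smaller = min(B_R, B_S)
    return (M - 1) * (M - 1) >= smaller


def min_memory_blocks(B_R: int, B_S: int) -> int:
    """Smallest M satisfying the condition above (inputs must be >= 1 block)."""
    smaller = min(B_R, B_S)
    # ceil(sqrt(b)) equals isqrt(b - 1) + 1 for integer b >= 1.
    return math.isqrt(smaller - 1) + 2


def hash_join_ios(B_R: int, B_S: int) -> int:
    """Read and write both inputs while partitioning, then read them again."""
    return 3 * (B_R + B_S)


# Hypothetical sizes: B(R) = 900 blocks, B(S) = 5000 blocks.
print(min_memory_blocks(900, 5000))   # 31, since 30 * 30 = 900
print(hash_join_ios(900, 5000))       # 3 * 5900 = 17700
```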

Hash join tricks

What if a partition is too large for memory?

Read it back in and partition it further!

See the duality in multi-pass merge sort here?

Hybrid hash join

What if there is extra memory available?

Use it to avoid writing/re-reading partitions

Of both R and S!

A generalization of the idea is described in the survey paper by Graefe

Hash join versus SMJ

(Assuming two-pass)

I/O’s: same

Memory requirement: hash join is lower

$\sqrt{\min(B(R), B(S))} < \sqrt{B(R) + B(S)}$

Other factors

Hash join performance depends on the quality of the hash

Might not get evenly sized buckets
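As a worked illustration of the memory comparison, take hypothetical sizes of 10,000 blocks for R and 40,000 blocks for S. These numbers are assumptions chosen for round arithmetic, not figures from the lecture.

```latex
\begin{align*}
B(R) &= 10\,000, \qquad B(S) = 40\,000 \\
\text{Hash join: } M &> \sqrt{\min(B(R), B(S))} = \sqrt{10\,000} = 100 \\
\text{SMJ: } M &> \sqrt{B(R) + B(S)} = \sqrt{50\,000} \approx 223.6 \\
\text{I/O's (both, two-pass): } & 3 \cdot (B(R) + B(S)) = 150\,000
\end{align*}
```

The hash join threshold depends only on the smaller input, so the gap widens as the two relations become more unequal in size.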
