Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Homework Solutions - Database Systems | CS 411, Assignments of Deductive Database Systems

University of Illinois - Urbana-Champaign Deductive Database Systems

Material Type: Assignment; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

koofers-user-ctn-1 🇺🇸

9 documents

1 / 6

This page cannot be seen from the preview

Don't miss anything!

CS411 Database Systems

Fall 2004

HW#4 Solution

Problem 1 Index-Based Join and Sort-Merge Join Algorithms

(a) 4, 7, 5, 2, 2, 8

(b) 2, 2, 4, 5, 7, 8 or 8, 7, 5, 4, 2, 2

Problem 2 Hybrid Hash Join Algorithm

To simplify the problem, we make the following assumptions:

1. Two-pass hash-join and Hybrid hash-join:

1.1 Among the two requirements, ”use as few buckets as possible” and ”read and write as many

blocks as we can to consecutive positions on disk”, satisfy ”use as few buckets as possible” first.

1.2 In the first pass there is only one buffer block for each bucket (therefore the blocks are written

to disks individually), while multiple blocks can be assigned for input buffer (therefore the blocks

can be read from tables consecutively). Note that in Hybrid Hash-join, you always keep one whole

bucket in memory and one buffer block for each of the other buckets.

2. Sort-based join

2.1 Use sort-merge-join in Section 15.4.7, i.e., the second phase of the sorting is combined with the

join itself.

2.2 We only write sorted sublists consecutively. We do not read sorted sublists consecutively.

3. All

No need to consider clever scheme such as dynamic expanding/shrinking of input buffer and hash

bucket buffer.

You can make your own assumptions as long as your assumptions are consistent with the conditions

in problem (e.g., you can assume multiple blocks are used for each bucket in the first-phase, you

can use the sort-based join algorithm in Section 15.4.5, you can read sorted sublists consecutively

as well), and your answer is conistent with your assumptions.

(a) In order to use as few buckets as possible and still be able to fit one whole bucket of the smaller

relation (S) into memory, the number of buckets should be 5. Therefore each buckts contain 200

blocks (R) and 100 blocks (S).

Therefore in the first pass, we can read 96 consecutive blocks from disk into memory.

Reading S while hashing into buckets: (100 + 96/2) ∗5 + 100 + 20/2 = 850. (Explanation: It can

read 96 consecutive blocks from S for 5 times, then finally 20 consecutive blocks from S to finish

it.)

Reading R while hashing into buckets: (100 + 96/2) ∗10 + 100 + 40/2 = 1600. (Explanation: It

can reads 96 consecutive blocks from R for 10 times, then finally 40 consecutive blocks from R to

1

Discover Assignments of Deductive Database Systems University of Illinois - Urbana-Champaign

Partial preview of the text

Download Homework Solutions - Database Systems | CS 411 and more Assignments Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2004

HW#4 Solution

Problem 1 Index-Based Join and Sort-Merge Join Algorithms

(a) 4, 7, 5, 2, 2, 8

(b) 2, 2, 4, 5, 7, 8 or 8, 7, 5, 4, 2, 2

Problem 2 Hybrid Hash Join Algorithm

To simplify the problem, we make the following assumptions:

Two-pass hash-join and Hybrid hash-join:

1.1 Among the two requirements, ”use as few buckets as possible” and ”read and write as many blocks as we can to consecutive positions on disk”, satisfy ”use as few buckets as possible” first.

1.2 In the first pass there is only one buffer block for each bucket (therefore the blocks are written to disks individually), while multiple blocks can be assigned for input buffer (therefore the blocks can be read from tables consecutively). Note that in Hybrid Hash-join, you always keep one whole bucket in memory and one buffer block for each of the other buckets.

Sort-based join

2.1 Use sort-merge-join in Section 15.4.7, i.e., the second phase of the sorting is combined with the join itself.

2.2 We only write sorted sublists consecutively. We do not read sorted sublists consecutively.

All

No need to consider clever scheme such as dynamic expanding/shrinking of input buffer and hash bucket buffer.

You can make your own assumptions as long as your assumptions are consistent with the conditions in problem (e.g., you can assume multiple blocks are used for each bucket in the first-phase, you can use the sort-based join algorithm in Section 15.4.5, you can read sorted sublists consecutively as well), and your answer is conistent with your assumptions.

(a) In order to use as few buckets as possible and still be able to fit one whole bucket of the smaller relation (S) into memory, the number of buckets should be 5. Therefore each buckts contain 200 blocks (R) and 100 blocks (S).

Therefore in the first pass, we can read 96 consecutive blocks from disk into memory.

Reading S while hashing into buckets: (100 + 96/2) ∗ 5 + 100 + 20/2 = 850. (Explanation: It can read 96 consecutive blocks from S for 5 times, then finally 20 consecutive blocks from S to finish it.)

Reading R while hashing into buckets: (100 + 96/2) ∗ 10 + 100 + 40/2 = 1600. (Explanation: It can reads 96 consecutive blocks from R for 10 times, then finally 40 consecutive blocks from R to

finish it.)

Writing hash buckets of S into disk: 100. 5 ∗ 500 = 50250.

Writing hash buckets of R into disk: 100. 5 ∗ 1000 = 100500.

In the second pass, join is performed bucket by bucket. For each bucket, all blocks of S are read into buffers and in-memory search structure is built. Then corresponding blocks of R are read block by block.

Reading hash buckets of S into memory while joining corresponding buckets: (100+100/2)∗5 = 750.

Reading hash buckets of R into memory while joining corresponding buckets: 100. 5 ∗ 100 = 100500.

Therefore the total time is 254450 (milliseconds).

(b)

As analyzed in Example 15.10, k = 6. When hashing S in the first pass, we have 5 buffers for 5 buckets, and 500/6=84 (83 is also fine answer) buffers for the 1 in-memory bucket. We have other 12 buffers for holding the input blocks when reading the relations. When hashing R, the 84 buffers for the in-memory bucket of S are always kept, and 5 buffers are for other 5 buckets. Therefore each time we can also read 12 blocks consectuvely.

During the second pass, we can throw the in-memory buckets. Therefore we can read each rest bucket of S consecutively. When reading corresponding bucket from R, we can consecutively read 101-84=17 blocks each time.

Reading S while hashing into buckets: (100 ∗ 12 /2) ∗ 41 + 100 + 8/2 = 4450.

Reading R while hashing into buckets: (100 + 12/2) ∗ 83 + 100 + 4/2 = 8900.

Writing 5 of the 6 hash buckets of S into disk: 100. 5 ∗ 416 = 41808.

Writing 5 of the 6 hash buckets of R into disk: 100. 5 ∗ 833 = 83716.5.

Reading hash buckets of S into memory while joining corresponding buckets: (100+84/2)∗5 = 705.

Reading hash buckets of R into memory while joining corresponding buckets: (100 + 17/2) ∗ 49 = 5316 .5.

Therefore the total time is 144896 (milliseconds).

(c)

In the first phase of two-phase multiway merge sort, we can sort 100 (or 101) blocks each time, therefore there are 10 sublists of R and 5 sublists of S.

reading R: (100 + 100/2) ∗ 10 = 1500.

reading S: (100 + 100/2) ∗ 5 = 750.

writing sorted sublists of R: (100 + 100/2) ∗ 10 = 1500.

writing sorted sublists of S: (100 + 100/2) ∗ 5 = 750.

The combined step of join and the second phase of sorting.

reading R: 100. 5 ∗ 1000 = 100500.

reading S: 100. 5 ∗ 500 = 50250.

(c) {R} {S} {T} {U} size 250 250 2000 2000 cost 0 0 0 0 best plan R S T U {R,S} {R,T} {R,U} {S,T} {S,U} {T,U} size 3125 500000 4000/3 4000/3 500000 32000/ cost 0 0 0 0 0 0 best plan R ./ S or S ./ R R ./ T R ./ U S ./ T S ./ U T ./ U or U ./ T {R,S,T} {R,S,U} {R,T,U} {S,T,U} size 50000/3 50000/3 64000/9 64000/ cost 4000/3 4000/3 4000/3 4000/ best plan (S ./ T ) ./ R (R ./ U ) ./ S (R ./ U ) ./ T (S ./ T ) ./ U {R,S,T,U} size 6400/ cost 4000/3 + 64000/ best plan ((S ./ T ) ./ U ) ./ R or((R ./ U ) ./ T ) ./ S

The costs of plans for 4-relation joins are:

Grouping Cost ((S ./ T ) ./ U ) ./ R 4000/3 + 64000/ ((R ./ U ) ./ T ) ./ S 4000/3 + 64000/ ((R ./ U ) ./ S) ./ T 4000/3 + 50000/ ((S ./ T ) ./ R) ./ U 4000/3 + 50000/

Problem 5 Cost Estimation

R has 25 different b values, therefore the 35 tuples of R with unknown b-values are divided among 20 values, for an average of 1.75 tuples each. Similarly the 40 tuples of S with unknown b-values are divided among 20 values, for an average of 2 tuples each.

According to containment of value sets, R and S contain the same set of b values since V (R, b) = V (S, b) = 25.

Number of join tuples with b = 0 : 4 × 9 = 36 Number of join tuples with b = 1 : 7 × 7 = 49 Number of join tuples with b = 2 : 6 × 8 = 48 Number of join tuples with b = 3 : 5 × 5 = 25 Number of join tuples with b = 4 : 3 × 2 = 6 Number of join tuples with b = 5 : 1. 75 × 6 = 10. 5 Number of join tuples with other 19 unknown b-values: 19 × 1. 75 × 2 = 66. 5

Therefore, totally the estimated join size is 36 + 49 + 48 + 25 + 6 + 10.5 + 66.5 = 241

(2)Assuming that all 25 values are equally likely to occur, the estimated join size is T (R)T (S)/max(V (R, b), V (S, b)) = 60 × 75 /25 = 180.

Problem 6 Transactions a)

Action t1 t2 Mem A Mem B Disk A Disk B READ(A,t1) 5 5 5 10 READ(B,t2) 5 10 5 10 5 10 t1=t1+t2 15 10 5 10 5 10 WRITE(A,t1) 15 10 15 10 5 10 READ(A,t1) 15 10 15 10 5 10 READ(B,t2) 15 10 15 10 5 10 t2=t1+t2 15 25 15 10 5 10 WRITE(B,t2) 15 25 15 25 5 10 OUTPUT(B) 15 25 15 25 5 25 OUTPUT(A) 15 25 15 25 15 25

Yes, the consistency can be preserved by scheduling OUTPUT(B) before OUTPUT(A).

b) Action t1 t2 Mem A Mem B Disk A Disk B READ(A,t1) 5 5 5 10 READ(B,t2) 5 10 5 10 5 10 t2=t1+t2 5 15 5 10 5 10 WRITE(B,t2) 5 15 5 15 5 10 READ(A,t1) 5 15 5 15 5 10 READ(B,t2) 5 15 5 15 5 10 t1=t1+t2 20 15 5 15 5 10 WRITE(A,t1) 20 15 20 15 5 10 OUTPUT(B) 20 15 20 15 5 15 OUTPUT(A) 20 15 20 15 20 15

No, it is impossible to preserve the consistency since A >= B if transactions finish successfully.

c) Action t Mem A Mem B Disk A Disk B READ(B,t) 10 10 5 10 t=t+1 11 10 5 10 WRITE(A,t) 11 11 10 5 10 READ(A,t) 11 11 10 5 10 t=t+1 12 11 10 5 10 WRITE(B,t) 12 11 12 5 10 OUTPUT(B) 12 11 12 5 12 OUTPUT(A) 12 11 12 11 12

Yes, the consistency can be preserved by scheduling OUTPUT(B) before OUTPUT(A).

Problem 7 Failure Recovery

(a) A=21, B=15, C=25, D=

(b) A=22, B=16 or 17, C=25 or 26 or 27 or 28, D= 43 or 44 or 45

The only values that couldn’t appear over here are A=21, B=15, because the values A=22, B= were flushed to disk before the end checkpoint in line 12 was placed onto the log.

(c) 3, 〈T1 start〉

Homework Solutions - Database Systems | CS 411, Assignments of Deductive Database Systems

Related documents

Partial preview of the text

Download Homework Solutions - Database Systems | CS 411 and more Assignments Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2004

HW#4 Solution

Problem 1 Index-Based Join and Sort-Merge Join Algorithms

Problem 2 Hybrid Hash Join Algorithm

Problem 5 Cost Estimation

Problem 6 Transactions a)

Problem 7 Failure Recovery