






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
homework 3 solution homework 3 solution homework 3 solution homework 3 solution homework 3 solution
Typology: Exercises
1 / 10
This page cannot be seen from the preview
Don't miss anything!







Question Points Score Sorting Algorithms 36 Join Algorithms 43 Bloom Filters 21 Total: 100
Graded by: We have a database file with nine million pages (N = 9,000,000 pages), and we want to sort it using external merge sort. Assume that the DBMS is not using double buffering or blocked I/O, and that it uses quicksort for in-memory sorting. Let B denote the number of buffers. (a) [6 points] Assume that the DBMS has fifty buffers. How many sorted runs are gener- ated? Note that the final sorted file does not count towards the sorted run count. 2 183,673 2 183,674 2 183,750 ■ 183,751 2 187,501 2 187, Solution: 9 , 000 , 000 50
(b) [6 points] Again, assuming that the DBMS has fifty buffers. How many pages (in gen- eral) does each sorted run have after the third pass (i.e. Note: this is Pass #2 if you start counting from Pass #0)? 2 49 2 50 2 2,450 2 2,451 2 2,452 ■ 120,050 2 120, 2 120, Solution: On the first pass, B buffer pages will be used to create the sorted runs. From the second pass onward, B-1 runs will be sorted through a K-way merge. First pass: 50 pages for each sorted run. Second pass: 50 ∗ 49 pages for each sorted run. Third pass: 50 ∗ 49 ∗ 49 = 120, 050 pages for each sorted run.
(c) [6 points] Again, assuming that the DBMS has fifty buffers. How many passes does the DBMS need to perform in order to sort the file? 2 1 2 2 2 3 2 4 ■ 5 Solution:
1 +
logB− 1
= 1 + ⌈log 49 (⌈ 9 , 000 , 000 / 50 ⌉)⌉
= 1 + 4 = 5
(d) [6 points] Suppose the DBMS has 100 buffers. What is the total I/O cost to sort the file? 2 18,000,000 2 36,000,000 2 54,000,000 ■ 72,000,000 2 90,000,
Solution: #P asses = 1 + ⌈log 99 (⌈ 9 , 000 , 000 / 100 ⌉)⌉ = 1 + 3 = 4 Cost = 2N × #P asses = 2 × 9 , 000 , 000 × 4 = 72, 000 , 000
Question 1 continues...
Graded by: Consider relations X(a, c), Y(a, b, e), and Z(a, d, f) to be joined on the common attribute a. Assume that there are no indexes available on the tables to speed up the join algorithms.
For the following questions, assume a simple cost model where pages are read and written one at a time. Also assume that one buffer block is needed for the evolving output block and one input block is needed for the current input block of the inner relation. You may ignore the cost of the writing of the final results.
(a) [3 points] What is the I/O cost of a simple nested loop join with X as the outer relation and Y as the inner relation? 2 50,500 2 200,600 2 202,500 2 1,502,500 2 120,000, ■ 120,002,500 2 300,002, Solution: M + m × N = 2500 + 2500 × 80 × 600 = 120, 002 , 500
(b) [3 points] What is the I/O cost of a block nested loop join with Z as the outer relation and Y as the inner relation? 2 1,500 2 2,100 2 2,700 2 3,000 ■ 3,300 2 3,600 2 3, 2 5,
Solution: O + ⌈ (^) BO− 2 ⌉ × N = 1, 500 + ⌈^1 , 748500 ⌉ × 600 = 1, 500 + 1, 800 = 3, 300
(c) [3 points] What is the I/O cost of a block nested loop join with Y as the outer relation and Z as the inner relation? 2 1,500 ■ 2,100 2 2,700 2 3,000 2 3,300 2 3,600 2 3, 2 5,
Solution: N + ⌈ (^) BN− 2 ⌉ × O = 600 + ⌈^600748 ⌉ × 1 , 500 = 600 + 1, 500 = 2, 100
(d) For a sort-merge join with X as the outer relation and Z as the inner relation: i. [3 points] What is the cost of sorting the tuples in X on attribute a? 2 2,000 2 4,000 2 6,000 2 8,000 ■ 10,000 2 12,
Question 2 continues...
Solution: passes = 1 + ⌈logB− 1 (⌈MB ⌉)⌉ = 1 + ⌈log 749 (⌈^2500750 ⌉)⌉ = 1 + 1 = 2 2 M × passes = 2 ∗ 2500 ∗ 2 = 10, 000
ii. [3 points] What is the cost of sorting the tuples in Z on attribute a? 2 2,000 2 4,000 ■ 6,000 2 8,000 2 10,000 2 12, Solution: passes = 1 + ⌈logB− 1 (⌈ (^) BO ⌉)⌉ = 1 + ⌈log 749 (⌈^1750 ,^500 ⌉)⌉ = 1 + 1 = 2 2 O × passes = 2 ∗ 1 , 500 ∗ 2 = 6, 000
iii. [3 points] What is the cost of the merge phase in the worst-case scenario? 2 1,500 2 2,000 2 2,500 2 4,000 2 900,000 2 2,000, 2 3,250,000 ■ 3,750,000 2 4,500,000 2 5,000, Solution: M × O = 2, 500 × 1 , 500 = 3, 750 , 000
iv. [3 points] What is the cost of the merge phase assuming there are no duplicates in the join attribute? 2 1,500 2 2,000 2 2,500 ■ 4,000 2 900,000 2 2,000, 2 3,250,000 2 3,750,000 2 4,500,000 2 5,000, Solution: M + O = 2, 500 + 1, 500 = 4, 000
v. [3 points] Now consider joining X, Y and then joining the result with Z. What is the cost of the final merge phase assuming there are no duplicates in the join attribute? 2 1,000 ■ 2,000 2 3,000 2 5,000 2 1,000, Solution: P + O = 500 + 1, 500 = 2, 000
(e) Consider a hash join with Y as the outer relation and Z as the inner relation. You may ignore recursive partitioning and partially filled blocks. i. [3 points] What is the cost of the probe phase? 2 1,000 2 2,000 ■ 2,100 2 4,200 2 6,400 2 7,200 2 10, Solution: (N + O) = (600 + 1, 500) = 2, 100
ii. [3 points] What is the cost of the partition phase? 2 1,000 2 2,000 2 2,100 ■ 4,200 2 6,400 2 7,200 2 10, Solution: 2 × (N + O) = 2 × (600 + 1, 500) = 2 × 2 , 100 = 4, 200
(f) [3 points] Assume that the tables do not fit in main memory and that a large number of distinct values hash to the same bucket using hash function h 1. Which of the following approaches works the best? 2 Create hashtables for the inner and outer relation using h 1 and rehash into an embedded
Question 2 continues...
sizes. 2 True ■ False
Solution: The choice of the inner/outer tables would be affected by which table has the index on the join attribute. If both tables had indices, then their sizes would have been a factor.
Homework #3 continues...
Graded by: Assume that we have a bloom filter that is used to register names. The filter uses two hash functions h 1 and h 2 which hash the following strings to the following values:
input h 1 h 2 “Nas” 1973 1994 “Cole” 1985 2014 “Kendrick” 1987 2015 “Rocky” 1988 2013 “JID” 1990 2022 “Denzel” 1995 2018
(a) [4 points] Suppose the filter has 6 bits initially set to 0:
bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 0 0 0 0
Which bits will be set to 1 after “Cole” and “JID” have been inserted? ■ 0 2 1 2 2 2 3 ■ 4 ■ 5 Solution: Because the filter has 6 bits, we take the modulo of the hashed output and 6. “Cole”: 1985 mod 6 = 5 2014 mod 6 = 4 “JID”: 1990 mod 6 = 4 2022 mod 6 = 0
(b) [3 points] If “Denzel” is inserted next, which bits will now be set to 1 (including those already set in part (a))? ■ 0 2 1 ■ 2 ■ 3 ■ 4 ■ 5 Solution: 1995 mod 6 = 3 2018 mod 6 = 2 Bits 0, 2, 3, 4, 5 are now set.
(c) [3 points] What will the filter return if we now lookup “Nas”? ■ True 2 False Solution: 1973 mod 6 = 5 1994 mod 6 = 2 Because bit 2 and bit 5 are both 1, the filter returns true.
Question 3 continues...
(f) [5 points] Suppose the filter has 6 bits set to the following values:
bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 1 1 0 1
Which names are guaranteed to have NOT been inserted? 2 “Nas” ■ “Cole” ■ “Kendrick” 2 “Rocky” ■ “JID” 2 “Denzel”
Solution: The bits that must be set for each name are:
input first bit second bit inserted? “Nas” 5 2 Maybe “Cole” 5 4 No “Kendrick” 1 5 No “Rocky” 2 3 Maybe “JID” 4 0 No “Denzel” 3 2 Maybe “Nas” might have been inserted since both bit 2 and bit 5 are set (in this case - definitely been inserted). One or both of “Rocky” and “Denzel” have been inserted. We cannot guarantee that either has not been inserted.
End of Homework #