homework 3 homework 3 solution, Exercises of Computer Science

homework 3 solution homework 3 solution homework 3 solution homework 3 solution homework 3 solution

Typology: Exercises

2022/2023

Uploaded on 12/01/2023

duanduan-zoey-zhou
duanduan-zoey-zhou 🇺🇸

1 document

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CARNEGIE MEL LO N UNIVERSITY
COMPUTER SCIENCE DEPARTMENT
15-445/645 DATABA SE SY ST EM S (FALL 2023)
PROF. ANDY PAVLO A ND JI GN ES H PATEL
Homework #3 (by Anurag Choudhary) Solutions
Due: Sunday, Oct 08, 2023 @ 11:59pm
IMPORTANT:
Enter all of your answers into Gradescope by 11:59pm on Sunday, Oct 08, 2023.
Plagiarism: Homework may be discussed with other students, but all homework is to be
completed individually.
For your information:
Graded out of 100 points; 3questions total
Rough time estimate: 1 - 2 hours (0.5 - 1 hours for each question)
Revision :2023/10/07 23:00
Question Points Score
Sorting Algorithms 36
Join Algorithms 43
Bloom Filters 21
Total: 100
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download homework 3 homework 3 solution and more Exercises Computer Science in PDF only on Docsity!

CARNEGIE MELLON UNIVERSITY

COMPUTER SCIENCE DEPARTMENT

15-445/645 – DATABASE SYSTEMS (FALL 2023)

PROF. ANDY PAVLO AND JIGNESH PATEL

Homework #3 (by Anurag Choudhary) – Solutions

Due: Sunday, Oct 08, 2023 @ 11:59pm

IMPORTANT:

  • Enter all of your answers into Gradescope by 11:59pm on Sunday, Oct 08, 2023.
  • Plagiarism: Homework may be discussed with other students, but all homework is to be completed individually. For your information:
  • Graded out of 100 points; 3 questions total
  • Rough time estimate: ≈ 1 - 2 hours (0.5 - 1 hours for each question) Revision : 2023/10/07 23:

Question Points Score Sorting Algorithms 36 Join Algorithms 43 Bloom Filters 21 Total: 100

Question 1: Sorting Algorithms............................... [36 points]

Graded by: We have a database file with nine million pages (N = 9,000,000 pages), and we want to sort it using external merge sort. Assume that the DBMS is not using double buffering or blocked I/O, and that it uses quicksort for in-memory sorting. Let B denote the number of buffers. (a) [6 points] Assume that the DBMS has fifty buffers. How many sorted runs are gener- ated? Note that the final sorted file does not count towards the sorted run count. 2 183,673 2 183,674 2 183,750 ■ 183,751 2 187,501 2 187, Solution:  9 , 000 , 000 50

(b) [6 points] Again, assuming that the DBMS has fifty buffers. How many pages (in gen- eral) does each sorted run have after the third pass (i.e. Note: this is Pass #2 if you start counting from Pass #0)? 2 49 2 50 2 2,450 2 2,451 2 2,452 ■ 120,050 2 120, 2 120, Solution: On the first pass, B buffer pages will be used to create the sorted runs. From the second pass onward, B-1 runs will be sorted through a K-way merge. First pass: 50 pages for each sorted run. Second pass: 50 ∗ 49 pages for each sorted run. Third pass: 50 ∗ 49 ∗ 49 = 120, 050 pages for each sorted run.

(c) [6 points] Again, assuming that the DBMS has fifty buffers. How many passes does the DBMS need to perform in order to sort the file? 2 1 2 2 2 3 2 4 ■ 5 Solution:

1 +

logB− 1

N

B

= 1 + ⌈log 49 (⌈ 9 , 000 , 000 / 50 ⌉)⌉

= 1 + 4 = 5

(d) [6 points] Suppose the DBMS has 100 buffers. What is the total I/O cost to sort the file? 2 18,000,000 2 36,000,000 2 54,000,000 ■ 72,000,000 2 90,000,

Solution: #P asses = 1 + ⌈log 99 (⌈ 9 , 000 , 000 / 100 ⌉)⌉ = 1 + 3 = 4 Cost = 2N × #P asses = 2 × 9 , 000 , 000 × 4 = 72, 000 , 000

Question 1 continues...

Question 2: Join Algorithms.................................. [43 points]

Graded by: Consider relations X(a, c), Y(a, b, e), and Z(a, d, f) to be joined on the common attribute a. Assume that there are no indexes available on the tables to speed up the join algorithms.

  • There are B = 750 pages in the buffer
  • Table X spans M = 2,500 pages with 80 tuples per page
  • Table Y spans N = 600 pages with 300 tuples per page
  • Table Z spans O = 1,500 pages with 150 tuples per page
  • The join result of X and Y spans P = 500 pages

For the following questions, assume a simple cost model where pages are read and written one at a time. Also assume that one buffer block is needed for the evolving output block and one input block is needed for the current input block of the inner relation. You may ignore the cost of the writing of the final results.

(a) [3 points] What is the I/O cost of a simple nested loop join with X as the outer relation and Y as the inner relation? 2 50,500 2 200,600 2 202,500 2 1,502,500 2 120,000, ■ 120,002,500 2 300,002, Solution: M + m × N = 2500 + 2500 × 80 × 600 = 120, 002 , 500

(b) [3 points] What is the I/O cost of a block nested loop join with Z as the outer relation and Y as the inner relation? 2 1,500 2 2,100 2 2,700 2 3,000 ■ 3,300 2 3,600 2 3, 2 5,

Solution: O + ⌈ (^) BO− 2 ⌉ × N = 1, 500 + ⌈^1 , 748500 ⌉ × 600 = 1, 500 + 1, 800 = 3, 300

(c) [3 points] What is the I/O cost of a block nested loop join with Y as the outer relation and Z as the inner relation? 2 1,500 ■ 2,100 2 2,700 2 3,000 2 3,300 2 3,600 2 3, 2 5,

Solution: N + ⌈ (^) BN− 2 ⌉ × O = 600 + ⌈^600748 ⌉ × 1 , 500 = 600 + 1, 500 = 2, 100

(d) For a sort-merge join with X as the outer relation and Z as the inner relation: i. [3 points] What is the cost of sorting the tuples in X on attribute a? 2 2,000 2 4,000 2 6,000 2 8,000 ■ 10,000 2 12,

Question 2 continues...

Solution: passes = 1 + ⌈logB− 1 (⌈MB ⌉)⌉ = 1 + ⌈log 749 (⌈^2500750 ⌉)⌉ = 1 + 1 = 2 2 M × passes = 2 ∗ 2500 ∗ 2 = 10, 000

ii. [3 points] What is the cost of sorting the tuples in Z on attribute a? 2 2,000 2 4,000 ■ 6,000 2 8,000 2 10,000 2 12, Solution: passes = 1 + ⌈logB− 1 (⌈ (^) BO ⌉)⌉ = 1 + ⌈log 749 (⌈^1750 ,^500 ⌉)⌉ = 1 + 1 = 2 2 O × passes = 2 ∗ 1 , 500 ∗ 2 = 6, 000

iii. [3 points] What is the cost of the merge phase in the worst-case scenario? 2 1,500 2 2,000 2 2,500 2 4,000 2 900,000 2 2,000, 2 3,250,000 ■ 3,750,000 2 4,500,000 2 5,000, Solution: M × O = 2, 500 × 1 , 500 = 3, 750 , 000

iv. [3 points] What is the cost of the merge phase assuming there are no duplicates in the join attribute? 2 1,500 2 2,000 2 2,500 ■ 4,000 2 900,000 2 2,000, 2 3,250,000 2 3,750,000 2 4,500,000 2 5,000, Solution: M + O = 2, 500 + 1, 500 = 4, 000

v. [3 points] Now consider joining X, Y and then joining the result with Z. What is the cost of the final merge phase assuming there are no duplicates in the join attribute? 2 1,000 ■ 2,000 2 3,000 2 5,000 2 1,000, Solution: P + O = 500 + 1, 500 = 2, 000

(e) Consider a hash join with Y as the outer relation and Z as the inner relation. You may ignore recursive partitioning and partially filled blocks. i. [3 points] What is the cost of the probe phase? 2 1,000 2 2,000 ■ 2,100 2 4,200 2 6,400 2 7,200 2 10, Solution: (N + O) = (600 + 1, 500) = 2, 100

ii. [3 points] What is the cost of the partition phase? 2 1,000 2 2,000 2 2,100 ■ 4,200 2 6,400 2 7,200 2 10, Solution: 2 × (N + O) = 2 × (600 + 1, 500) = 2 × 2 , 100 = 4, 200

(f) [3 points] Assume that the tables do not fit in main memory and that a large number of distinct values hash to the same bucket using hash function h 1. Which of the following approaches works the best? 2 Create hashtables for the inner and outer relation using h 1 and rehash into an embedded

Question 2 continues...

sizes. 2 True ■ False

Solution: The choice of the inner/outer tables would be affected by which table has the index on the join attribute. If both tables had indices, then their sizes would have been a factor.

Homework #3 continues...

Question 3: Bloom Filters..................................... [21 points]

Graded by: Assume that we have a bloom filter that is used to register names. The filter uses two hash functions h 1 and h 2 which hash the following strings to the following values:

input h 1 h 2 “Nas” 1973 1994 “Cole” 1985 2014 “Kendrick” 1987 2015 “Rocky” 1988 2013 “JID” 1990 2022 “Denzel” 1995 2018

(a) [4 points] Suppose the filter has 6 bits initially set to 0:

bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 0 0 0 0

Which bits will be set to 1 after “Cole” and “JID” have been inserted? ■ 0 2 1 2 2 2 3 ■ 4 ■ 5 Solution: Because the filter has 6 bits, we take the modulo of the hashed output and 6. “Cole”: 1985 mod 6 = 5 2014 mod 6 = 4 “JID”: 1990 mod 6 = 4 2022 mod 6 = 0

(b) [3 points] If “Denzel” is inserted next, which bits will now be set to 1 (including those already set in part (a))? ■ 0 2 1 ■ 2 ■ 3 ■ 4 ■ 5 Solution: 1995 mod 6 = 3 2018 mod 6 = 2 Bits 0, 2, 3, 4, 5 are now set.

(c) [3 points] What will the filter return if we now lookup “Nas”? ■ True 2 False Solution: 1973 mod 6 = 5 1994 mod 6 = 2 Because bit 2 and bit 5 are both 1, the filter returns true.

Question 3 continues...

(f) [5 points] Suppose the filter has 6 bits set to the following values:

bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 1 1 0 1

Which names are guaranteed to have NOT been inserted? 2 “Nas” ■ “Cole” ■ “Kendrick” 2 “Rocky” ■ “JID” 2 “Denzel”

Solution: The bits that must be set for each name are:

input first bit second bit inserted? “Nas” 5 2 Maybe “Cole” 5 4 No “Kendrick” 1 5 No “Rocky” 2 3 Maybe “JID” 4 0 No “Denzel” 3 2 Maybe “Nas” might have been inserted since both bit 2 and bit 5 are set (in this case - definitely been inserted). One or both of “Rocky” and “Denzel” have been inserted. We cannot guarantee that either has not been inserted.

End of Homework #