Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

homework 3 homework 3 solution, Exercises of Computer Science

Georgia Institute of Technology - Main Campus Computer Science

homework 3 solution homework 3 solution homework 3 solution homework 3 solution homework 3 solution

Typology: Exercises

2022/2023

Uploaded on 12/01/2023

duanduan-zoey-zhou 🇺🇸

1 document

1 / 10

This page cannot be seen from the preview

Don't miss anything!

CARNEGIE MEL LO N UNIVERSITY

COMPUTER SCIENCE DEPARTMENT

15-445/645 – DATABA SE SY ST EM S (FALL 2023)

PROF. ANDY PAVLO A ND JI GN ES H PATEL

Homework #3 (by Anurag Choudhary) – Solutions

Due: Sunday, Oct 08, 2023 @ 11:59pm

IMPORTANT:

• Enter all of your answers into Gradescope by 11:59pm on Sunday, Oct 08, 2023.

•Plagiarism: Homework may be discussed with other students, but all homework is to be

completed individually.

For your information:

• Graded out of 100 points; 3questions total

• Rough time estimate: ≈1 - 2 hours (0.5 - 1 hours for each question)

Revision :2023/10/07 23:00

Question Points Score

Sorting Algorithms 36

Join Algorithms 43

Bloom Filters 21

Total: 100

Discover Exercises of Computer Science Georgia Institute of Technology - Main Campus

Partial preview of the text

Download homework 3 homework 3 solution and more Exercises Computer Science in PDF only on Docsity!

CARNEGIE MELLON UNIVERSITY

COMPUTER SCIENCE DEPARTMENT

15-445/645 – DATABASE SYSTEMS (FALL 2023)

PROF. ANDY PAVLO AND JIGNESH PATEL

Homework #3 (by Anurag Choudhary) – Solutions

Due: Sunday, Oct 08, 2023 @ 11:59pm

IMPORTANT:

Enter all of your answers into Gradescope by 11:59pm on Sunday, Oct 08, 2023.
Plagiarism: Homework may be discussed with other students, but all homework is to be completed individually. For your information:
Graded out of 100 points; 3 questions total
Rough time estimate: ≈ 1 - 2 hours (0.5 - 1 hours for each question) Revision : 2023/10/07 23:

Question Points Score Sorting Algorithms 36 Join Algorithms 43 Bloom Filters 21 Total: 100

Question 1: Sorting Algorithms............................... [36 points]

Graded by: We have a database file with nine million pages (N = 9,000,000 pages), and we want to sort it using external merge sort. Assume that the DBMS is not using double buffering or blocked I/O, and that it uses quicksort for in-memory sorting. Let B denote the number of buffers. (a) [6 points] Assume that the DBMS has fifty buffers. How many sorted runs are gener- ated? Note that the final sorted file does not count towards the sorted run count. 2 183,673 2 183,674 2 183,750 ■ 183,751 2 187,501 2 187, Solution: 9 , 000 , 000 50

(b) [6 points] Again, assuming that the DBMS has fifty buffers. How many pages (in gen- eral) does each sorted run have after the third pass (i.e. Note: this is Pass #2 if you start counting from Pass #0)? 2 49 2 50 2 2,450 2 2,451 2 2,452 ■ 120,050 2 120, 2 120, Solution: On the first pass, B buffer pages will be used to create the sorted runs. From the second pass onward, B-1 runs will be sorted through a K-way merge. First pass: 50 pages for each sorted run. Second pass: 50 ∗ 49 pages for each sorted run. Third pass: 50 ∗ 49 ∗ 49 = 120, 050 pages for each sorted run.

(c) [6 points] Again, assuming that the DBMS has fifty buffers. How many passes does the DBMS need to perform in order to sort the file? 2 1 2 2 2 3 2 4 ■ 5 Solution:

1 +

logB− 1

N

B

= 1 + ⌈log 49 (⌈ 9 , 000 , 000 / 50 ⌉)⌉

= 1 + 4 = 5

(d) [6 points] Suppose the DBMS has 100 buffers. What is the total I/O cost to sort the file? 2 18,000,000 2 36,000,000 2 54,000,000 ■ 72,000,000 2 90,000,

Solution: #P asses = 1 + ⌈log 99 (⌈ 9 , 000 , 000 / 100 ⌉)⌉ = 1 + 3 = 4 Cost = 2N × #P asses = 2 × 9 , 000 , 000 × 4 = 72, 000 , 000

Question 1 continues...

Question 2: Join Algorithms.................................. [43 points]

Graded by: Consider relations X(a, c), Y(a, b, e), and Z(a, d, f) to be joined on the common attribute a. Assume that there are no indexes available on the tables to speed up the join algorithms.

There are B = 750 pages in the buffer
Table X spans M = 2,500 pages with 80 tuples per page
Table Y spans N = 600 pages with 300 tuples per page
Table Z spans O = 1,500 pages with 150 tuples per page
The join result of X and Y spans P = 500 pages

For the following questions, assume a simple cost model where pages are read and written one at a time. Also assume that one buffer block is needed for the evolving output block and one input block is needed for the current input block of the inner relation. You may ignore the cost of the writing of the final results.

(a) [3 points] What is the I/O cost of a simple nested loop join with X as the outer relation and Y as the inner relation? 2 50,500 2 200,600 2 202,500 2 1,502,500 2 120,000, ■ 120,002,500 2 300,002, Solution: M + m × N = 2500 + 2500 × 80 × 600 = 120, 002 , 500

(b) [3 points] What is the I/O cost of a block nested loop join with Z as the outer relation and Y as the inner relation? 2 1,500 2 2,100 2 2,700 2 3,000 ■ 3,300 2 3,600 2 3, 2 5,

Solution: O + ⌈ (^) BO− 2 ⌉ × N = 1, 500 + ⌈^1 , 748500 ⌉ × 600 = 1, 500 + 1, 800 = 3, 300

(c) [3 points] What is the I/O cost of a block nested loop join with Y as the outer relation and Z as the inner relation? 2 1,500 ■ 2,100 2 2,700 2 3,000 2 3,300 2 3,600 2 3, 2 5,

Solution: N + ⌈ (^) BN− 2 ⌉ × O = 600 + ⌈^600748 ⌉ × 1 , 500 = 600 + 1, 500 = 2, 100

(d) For a sort-merge join with X as the outer relation and Z as the inner relation: i. [3 points] What is the cost of sorting the tuples in X on attribute a? 2 2,000 2 4,000 2 6,000 2 8,000 ■ 10,000 2 12,

Question 2 continues...

Solution: passes = 1 + ⌈logB− 1 (⌈MB ⌉)⌉ = 1 + ⌈log 749 (⌈^2500750 ⌉)⌉ = 1 + 1 = 2 2 M × passes = 2 ∗ 2500 ∗ 2 = 10, 000

ii. [3 points] What is the cost of sorting the tuples in Z on attribute a? 2 2,000 2 4,000 ■ 6,000 2 8,000 2 10,000 2 12, Solution: passes = 1 + ⌈logB− 1 (⌈ (^) BO ⌉)⌉ = 1 + ⌈log 749 (⌈^1750 ,^500 ⌉)⌉ = 1 + 1 = 2 2 O × passes = 2 ∗ 1 , 500 ∗ 2 = 6, 000

iii. [3 points] What is the cost of the merge phase in the worst-case scenario? 2 1,500 2 2,000 2 2,500 2 4,000 2 900,000 2 2,000, 2 3,250,000 ■ 3,750,000 2 4,500,000 2 5,000, Solution: M × O = 2, 500 × 1 , 500 = 3, 750 , 000

iv. [3 points] What is the cost of the merge phase assuming there are no duplicates in the join attribute? 2 1,500 2 2,000 2 2,500 ■ 4,000 2 900,000 2 2,000, 2 3,250,000 2 3,750,000 2 4,500,000 2 5,000, Solution: M + O = 2, 500 + 1, 500 = 4, 000

v. [3 points] Now consider joining X, Y and then joining the result with Z. What is the cost of the final merge phase assuming there are no duplicates in the join attribute? 2 1,000 ■ 2,000 2 3,000 2 5,000 2 1,000, Solution: P + O = 500 + 1, 500 = 2, 000

(e) Consider a hash join with Y as the outer relation and Z as the inner relation. You may ignore recursive partitioning and partially filled blocks. i. [3 points] What is the cost of the probe phase? 2 1,000 2 2,000 ■ 2,100 2 4,200 2 6,400 2 7,200 2 10, Solution: (N + O) = (600 + 1, 500) = 2, 100

ii. [3 points] What is the cost of the partition phase? 2 1,000 2 2,000 2 2,100 ■ 4,200 2 6,400 2 7,200 2 10, Solution: 2 × (N + O) = 2 × (600 + 1, 500) = 2 × 2 , 100 = 4, 200

(f) [3 points] Assume that the tables do not fit in main memory and that a large number of distinct values hash to the same bucket using hash function h 1. Which of the following approaches works the best? 2 Create hashtables for the inner and outer relation using h 1 and rehash into an embedded

Question 2 continues...

sizes. 2 True ■ False

Solution: The choice of the inner/outer tables would be affected by which table has the index on the join attribute. If both tables had indices, then their sizes would have been a factor.

Homework #3 continues...

Question 3: Bloom Filters..................................... [21 points]

Graded by: Assume that we have a bloom filter that is used to register names. The filter uses two hash functions h 1 and h 2 which hash the following strings to the following values:

input h 1 h 2 “Nas” 1973 1994 “Cole” 1985 2014 “Kendrick” 1987 2015 “Rocky” 1988 2013 “JID” 1990 2022 “Denzel” 1995 2018

(a) [4 points] Suppose the filter has 6 bits initially set to 0:

bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 0 0 0 0

Which bits will be set to 1 after “Cole” and “JID” have been inserted? ■ 0 2 1 2 2 2 3 ■ 4 ■ 5 Solution: Because the filter has 6 bits, we take the modulo of the hashed output and 6. “Cole”: 1985 mod 6 = 5 2014 mod 6 = 4 “JID”: 1990 mod 6 = 4 2022 mod 6 = 0

(b) [3 points] If “Denzel” is inserted next, which bits will now be set to 1 (including those already set in part (a))? ■ 0 2 1 ■ 2 ■ 3 ■ 4 ■ 5 Solution: 1995 mod 6 = 3 2018 mod 6 = 2 Bits 0, 2, 3, 4, 5 are now set.

(c) [3 points] What will the filter return if we now lookup “Nas”? ■ True 2 False Solution: 1973 mod 6 = 5 1994 mod 6 = 2 Because bit 2 and bit 5 are both 1, the filter returns true.

Question 3 continues...

(f) [5 points] Suppose the filter has 6 bits set to the following values:

bit 0 bit 1 bit 2 bit 3 bit 4 bit 5 0 0 1 1 0 1

Which names are guaranteed to have NOT been inserted? 2 “Nas” ■ “Cole” ■ “Kendrick” 2 “Rocky” ■ “JID” 2 “Denzel”

Solution: The bits that must be set for each name are:

input first bit second bit inserted? “Nas” 5 2 Maybe “Cole” 5 4 No “Kendrick” 1 5 No “Rocky” 2 3 Maybe “JID” 4 0 No “Denzel” 3 2 Maybe “Nas” might have been inserted since both bit 2 and bit 5 are set (in this case - definitely been inserted). One or both of “Rocky” and “Denzel” have been inserted. We cannot guarantee that either has not been inserted.

End of Homework #

homework 3 homework 3 solution, Exercises of Computer Science

Related documents

Partial preview of the text

Download homework 3 homework 3 solution and more Exercises Computer Science in PDF only on Docsity!

CARNEGIE MELLON UNIVERSITY

COMPUTER SCIENCE DEPARTMENT

15-445/645 – DATABASE SYSTEMS (FALL 2023)

PROF. ANDY PAVLO AND JIGNESH PATEL

Homework #3 (by Anurag Choudhary) – Solutions

Due: Sunday, Oct 08, 2023 @ 11:59pm

IMPORTANT:

Question 1: Sorting Algorithms............................... [36 points]

N

B

Question 2: Join Algorithms.................................. [43 points]

Question 3: Bloom Filters..................................... [21 points]