Database Systems Homework 4: Query Execution and Cost Estimation, Assignments of Deductive Database Systems

Solutions to problem 1-4 of cs411 database systems fall 2005 homework 4. The problems cover query execution using block-nested loop join and b+ tree index, merge sort, and cost estimation. The document also includes answers to algebraic laws and dynamic programming subproblems.

Typology: Assignments

Pre 2010

Uploaded on 03/16/2009

koofers-user-nm2
koofers-user-nm2 🇺🇸

2.5

(1)

10 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CS411 Database Systems
Fall 2005
HW #4
Due: 11:30am CST, 12/9/05
Note: Print your name and NetID in the upper right corner of every page of your submission.
Handin your homework to Donna Coleman in 2120 SC.
Problem 1 Query Execution 15 points, 5 points each subproblem
Consider the following two relations:
Students(name, sid, latest-course-id)
TakeCourses(sid, cid, cname)
where sid,cid,cname are student id, course id, and course name, respectively. sid is the key
of relation Students, and (sid,cid) form the key of TakeCourses.Students has 10,000 tuples,
with 25 tuples fitting on a block. TakeCourses has 5,000 tuples, with 50 tuples fitting on a block.
There is no index on any attribute of the relations.
(a) Suppose the memory buffer has 101 blocks, compute the cost of using a block-nested loop
join to join the above two relations.
(b) Suppose the memory buffer has 61 blocks. Compute the cost of using a block-nested loop
join to join the above two relations.
(c) Suppose we join the above two relations on attribute sid. Suppose that there is a B+ tree
index on sid for Students, with the fanout of 20 (i.e., each tree node points to 20 other nodes).
Compute the worst-case cost of doing a block-nested loop join that utilizes this index.
Answer:
(a)
B(S) = 10000
25 = 400
B(T) = 5000
50 = 100
Cost =B(T) + dB(T)
M1eB(S) = 100 + d100
101 1e400 = 500
(b)
Cost =B(T) + dB(T)
M1eB(S) = 100 + d100
61 1e400 = 100 + 800 = 900
(c) The worse case is when all the B+ tree indices are on disk. Given that there are 10000
tuples in Students, and that the fanout is 20, we need at least 4 levels in the B+ tree. (Level 1
1
pf3
pf4

Partial preview of the text

Download Database Systems Homework 4: Query Execution and Cost Estimation and more Assignments Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2005

HW

Due: 11:30am CST, 12/9/

Note: Print your name and NetID in the upper right corner of every page of your submission. Handin your homework to Donna Coleman in 2120 SC.

Problem 1 Query Execution 15 points, 5 points each subproblem

Consider the following two relations:

Students(name, sid, latest-course-id) TakeCourses(sid, cid, cname) where sid, cid, cname are student id, course id, and course name, respectively. sid is the key of relation Students, and (sid,cid) form the key of TakeCourses. Students has 10,000 tuples, with 25 tuples fitting on a block. TakeCourses has 5,000 tuples, with 50 tuples fitting on a block. There is no index on any attribute of the relations. (a) Suppose the memory buffer has 101 blocks, compute the cost of using a block-nested loop join to join the above two relations. (b) Suppose the memory buffer has 61 blocks. Compute the cost of using a block-nested loop join to join the above two relations. (c) Suppose we join the above two relations on attribute sid. Suppose that there is a B+ tree index on sid for Students, with the fanout of 20 (i.e., each tree node points to 20 other nodes). Compute the worst-case cost of doing a block-nested loop join that utilizes this index. Answer: (a)

B(S) =

B(T ) =

Cost = B(T ) + d

B(T )

M − 1

eB(S) = 100 + d

e400 = 500

(b)

Cost = B(T ) + d

B(T )

M − 1

eB(S) = 100 + d

e400 = 100 + 800 = 900

(c) The worse case is when all the B+ tree indices are on disk. Given that there are 10000 tuples in Students, and that the fanout is 20, we need at least 4 levels in the B+ tree. (Level 1

can have at most 20 pointers, Level 2 can have at most 400 pointers, Level 3 can have at most 8000 pointers, Level 4 can have at most 160000 pointers). Therefore to get to a tuple in the students relation, we need 5 disk I/Os (4 for fetching blocks in B+ tree and 1 for fetching the data block). Therefore, the worse-case cost is 100+5000*5=25100.

Problem 2 Query Execution Cont 20 points

Relation R(x,y) occupies 1000 blocks, and relation S(x,z) occupies 900 blocks. First, we will use two-phase, multiply merge sort to sort relation R(x,y) and S(x,z). This will require 4(1000+900) = 7600 disk I/Os. Then will we will merge the sorted R and S. For x=1, 4, 5, we can apply the standard algorithm described on page 744, because the tuples in both relations taking these values can fit into the memory. This will induce 300 disk I/Os. For x=2, tuples in R can fit into the memory but tuples in S could not. Therefore, we will load all the tuples in R taking value x=2 into memory, and read the blocks of S taking value x=2 one at a time. This will induce 500 disk I/Os. Essentially using R as the outer relation in nested-loop join. For x=3, neither tuples in R nor tuples in S could fit into memory. Therefore, we will need to perform nested-loop join, which will induce 300 + 300 ∗ d 300 / 200 e = 900 disk I/Os. For x=6, tuples in S can fit into the memory but tuples in R could not. Therefore, we will load all the tuples in S taking value x=6 into memory, and read the blocks of R taking value x=6 one at a time. This will induce 500 disk I/Os. Essentially using S as the outer relation in nested-loop join. Therefore, altogether we need 7600 + 300 + 500 + 900 + 500 = 9800 disk I/Os.

Problem 3 Algebraic Laws 15 points, 5 points each subproblem

(a) False. Counter example, R = {x}, S = {x}. However, R∪S = {x, x} 6 = S (b) True. We know that for each x that appears in R, its frequency fR(x) is less or equal to the frequency fS (x) of it appearing in S. Therefore, in R∩S, the frequency of x will be fR(x). This guarantees that all the elements in R will appear in R∩S for the same number of times. Also, according to the semantics of intersection, any elements not in R won’t be in R∩S. These imply that R∩S = R (c) True. From R ⊆ S, We know that for each x that appears in R, its frequency fR(x) is less or equal to the frequency fS (x) of it appearing in S. Then for the same x, from S ⊆ R we could knot that its frequency fS (x) of it appearing in S is less or equal to the frequency fR(x) of it appearing in R. Therefore, we know fR(x)= fS (x). Also from the semantics of ⊆, all elements in R will appear in S, and vice versa. These imply that R and S have the same elements, and each element appear the same number of times, in other words R=S.

Problem 4 Dynamic Programming 10 points, 5 points each subproblem

(e) Estimate the number of tuples in σ(b>25)∧(b=11)(R) Ans: 0 (as conditions contradict) (f) Estimate the number of tuples in σ(b=25)∨(d=13)(R) Ans: T(R)/V(R,b) + T(R)/V(R,d) - T(R)/(V(R,b)*V(R,d))= (g) Estimate the number of tuples in R ./ S Ans: T(R)T(s)/max(V(R,d), V(S,d))= (h) Estimate the number of tuples in R ./R.d=S.d AN D R.b=S.e S Ans: T(R)T(s)/(max(V(R,d), V(S,d)) max(V(R,b), V(S,e)))=

Problem 6 Failure Recovery 16 points, 8 points for (a) and 8 points for (b) (notice, c)

and d) don’t need to be graded)

(a) Transactions that need to be undone: T5, T4, T2. The following actions needs to be undone (order is from left to right): <T5,D,60>, <T2,E,50>, <T4,F,70>, <T2,C,30>, <T2,B,20>, <T2,A,10>. (b) Transaction that needs to be undone: T5. The following actions needs to be undone (order is from left to right): <T5,F,90>, <T5,D,60> (c) Transactions that need to be redone: T1, T The following actions needs to be redone order is from left to right): <T1,A,60>, <T3,D,40> (d) Transactions that need to be redone: T2, T3, T The following actions needs to be redone order is from left to right): <T2,A,10>, <T2,B,20>, <T2,C,30>, <T3,D,40>, <T4,F,70>, <T2,E,50>, <T4,B,80>