Query Execution and Query Optimization - Database Systems | CS 411, Assignments of Deductive Database Systems

Material Type: Assignment; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Unknown 2006;

Typology: Assignments

Pre 2010

Uploaded on 03/16/2009

koofers-user-j71
koofers-user-j71 🇺🇸

9 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Homework 4
Administrivia
Out: 11st April, 2006.
Due date: At the beginning of the lecture on 27th April, 2006. Please submit a hard copy
of your homework. Bring it down to the lecture table (the one with the PC that I use to
display the slides).
The hard copy should be as clearly readable as possible. You may be subtracted points
for unreadability and ugly presentation.
This homework will be counted as 1.5 homeworks. Thus, the total is 150 points.
Off-campus students: You should e-mail your solutions to Yoonkyong Lee
<[email protected]> in the pdf format or in the Word document. Send the file as
attachment with your email by 2 PM UIUC time (CST). Off-campus students in other
time zones should note that the deadline is according to CST.
Please note: Late homework will not be accepted, barring exceptional
circumstances.
Storage
Problem 1. (20 points) Consider the Megatron 757 disk with the following properties:
There are four platters providing eight surfaces.
There are 213 = 8192, tracks per surface.
There are (on average) 28 = 256 sectors per track.
There are 29 = 512 bytes per sector.
The disk rotates at 5400 RPM.
The block size is 212 = 4096 bytes
Assume 10% of each track is used as overhead.
The time it takes the head to move n tracks is (1 + n/500) milliseconds.
Suppose that we know that the last I/O request accessed track 2000.
a. What is the expected (average) number of tracks that will be traveled due to the
very next I/O request for a block on a random track to this disk?
b. What is the expected (average) block access time, including average seek time,
average rotational latency and transfer time, for the next I/O for a block on a
random track, again given that the head is on track 2000 initially?
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Query Execution and Query Optimization - Database Systems | CS 411 and more Assignments Deductive Database Systems in PDF only on Docsity!

Homework 4

Administrivia

Out: 11 st^ April, 2006. Due date: At the beginning of the lecture on 27 th^ April, 2006. Please submit a hard copy of your homework. Bring it down to the lecture table (the one with the PC that I use to display the slides).

The hard copy should be as clearly readable as possible. You may be subtracted points for unreadability and ugly presentation.

This homework will be counted as 1.5 homeworks. Thus, the total is 150 points.

Off-campus students: You should e-mail your solutions to Yoonkyong Lee in the pdf format or in the Word document. Send the file as attachment with your email by 2 PM UIUC time (CST). Off-campus students in other time zones should note that the deadline is according to CST.

Please note: Late homework will not be accepted, barring exceptional circumstances.

Storage

Problem 1. (20 points) Consider the Megatron 757 disk with the following properties:

  • There are four platters providing eight surfaces.
  • There are 2^13 = 8192, tracks per surface.
  • There are (on average) 2^8 = 256 sectors per track.
  • There are 2^9 = 512 bytes per sector.
  • The disk rotates at 5400 RPM.
  • The block size is 2^12 = 4096 bytes
  • Assume 10% of each track is used as overhead.
  • The time it takes the head to move n tracks is (1 + n /500) milliseconds.

Suppose that we know that the last I/O request accessed track 2000.

a. What is the expected (average) number of tracks that will be traveled due to the very next I/O request for a block on a random track to this disk? b. What is the expected (average) block access time, including average seek time, average rotational latency and transfer time, for the next I/O for a block on a random track, again given that the head is on track 2000 initially?

Answer:

a. At track 2000, the average number of tracks to move is (1999+1998+…+2+1+0+1+2+…+6191+6192)/8192 ≈ 2583.5 tracks

b. The average seek time will be: (1+2583.5/500) ≈ 6.167 milliseconds. Given that there are 256 sectors per track, there will also be 256 gaps per track. The disk rotates at 5400 rpm, meaning one rotation in 11.11 milliseconds. This implies that the average rotational latency will be 5.56 milliseconds. One block covers 8 tracks and 7 gaps. The total degrees of arc covered by a block is: (367/256+3248/256) ≈ 11.11. The transfer time is therefore (11.11/360)* 11.11 ≈ 0.34 milliseconds. Therefore the expected block access time is 6.167 + 5.56 + 0.34 ≈ 12. milliseconds.

Indexing

Problem 2. (20 points) Suppose each block holds either 20 records, or 50 key-pointer pairs. If a data file has 100,000 records, how many blocks do we need to hold this data file and:

(a) A dense index? (b) A sparse index?

Answer:

(a) 5000(=100000/20) blocks are needed for data file and 2000(=100000/50) for the index. In total, we need 7000 blocks.

(b) Again, 5000(=10000/20) blocks are needed for data file. But, it need only room for 5000 pointers in the index file; it needs 100 blocks for the index. In total, we need 5100 blocks.

(c) B+ tree

(d) None

Query Execution

Problem 4. (15 points) Suppose you have 2 relations, R(A, B, C) and S(B, C, D, E). You have a clustered unique (no duplicate keys) B+-tree index on attribute A for relation R. Assume this index is kept entirely in memory (i.e., you do not need to read it from disk).

For relation S, you have two indexes: (i) a non-clustered non-unique B+-tree index for attribute B, and (ii) a clustered non-unique B+-tree index for attribute C. Assume that these two indexes are also kept in memory. Also, assume that all of the tuples of S that agree on attribute C are stored in sequentially adjacent blocks on disk (that is, if more than one block is needed to store all of the tuples with some value of C, then these blocks will be sequentially located on the disk).

Other relevant data:

  • 2000 tuples of R are stored per block on disk.
  • T(R) = 2,000,000 (number of tuples of R)
  • 200 tuples of S are stored per block on disk.
  • T(S) = 200,000 (number of tuples of S)
  • V(B,S)=40,000 (number of distinct values of attribute B in S)
  • V(C,S)=800 (number of distinct values of attribute C in S)

You want to execute the following query: SELECT * FROM R, S WHERE (R.B=S.B) AND (R.C=S.C) We present you with two query plans:

Plan 1:

For every block BL of R, retrieved using the clustered index on A for R For every tuple r of BL Use the index on B for S to retrieve all of the tuples s of S such that s.B=r.B For each of these tuples s, if s.C=r.C, output r.A, r.B, r.C, s.B, s.C, s.D, s.E

Query Optimization

Problem 5. (20 points) Consider three relations R(A,B), S(B,C), T(C,D) with the following statistics:

  • T(R) = 200, V(A, R) = 50, V(B, R) = 100
  • T(S) = 300, V(B, S) = 50, V(C, S) = 50
  • T(T) = 400, V(C, T) = 40, V(D, T) = 100

T(R) represents the number of tuples of R, and V(A, R) represents the number of distinct values of attribute A in relation R. Estimate the sizes of relations that are the results of the following expressions: (a) select{B = 20}(S) join T_ (b) select{R.A < S.C}(R join S)_

Answer: (a) The size of select{B = 20}(S) is 300/50=_ The size of select{B = 20}(S) join T_ is therefore 6*400/50=48.

(b) The size of the join is 200*300/100=600. Therefore the size of select{R.A < S.C}(R join S)_ is 600/3 = 200. (Refer to Page 824).

Problem 6. (20 points) Exercise 16.6.2, page 858

Answer:

In the following tables we show all the possible plans.

Table for singleton sets:

{W} {X} {Y} {Z} Size 100 200 300 400 Cost 0 0 0 0 Best Plan W X Y Z

Table for pairs of relations

{W, X} {W, Y} {W, Z} {X,Y} {X, Z} {Y, Z} Size 333 30,000 400 600 80,000 2, Cost 0 0 0 0 0 0 Best Plan W X W Y W Z X Y X Z Y Z

Table for triples of relations

{W, X, Y} {W, X, Z} {W, Y, Z} {X, Y, Z} Size 1,000 1,333 2,400 4, Cost 333 333 400 600 Best Plan (W X) Y (W X) Z (W Z) Y (X Y) Z

All the possible join plans and costs:

Grouping Cost

((W X) Y) Z 1,

((W X) Z) Y 1,

((W Z) Y) X 2,

((X Y) Z) W 5,

(W X ) (Y Z) 2,

(W Y ) (X Z) 110,

(W Z ) (X Y) 1,

Therefore, if we consider all the trees, the best choice is the last plan in the above table, with cost 1,000.

If only left deep trees are considered, the best choice is the first plan in the above table, with cost 1,333.

Problem 8. (20 points) Consider the following sequence of log records:

.

Describe the action of the recovery manager, including changes to both disk and the log, if there is a crash and the last log record to appear on disk is:

(a) (b)

Answer:

(a) We find the record on the log. Transaction S is prior to the checkpoint, so S is correctly assume to have both completed and had its changes written do disk. Transaction U is committed, so we have to redo it. However, we do not need to look prior to the record, because we know that U's changes prior to the start of the checkpoint were flushed to disk during the checkpoint. Thus, D is set to 26 and B is set to 22. Neither transaction T nor transaction V is committed. Therefore, we have to undo them. Thus, B is set to 22, F is set to 15, E is set to 14, C is set to 12, and A is set to 20. Then, we write T and V records on the log. (b) The only difference is that we know that T is committed. Thus, we have to redo it. However, we do not need to look prior to the record, because we know that T's changes prior to the start of the checkpoint were flushed to disk during the checkpoint. We set C to 22, F to 30.