









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Exam; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2007;
Typology: Exams
1 / 17
This page cannot be seen from the preview
Don't miss anything!










Problem 1 (15 points) Basics For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice. If you change your mind, cross out both responses and write “True” or “False.” You will get 1 point for each correct answer, 0 point for each incorrect answer.
(1) True False Logical and physical addresses are both representations for the database address.
(2) True False When a client requests a record that contains a BLOB (binary, large object), the database server that receives the request should return the entire record at a time.
(3) True False When an index covers many blocks, we may want to put a second-level index on the first-level index that contains pointers to records. When we use multiple levels of indexes, the first-level index must be sparse.
(4) True False Secondary indexes are always dense.
(5) True False In B-trees, we sometimes need to have overflow blocks.
(6) True False Dynamic hash tables support range queries.
(7) True False In linear hash tables, we sometimes have overflow blocks.
(8) True False One-pass algorithms for set union of two relations R and S requires M main memory buffers such that B(R) + B(S) ≤ M.
(9) True False In algebraic laws, σC (R − S) = R − σC (S) holds.
(10) True False Dynamic programming is a bottom-up method, where we consider only the best plan for each subexpression of the logical-query plan.
(11) True False In undo logging, it is not always possible to recover some consistent state of a database system if the system crashes during recovery.
(12) True False A scheduler based on the two-phase locking scheme always executes multiple transactions with some serial schedule.
(13) True False Two-phase locking prevents deadlocks.
Problem 2 (10 points) Pointer Swizzling Suppose that the important actions related to data storage take the following times, in some arbitrary time units:
(i) Suppose we design a pointer-swizzling control scheme like the following. At the beginning, we automatically swizzle 30% of the pointers and leave the rest unswizzled. Once a pointer is followed, we swizzle it by probability 0.5. If an unswizzled pointer has been followed twice, we swizzle it. Suppose there are 200 pointers in our data. The number of times that they are followed by a program is distributed according to the following histogram. times of being followed 0 1 2 3 number of pointers 20 80 60 40 What’s the expected cost of this program in terms of pointer following? (6 points)
(ii) A block in memory is said to be pinned if it cannot at the moment be written back to disk safely. Explain in what situation a block could be pinned because of swizzled pointers and describe how we should extend a translation table that maps database addresses to memory addresses in order to unpin such blocks. (4 points)
(c) Show the B+ tree that would result from deleting the data entry with key 91 from the original tree. (4 points)
Problem 4 (14 points) Query Execution
Consider two relations R(a, b) and S(b, c) with the following statistics:
T (R) = 10, 000, B(R) = 1, 000 (each block contains 10 tuples), V (R, b) = 200 (number of distinct values of attribute b in R),
T (S) = 6, 000, B(S) = 1, 500 (each block contains 4 tuples), V (S, b) = 100 (number of distinct values of attribute b in S), V (S, c) = 20 (number of distinct values of attribute c in S) and c > 100.
Also, we assume the number of available memory blocks is M = 101.
Please answer the following questions:
(i) Estimate the number of tuples in σc=150(S) (2 points)
(ii) Estimate the number of tuples in R ⋊⋉ σc> 25 (S). (2 points)
(iii) Suppose we have a B-tree index available for attribute b of S. The tree has 3 levels, and each node contains 4 keys– that is, there would be totally 100 keys in the 5 × 5 leaf nodes, where each key would correspond to a distinct value of b in S. Assume each node of the index occupies one block.
For simplicity, we assume the tuples with the same b value are stored consecutively in the disk but may spread in different blocks. Please estimate the worst cost of executing the following query: (the cost is measured by disk I/Os for accessing the index and the tuples). (3 points)
SELECT * FROM S WHERE b = 100 OR b = 1000
Problem 5 (17 points) Query Optimization
Suppose we want to compute the following:
τb(R(a, b) ⋊⋉ S(b, c) ⋊⋉ T (c, d)), where τb specifies sorting on b.
That is, we want to join three relations R, S, and T , and sort the results on attribute b. Let us make the following assumptions:
(i) Please draw two logical query plans (without concerning the physical access and join methods) that are not “left-deep.” (3 points)
(ii) Now, consider the following two logical query plans (regardless of what your answer is for (i)), where we also fill in some physical access and join methods. Note we show only the join part (and not sorting) of the query.
⋊⋉
?? ?? ?? ?? ??
???
???
??? ?? (table scan)
(index scan on b) (index scan on b)
(merge-join)
?? ?? ?? ?? ??
?? ?? ?? ?? ?? ( )
( ) (table scan)
Suppose we would like to avoid applying an additional sort on the final output relation, while still achieve the desired sorted result (i.e., τb). Please help fill appropriate access and join methods into the brackets in the above two query plans. (5 points)
Problem 6 (12 points) Recovery Consider a database with data items {A, B, C, D}. The system uses an undo/redo scheme and has the following logs. Note that an entry <T, X, old, new> means transaction T changes the value of X from old to new. We consider recovery using this undo/redo log.
(i) List all possible values of A, B, C and D. That is, what are the possible data values on the disk at the point of failure (after action 21)? (3 points)
(ii) During recovery, what are the transactions that need to be undone? (3 points)
(iii) During recovery, what are the transactions that need to be redone? (3 points)
(iv) What are the values of A, B, C, D after recovery? Explain why? (3 points)
(iv) In the following sequences of events, Ri(X) means that transaction Ti starts and its read set is the list of database elements X. Also, Vi means Ti attempts to validate and Wi(X) means that Ti finishes, and its write set was X. Tell what happens when the following sequence of events is processed by a validation-based scheduler. (3 points)
Problem 8 (10 points) Deadlocks Answer the following questions:
(i) For the sequence of actions below, assume that locks are requested immediately before each read and write action. However, if a transaction is already holding a lock on a database element, it does not need to obtain that lock again to execute another action on that element. Also, unlocks occur immediately after the final action that a transaction executes. Tell which locking actions are denied, and whether a deadlock occurs, by drawing a wait-for graph. ( points) r 1 (A); r 2 (B); w 1 (C); w 2 (D); r 3 (C); w 1 (B); w 4 (D); w 2 (A);
(ii) We observed in our study of lock-based schedules that there are several reasons why transac- tions that obtain locks could deadlock. Can a timestamp-based scheduler using the commit bit C(X) have a deadlock? If you beleive that the scheduler does not have any deadlock, prove that. Otherwise, describe a situation where a deadlock occurs. (Hint: consider the case where two transactions T 1 and T 2 run while accessing two database elements A and B.) ( points)