6 Solved Problems Final Exam - Database Systems | CS 411, Exams of Deductive Database Systems

Material Type: Exam; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2007;

Typology: Exams

Pre 2010

Uploaded on 03/16/2009

koofers-user-nm2
koofers-user-nm2 🇺🇸

2.5

(1)

10 documents

1 / 17

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
NetID:
CS411 Database Systems
Fall 2007
Department of Computer Science
University of Illinois at Urbana-Champaign
Final Examination
December 12, 2007
Time Limit: 180 minutes
Print your name and NetID below. In addition, print your NetID in the upper right
corner of every page.
Name: NetID:
Including this cover page, this exam booklet contains 17 pages. Check if you have
missing pages.
The exam is closed book and closed notes. You are allowed to use scratch papers.
No calculators or other electronic devices are permitted. Any form of cheating on the
examination will result in a zero grade.
Please write your solutions in the spaces provided on the exam. You may use the blank
areas and backs of the exam pages for scratch work.
Please make your answers clear and succinct; you will lose credit for verbose, convo-
luted, or confusing answers. Simplicity does count!
Each problem has different weight, as listed below– So, plan your time accordingly.
You should look through the entire exam before getting started, to plan your strategy.
Problem 1 2 3 4 5 6 7 8 Total
Points 15 10 10 14 17 12 12 10 100
Score
Grader
1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download 6 Solved Problems Final Exam - Database Systems | CS 411 and more Exams Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2007

Department of Computer Science

University of Illinois at Urbana-Champaign

Final Examination

December 12, 2007

Time Limit: 180 minutes

  • Print your name and NetID below. In addition, print your NetID in the upper right

corner of every page.

Name: NetID:

  • Including this cover page, this exam booklet contains 17 pages. Check if you have

missing pages.

  • The exam is closed book and closed notes. You are allowed to use scratch papers.

No calculators or other electronic devices are permitted. Any form of cheating on the

examination will result in a zero grade.

  • Please write your solutions in the spaces provided on the exam. You may use the blank

areas and backs of the exam pages for scratch work.

  • Please make your answers clear and succinct; you will lose credit for verbose, convo-

luted, or confusing answers. Simplicity does count!

  • Each problem has different weight, as listed below– So, plan your time accordingly.

You should look through the entire exam before getting started, to plan your strategy.

Problem 1 2 3 4 5 6 7 8 Total

Points 15 10 10 14 17 12 12 10 100

Score

Grader

Problem 1 (15 points) Basics For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice. If you change your mind, cross out both responses and write “True” or “False.” You will get 1 point for each correct answer, 0 point for each incorrect answer.

(1) True False Logical and physical addresses are both representations for the database address.

(2) True False When a client requests a record that contains a BLOB (binary, large object), the database server that receives the request should return the entire record at a time.

(3) True False When an index covers many blocks, we may want to put a second-level index on the first-level index that contains pointers to records. When we use multiple levels of indexes, the first-level index must be sparse.

(4) True False Secondary indexes are always dense.

(5) True False In B-trees, we sometimes need to have overflow blocks.

(6) True False Dynamic hash tables support range queries.

(7) True False In linear hash tables, we sometimes have overflow blocks.

(8) True False One-pass algorithms for set union of two relations R and S requires M main memory buffers such that B(R) + B(S) ≤ M.

(9) True False In algebraic laws, σC (R − S) = R − σC (S) holds.

(10) True False Dynamic programming is a bottom-up method, where we consider only the best plan for each subexpression of the logical-query plan.

(11) True False In undo logging, it is not always possible to recover some consistent state of a database system if the system crashes during recovery.

(12) True False A scheduler based on the two-phase locking scheme always executes multiple transactions with some serial schedule.

(13) True False Two-phase locking prevents deadlocks.

Problem 2 (10 points) Pointer Swizzling Suppose that the important actions related to data storage take the following times, in some arbitrary time units:

  • On-demand swizzling of a pointer: 30;
  • Automatic swizzling of pointers: 20 per pointer;
  • Following a swizzled pointer: 1;
  • Following an unswizzled pointer: 10.

(i) Suppose we design a pointer-swizzling control scheme like the following. At the beginning, we automatically swizzle 30% of the pointers and leave the rest unswizzled. Once a pointer is followed, we swizzle it by probability 0.5. If an unswizzled pointer has been followed twice, we swizzle it. Suppose there are 200 pointers in our data. The number of times that they are followed by a program is distributed according to the following histogram. times of being followed 0 1 2 3 number of pointers 20 80 60 40 What’s the expected cost of this program in terms of pointer following? (6 points)

(ii) A block in memory is said to be pinned if it cannot at the moment be written back to disk safely. Explain in what situation a block could be pinned because of swizzled pointers and describe how we should extend a translation table that maps database addresses to memory addresses in order to unpin such blocks. (4 points)

(c) Show the B+ tree that would result from deleting the data entry with key 91 from the original tree. (4 points)

Problem 4 (14 points) Query Execution

Consider two relations R(a, b) and S(b, c) with the following statistics:

T (R) = 10, 000, B(R) = 1, 000 (each block contains 10 tuples), V (R, b) = 200 (number of distinct values of attribute b in R),

T (S) = 6, 000, B(S) = 1, 500 (each block contains 4 tuples), V (S, b) = 100 (number of distinct values of attribute b in S), V (S, c) = 20 (number of distinct values of attribute c in S) and c > 100.

Also, we assume the number of available memory blocks is M = 101.

Please answer the following questions:

(i) Estimate the number of tuples in σc=150(S) (2 points)

(ii) Estimate the number of tuples in R ⋊⋉ σc> 25 (S). (2 points)

(iii) Suppose we have a B-tree index available for attribute b of S. The tree has 3 levels, and each node contains 4 keys– that is, there would be totally 100 keys in the 5 × 5 leaf nodes, where each key would correspond to a distinct value of b in S. Assume each node of the index occupies one block.

For simplicity, we assume the tuples with the same b value are stored consecutively in the disk but may spread in different blocks. Please estimate the worst cost of executing the following query: (the cost is measured by disk I/Os for accessing the index and the tuples). (3 points)

SELECT * FROM S WHERE b = 100 OR b = 1000

Problem 5 (17 points) Query Optimization

Suppose we want to compute the following:

τb(R(a, b) ⋊⋉ S(b, c) ⋊⋉ T (c, d)), where τb specifies sorting on b.

That is, we want to join three relations R, S, and T , and sort the results on attribute b. Let us make the following assumptions:

  • First, for accessing each relation:
    • R can be index scanned on attribute a or b, or table scanned.
    • S can be index scanned only on b, or table scanned.
    • T can only be table scanned. We assume the index is based on B-tree. Accessing a relation using index scan on an attribute will produce records sorted by that attribute, while table scan does not give any guaranteed output order for the records.
  • Second, for joining two relations, we have the following two choices:
    • merge-join can be used if the two relations are already sorted on the join attribute. It will simply join the two sorted relations using the merge part of the simple sort-based join algorithm, and the output would be sorted.
    • nested-loop join can be used in any case. For simplicity, we assume the block-based nested-loop join is used and the relation produced in the left sub-tree of a query plan will be used as the outer loop.

(i) Please draw two logical query plans (without concerning the physical access and join methods) that are not “left-deep.” (3 points)

(ii) Now, consider the following two logical query plans (regardless of what your answer is for (i)), where we also fill in some physical access and join methods. Note we show only the join part (and not sorting) of the query.

⋊⋉

R S

T









?? ?? ?? ?? ??

     ???

???

??? ?? (table scan)

(index scan on b) (index scan on b)

(merge-join)

S T

R



 

 

?? ?? ?? ?? ??

 

 



?? ?? ?? ?? ?? ( )

( ) (table scan)

Suppose we would like to avoid applying an additional sort on the final output relation, while still achieve the desired sorted result (i.e., τb). Please help fill appropriate access and join methods into the brackets in the above two query plans. (5 points)

Problem 6 (12 points) Recovery Consider a database with data items {A, B, C, D}. The system uses an undo/redo scheme and has the following logs. Note that an entry <T, X, old, new> means transaction T changes the value of X from old to new. We consider recovery using this undo/redo log.

  1. <T2, B, 10, 11>
  2. <T1, A, 20, 21>
  3. <Checkpoint start; Active= {T1}>
  4. <T3, C, 30, 31>
  5. <T4, B, 40, 41>
  6. <T3, D, 50, 51>
  7. <Checkpoint start; Active= {T1, T3}>
  8. <T1, C, 31, 32>
  9. <T5, D, 51, 52>
  10. <T6, C, 32, 33>
  11. System failed

(i) List all possible values of A, B, C and D. That is, what are the possible data values on the disk at the point of failure (after action 21)? (3 points)

(ii) During recovery, what are the transactions that need to be undone? (3 points)

(iii) During recovery, what are the transactions that need to be redone? (3 points)

(iv) What are the values of A, B, C, D after recovery? Explain why? (3 points)

(iv) In the following sequences of events, Ri(X) means that transaction Ti starts and its read set is the list of database elements X. Also, Vi means Ti attempts to validate and Wi(X) means that Ti finishes, and its write set was X. Tell what happens when the following sequence of events is processed by a validation-based scheduler. (3 points)

R 1 (A, B); R 2 (B, C); V 1 ; R 3 (C, D); V 3 ; W 1 (A); V 2 ; W 2 (A); W 3 (D)

Problem 8 (10 points) Deadlocks Answer the following questions:

(i) For the sequence of actions below, assume that locks are requested immediately before each read and write action. However, if a transaction is already holding a lock on a database element, it does not need to obtain that lock again to execute another action on that element. Also, unlocks occur immediately after the final action that a transaction executes. Tell which locking actions are denied, and whether a deadlock occurs, by drawing a wait-for graph. ( points) r 1 (A); r 2 (B); w 1 (C); w 2 (D); r 3 (C); w 1 (B); w 4 (D); w 2 (A);

(ii) We observed in our study of lock-based schedules that there are several reasons why transac- tions that obtain locks could deadlock. Can a timestamp-based scheduler using the commit bit C(X) have a deadlock? If you beleive that the scheduler does not have any deadlock, prove that. Otherwise, describe a situation where a deadlock occurs. (Hint: consider the case where two transactions T 1 and T 2 run while accessing two database elements A and B.) ( points)