






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Exam; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;
Typology: Exams
1 / 11
This page cannot be seen from the preview
Don't miss anything!







Problem 1 (12 points) Misc. Concepts
For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice. You will get 1 point for each correct answer, -0.5 point for each incorrect answer, and 0 point for each answer left blank.
(1) T rue F alse The LRU buffer replacement algorithm should not be used for certain query operations, such as nested-loop join.
(2) T rue F alse For relation R(A, B, C, D), if AB → C and C → D, then {A, B} is a key.
(3) T rue F alse To process join operation R on S, we can choose any join methods: nested-loop, index, sort- merge, hash, or hybrid– The only difference is their costs.
(4) T rue F alse Given an SQL query, there are often multiple ways of writing it in relational algebra.
(5) T rue F alse For the “ACID” properties of transactions, the “I” stands for idempotency– that is, the multiple executions of the same transaction should result in the same correct effect.
(6) T rue F alse If a schema is in 3NF, then it must also be in BCNF.
(7) T rue F alse B-Tree was invented by Dr. Rudolf Bayer, an alumnus of the computer science department of UIUC.
(8) T rue F alse If a pointer is very likely to be followed for many times, then it will pay off to perform automatic swizzling.
(9) T rue F alse An RDBMS performs its own buffer management, for not only efficiency but also correctness– because it needs to impose certain ordering in buffer replacement for transaction processing.
(10) T rue F alse For disk latency, seek time refers to the time for the head to find the desired sector– i.e., for the disk to rotate so the first of the sectors containing the desired data reaches the head.
(11) T rue F alse From A → B, we can derive AC → BC, which further leads to A → BC.
(12) T rue F alse For building a database application, we often write the application mainly in some “host language,” which interacts with a database for manipulating data. The different ways of operations between the host language and the underlying database, as many have observed, are called impedance mismatch.
Problem 3 (8 points) Schema Decomposition
We perform decomposition to normalize an original schema to be of certain normal forms. For such a decomposition to be “equivalent” to the original schema, it is desirable to be lossless.
To study this concept, let’s consider an original schema R(A, B, C). Suppose we decompose R into R1 (A, B) and R2 (A, C).
(a) Is this decomposition always lossless? Answer yes or no and briefly explain why. (4 points)
(b) Give an example instance of R (i.e., an example table with several tuples) and demonstrate its decomposition, to support your answer in (a). (4 points)
Problem 4 (9 points) Query Languages
Consider relation Scores(name, exam, score), which records the score of a student in an examination (either “midterm” or “final”); for example:
name exam score Betty midterm 85 Alex midterm 57 Alex final 90 Betty final 78 ... ... ...
(a) Write a query, in relational algebra, to return the final-exam score of Alex. (2 points)
(b) Write a query, in relational algebra, to return the names of those students who score higher in the final exam than in the midterm exam. (3 points)
(b) Write a query, in SQL, to return the “count” distribution of scores for the midterm exam, in descending order of score. That is, we want to list each score along with the number of students with that score, in the midterm exam. For example, the output may look like the following: (4 points) score count 85 6 82 5 78 8 ... ...
Problem 6 (10 points) Query Processing
We wish to join relations R(a, b), S (b, c), and T (c, d), i.e., to produce R onS onT. As our assump- tions:
(a) If we want to minimize memory requirement for processing this query, what is the minimal value of M? Describe a processing strategy that results in this minimal requirement. ( 4 points) Note: In counting the memory requirement, as usual, we do not include the buffer space for writing the final output, i.e., tuples of RonSonT. All other required space should be included.
(b) To contrast, suppose we want to process with the following strategy. What is the memory requirement M? That is, you will derive an inequality condition in terms of M , Br, Bs, and Bt, under which the following procedure can be carried out. (6 points)
Problem 7 (19 points) Query Optimization
Consider the following query that joins Student(sid, sname, sdept), Enrollment(sid, cid), Courses(cid, ctitle, iid), Instructor (iid, iname, iaddr).
select sname, ctitle, iname from Student S, Enrollment E, Course C, Instructor I where join-conditions AND selection-conditions
In the where-clause, we have the following conditions:
In this question, we are to consider that, although the join conditions remain the same, different scenarios with different selection conditions may need different join ordering. For our purpose, we consider the following join orders for processing SonEonConI:
(a) We have only given three example orders in the above. If we want to consider all possibilities, how many different orders are there for processing SonEonConI? (6 points) Note, to simplify, let’s assume that join orders are symmetric– i.e., A on B is equivalent to B on A. For instance, we consider J1 and I on ((S onE) onC) as the same order.
Problem 8 (14 points) Failure Recovery
Consider the following UNDO logging.
Action ID Action 1 〈START T 1 〉
2 〈T 1, A, 10〉
3 〈START T 2 〉
4 〈T 1, B, 10〉
5 〈COMMIT T 1 〉
6 〈T 2, B, 10〉
7 〈COMMIT T 2 〉
8 〈START T 3 〉
9 〈T 3, A, 10〉
10 〈START T 4 〉
11 〈T 3, B, 20〉
12 〈COMMIT T 3 〉
13 〈T 4, C, 10〉
14 〈START T 5 〉
15 〈COMMIT T 4 〉
16 〈T 5, D, 10〉
17 〈COMMIT T 5 〉
(a) We want to see when “dirty data” can be flushed to disk– i.e., what time to perform Output(X) for data X (e.g., Output(A), Output(B), etc.). Suppose we want to perform such “output” as late as possible. Insert these outputs on the figure to suggest their timing. (3 points)
(b) Suppose we want to start checkpointing right after Action 4: First, show on the figure this start checkpointing log record. (2 points) Then, show on the figure the end checkpointing log record. (3 points)
(c) Continue from (b). Suppose the system crashes after Action 15. How far back in the log must we look to find all actions that need to be undone? (3 points)
(d) Now, suppose this system is actually a redo log. To contrast with (a), if you still want to perform “output” (e.g., Output(A), Output(B), etc.) as late as possible. When should such output be done? (3 points) You can show on the figure for the timing, but you should separate and distinguish your answer from that of (a).