Final Exam Fall 2004 - Database Systems | CS 411, Exams of Deductive Database Systems

Material Type: Exam; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;

Typology: Exams

Pre 2010

Uploaded on 03/16/2009

koofers-user-kmu-1
koofers-user-kmu-1 🇺🇸

10 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
NetID:
CS411 Database Systems
Fall 2004, Prof. Chang
Department of Computer Science
University of Illinois at Urbana-Champaign
Final Examination
December 17, 2004
Time Limit: 180 minutes
Print your name and NetID below. In addition, print your NetID in the upper right
corner of every page.
Name: NetID:
Including this cover page, this exam booklet contains 11 pages. Check if you have
missing pages.
The exam is closed book and closed notes. You are allowed to use scratch papers.
No calculators or other electronic devices are permitted. Any form of cheating on the
examination will result in a zero grade.
Please write your solutions in the spaces provided on the exam. You may use the blank
areas and backs of the exam pages for scratch work.
Please make your answers clear and succinct; you will lose credit for verbose, convo-
luted, or confusing answers. Simplicity does count!
Each problem has different weight, as listed below– So, plan your time accordingly.
You should look through the entire exam before getting started, to plan your strategy.
Problem 1 2 3 4 5 6 7 8 Total
Points 12 18 8 9 10 10 19 14 100
Score
Grader
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Final Exam Fall 2004 - Database Systems | CS 411 and more Exams Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2004, Prof. Chang

Department of Computer Science

University of Illinois at Urbana-Champaign

Final Examination

December 17, 2004

Time Limit: 180 minutes

  • Print your name and NetID below. In addition, print your NetID in the upper right

corner of every page.

Name: NetID:

  • Including this cover page, this exam booklet contains 11 pages. Check if you have

missing pages.

  • The exam is closed book and closed notes. You are allowed to use scratch papers.

No calculators or other electronic devices are permitted. Any form of cheating on the

examination will result in a zero grade.

  • Please write your solutions in the spaces provided on the exam. You may use the blank

areas and backs of the exam pages for scratch work.

  • Please make your answers clear and succinct; you will lose credit for verbose, convo-

luted, or confusing answers. Simplicity does count!

  • Each problem has different weight, as listed below– So, plan your time accordingly.

You should look through the entire exam before getting started, to plan your strategy.

Problem 1 2 3 4 5 6 7 8 Total

Points 12 18 8 9 10 10 19 14 100

Score

Grader

Problem 1 (12 points) Misc. Concepts

For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice. You will get 1 point for each correct answer, -0.5 point for each incorrect answer, and 0 point for each answer left blank.

(1) T rue F alse The LRU buffer replacement algorithm should not be used for certain query operations, such as nested-loop join.

(2) T rue F alse For relation R(A, B, C, D), if AB → C and C → D, then {A, B} is a key.

(3) T rue F alse To process join operation R on S, we can choose any join methods: nested-loop, index, sort- merge, hash, or hybrid– The only difference is their costs.

(4) T rue F alse Given an SQL query, there are often multiple ways of writing it in relational algebra.

(5) T rue F alse For the “ACID” properties of transactions, the “I” stands for idempotency– that is, the multiple executions of the same transaction should result in the same correct effect.

(6) T rue F alse If a schema is in 3NF, then it must also be in BCNF.

(7) T rue F alse B-Tree was invented by Dr. Rudolf Bayer, an alumnus of the computer science department of UIUC.

(8) T rue F alse If a pointer is very likely to be followed for many times, then it will pay off to perform automatic swizzling.

(9) T rue F alse An RDBMS performs its own buffer management, for not only efficiency but also correctness– because it needs to impose certain ordering in buffer replacement for transaction processing.

(10) T rue F alse For disk latency, seek time refers to the time for the head to find the desired sector– i.e., for the disk to rotate so the first of the sectors containing the desired data reaches the head.

(11) T rue F alse From A → B, we can derive AC → BC, which further leads to A → BC.

(12) T rue F alse For building a database application, we often write the application mainly in some “host language,” which interacts with a database for manipulating data. The different ways of operations between the host language and the underlying database, as many have observed, are called impedance mismatch.

Problem 3 (8 points) Schema Decomposition

We perform decomposition to normalize an original schema to be of certain normal forms. For such a decomposition to be “equivalent” to the original schema, it is desirable to be lossless.

To study this concept, let’s consider an original schema R(A, B, C). Suppose we decompose R into R1 (A, B) and R2 (A, C).

(a) Is this decomposition always lossless? Answer yes or no and briefly explain why. (4 points)

(b) Give an example instance of R (i.e., an example table with several tuples) and demonstrate its decomposition, to support your answer in (a). (4 points)

Problem 4 (9 points) Query Languages

Consider relation Scores(name, exam, score), which records the score of a student in an examination (either “midterm” or “final”); for example:

name exam score Betty midterm 85 Alex midterm 57 Alex final 90 Betty final 78 ... ... ...

(a) Write a query, in relational algebra, to return the final-exam score of Alex. (2 points)

(b) Write a query, in relational algebra, to return the names of those students who score higher in the final exam than in the midterm exam. (3 points)

(b) Write a query, in SQL, to return the “count” distribution of scores for the midterm exam, in descending order of score. That is, we want to list each score along with the number of students with that score, in the midterm exam. For example, the output may look like the following: (4 points) score count 85 6 82 5 78 8 ... ...

Problem 6 (10 points) Query Processing

We wish to join relations R(a, b), S (b, c), and T (c, d), i.e., to produce R onS onT. As our assump- tions:

  • Each relation holds Br, Bs, Bt blocks of data respectively.
  • The memory buffer for query processing has M blocks.
  • R is already sorted by R.b.

(a) If we want to minimize memory requirement for processing this query, what is the minimal value of M? Describe a processing strategy that results in this minimal requirement. ( 4 points) Note: In counting the memory requirement, as usual, we do not include the buffer space for writing the final output, i.e., tuples of RonSonT. All other required space should be included.

(b) To contrast, suppose we want to process with the following strategy. What is the memory requirement M? That is, you will derive an inequality condition in terms of M , Br, Bs, and Bt, under which the following procedure can be carried out. (6 points)

  1. Perform the first phase of two-phase multiway merge sort on S. That is, as many times as necessary, we load the buffer with as many blocks from S as possible, sort the tuples in memory, and write out the sorted sublist. At the end of this step, S is stored as several sorted sublist (or “runs”) on the disk. (We do not perform the second phase now.)
  2. Read T entirely into the buffer, using as many blocks as necessary.
  3. Merge R and the sorted sublists of S to produce R onS, and compare each of the resulting tuple with T (already in memory) to produce (R onS) onT. (As usual, any output tuple of the overall result is stored in an output buffer, which is not part of M .)

Problem 7 (19 points) Query Optimization

Consider the following query that joins Student(sid, sname, sdept), Enrollment(sid, cid), Courses(cid, ctitle, iid), Instructor (iid, iname, iaddr).

select sname, ctitle, iname from Student S, Enrollment E, Course C, Instructor I where join-conditions AND selection-conditions

In the where-clause, we have the following conditions:

  • The join-conditions specify how the relations are joined. In this query, they are fixed to natural joins, i.e.: S.sid = E.sid AND E.cid = C.cid AND C.iid = I.iid.
  • The selection-conditions are of the form c 1 AND c 2 AND · · · AND cn, where each ci is a selection condition on some relation, e.g., S.sdept = “CS” or I.iaddr=”1234SC”.

In this question, we are to consider that, although the join conditions remain the same, different scenarios with different selection conditions may need different join ordering. For our purpose, we consider the following join orders for processing SonEonConI:

  • J1 : ((S onE) onC) onI
  • J2 : ((I onC) onE) onS
  • J3 : (S onE) on(C onI)

(a) We have only given three example orders in the above. If we want to consider all possibilities, how many different orders are there for processing SonEonConI? (6 points) Note, to simplify, let’s assume that join orders are symmetric– i.e., A on B is equivalent to B on A. For instance, we consider J1 and I on ((S onE) onC) as the same order.

Problem 8 (14 points) Failure Recovery

Consider the following UNDO logging.

Action ID Action 1 〈START T 1 〉

2 〈T 1, A, 10〉

3 〈START T 2 〉

4 〈T 1, B, 10〉

5 〈COMMIT T 1 〉

6 〈T 2, B, 10〉

7 〈COMMIT T 2 〉

8 〈START T 3 〉

9 〈T 3, A, 10〉

10 〈START T 4 〉

11 〈T 3, B, 20〉

12 〈COMMIT T 3 〉

13 〈T 4, C, 10〉

14 〈START T 5 〉

15 〈COMMIT T 4 〉

16 〈T 5, D, 10〉

17 〈COMMIT T 5 〉

(a) We want to see when “dirty data” can be flushed to disk– i.e., what time to perform Output(X) for data X (e.g., Output(A), Output(B), etc.). Suppose we want to perform such “output” as late as possible. Insert these outputs on the figure to suggest their timing. (3 points)

(b) Suppose we want to start checkpointing right after Action 4: First, show on the figure this start checkpointing log record. (2 points) Then, show on the figure the end checkpointing log record. (3 points)

(c) Continue from (b). Suppose the system crashes after Action 15. How far back in the log must we look to find all actions that need to be undone? (3 points)

(d) Now, suppose this system is actually a redo log. To contrast with (a), if you still want to perform “output” (e.g., Output(A), Output(B), etc.) as late as possible. When should such output be done? (3 points) You can show on the figure for the timing, but you should separate and distinguish your answer from that of (a).