






















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Exam; Professor: Chang; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Spring 2010;
Typology: Exams
1 / 30
This page cannot be seen from the preview
Don't miss anything!























Problem 1 (22 points) Misc. Concepts
For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice, and provide an explanation to justify. You will get 2 points for each correct answer with correct explanations, and no penalty (of negative points) for wrong answers.
(1) An E-R diagram will translate uniquely to a relational schema.
(2) In an E-R diagram, an entity will translate to a table in the relation model, while a relationship will translate to a join between tables.
(3) We can consider relational algebra as a query language.
(4) The basic operators in relational algebra are π, σ, ρ, on, ∪, −.
(5) In SQL, we can only use aggregate functions where there is a Group-By clause.
(6) In rule-based optimization, a commonly used heuristic rule is to push projection down, to reduce the number of columns early on.
(7) In optimizing join queries, we choose to assume only left-deep join trees, because such trees are most efficient.
(8) When we program a database transaction, we can use the “Abort” command to rollback a transaction that cannot be successfully completed.
(9) Grouping (i.e., group-by) must be processed in a two-pass algorithm.
(10) For logging, we prefer the UNDO+REDO scheme, so that we can be more flexible in when to write out dirty data to disk.
(11) One particular motivation for uncertain database research is that data may be inherently uncertain, e.g., weather forecast gives probabilistic predictions.
(Problem 2, cont.)
(4) For the above tables, what is the result of this query: σA< 2 (RonS)? Show the result table.
(5) Name one difference of query semantics in an uncertain database as compared to deterministic SQL queries.
(6) Give one advantage of UNDO-only logging over REDO-only logging.
(7) Given a memory space of 1000 blocks, for hashing a relation of 10^6 blocks, what is the minimal size (as the number of blocks) of a bucket?
(8) In tuning system performance, we should use Hints sparingly. Why?
Problem 3 (11 points) Query Languages
The following questions refer to the database schema below:
Professor (pid, pname, dept), Student(sid, sname, age), Course(cid, pid, cname, time, location), Grade(cid, sid, score).
(a) Write a query, in relational algebra, to return the names of professors who work in “CS” department and teach at least two courses. (4 points)
(b) Write an SQL query, to return the name, and the average score of each student, ordered by the average score (descending) (Note: If one student is registered for two courses and the scores are 75 and 85 , respectively, his average score is 80 ). (4 points)
(c) The professor of CS411 would prefer to enhance the score of young students (younger than 20). Write a statement that scales these students’ score by 1.07 (3 points)
(b) Show the resulting tree after deleting key 39 from the original tree. (4 points)
(c) Show the resulting tree after inserting key 24 into the original tree. (4 points)
Problem 5 (12 points) Query Processing
Consider two relations R(x, y) and S (x, z) with the following statistics: T (R) = 5,000, B (R) = 500 (each block contains 10 tuples), V (R, x) = 100 (number of distinct values of attribute x in R), T (S) = 8,000, B (S) = 1,000 (each block contains 8 tuples), V (S, x) = 200 (number of distinct values of attribute x in S), V (S, z) = 50 (number of distinct values of attribute z in S) and z < 100. Assume the memory buffer has 101 blocks (M = 101)
(a) Estimate the number of tuples in R ./z< 50 S (3 points)
(b) Briefly describe the algorithm and compute the cost of join using a block nested-loop join. ( 5 points)
Problem 6 (15 points) Query Optimization
(a) Judge the following two rules to be true/false, if true, give a short proof. If false, give a counterexample. (6 points)
(a.1) Projection can be pushed below set union. (3 points)
(a.2) Duplicate elimination can be pushed below projection. (3 points)
(b) Consider the physical query plans for the following expression: (R(w, x) ./ S(x, y)) ./ U (y, z) We have the following assumptions:
(1) B(R) = 5, 000, B(S) = B(U ) = 12, 000 (2) The intermediate result R ./ S occupies k blocks for some k. (3) Both joins will be implemented as hash-joins, either one-pass or two-pass, depending on k. (4) The memory buffer has 101 blocks (M = 101)
Consider three cases regarding the range of k and compute the cost of the best physical query plan for each case and fill in the blanks in the table on the next page. When you choose the two-pass algorithm for the final join, make sure to specify the number of buckets in the fields of the third column as well. (9 points)
Problem 7 (14 points) Failure Recovery
Consider the following log sequence.
Log ID Log 1 〈START T 1 〉
2 〈T 1, A, 10〉
3 〈START T 2 〉
4 〈T 2, B, 15〉
5 〈T 2, C, 20〉
6 〈START T 3 〉
7 〈T 1, B, 25〉
8 〈COMMIT T 1 〉
9 〈T 3, C, 30〉
10 〈COMMIT T 2 〉
11 〈START T 4 〉
12 〈T 3, D, 35〉
13 〈T 4, A, 40〉
14 〈COMMIT T 3 〉
15 〈T 4, C, 45〉
16 〈COMMIT T 4 〉
17 〈START T 5 〉
Note: For the questions (e)-(g), assume the given log sequence is a REDO log.
(e) Please briefly explain the meaning of the record: 〈T 3, D, 35〉 (logID 12) (1 points)
(f) When is the earliest time for transaction T 3, T 4 that “dirty data” can be flushed onto disk (i.e. the time Output(X) for data X can be performed)? (2 points)
For T 3: Output(s) before logID For T 4: Output(s) before logID
(g) Suppose we set the checkpointing in the following way (the original IDs do not change):
Log ID Log 1 〈START T 1 〉
2 〈T 1, A, 10〉
3 〈START T 2 〉
4 〈T 2, B, 15〉
5 〈T 2, C, 20〉
6 〈START T 3 〉
7 〈T 1, B, 25〉
8 〈COMMIT T 1 〉
9 〈T 3, C, 30〉
−→ 〈START CKPT, T 2, T 3 〉
10 〈COMMIT T 2 〉
11 〈START T 4 〉
12 〈T 3, D, 35〉
13 〈T 4, A, 40〉
−→ 〈END CKPT〉
14 〈COMMIT T 3 〉
15 〈T 4, C, 45〉
16 〈COMMIT T 4 〉
17 〈START T 5 〉 Now the system crashes right after logID 14 (the end checkpoint has been written out to disk). Show which transactions/actions (e.g.: 〈T 1, A, 10〉) need to be redone in the correct order. (3 points)
Problem 1 (22 points) Misc. Concepts
For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice, and provide an explanation to justify. You will get 2 points for each correct answer with correct explanations, and no penalty (of negative points) for wrong answers.
(1) F alse An E-R diagram will translate uniquely to a relational schema.
⇒ Explain: It can be translated into different relational schemas, depending on different merging strategies, as well as different subclassing methods.
(2) F alse In an E-R diagram, an entity will translate to a table in the relation model, while a relationship will translate to a join between tables.
⇒ Explain: A relationship in ER diagram normally would also translate into a table in the relation model. For many-to-one relationships, they can be absorbed into the table for entity.
(3) T rue We can consider relational algebra as a query language.
⇒ Explain: It is a query language, although it is not declarative (SQL is declarative).
(4) F alse The basic operators in relational algebra are π, σ, ρ, on, ∪, −.
⇒ Explain: onis not a basic operator. Instead, Cartesian product × is a basic operator.
(5) F alse In SQL, we can only use aggregate functions where there is a Group-By clause.
⇒ Explain: We can use aggregate directly in Select clause without a Group-By clause.
(6) T rue In rule-based optimization, a commonly used heuristic rule is to push projection down, to reduce the number of columns early on.
⇒ Explain: It is true. Another heuristic is to push selection down to reduce the number of tuples.
(7) F alse
In optimizing join queries, we choose to assume only left-deep join trees, because such trees are most efficient.
⇒ Explain: Not necessarily. Bush trees can sometimes be more efficient. We use left-deep only for the easiness of plan selection.
(8) T rue When we program a database transaction, we can use the “Abort” command to rollback a transaction that cannot be successfully completed.
⇒ Explain: True. “Abort” would rollback the operations to make it as if the transaction did not take place.
(9) F alse Grouping (i.e., group-by) must be processed in a two-pass algorithm.
⇒ Explain: False. It could be processed in one-pass algorithm as well, if there is enough resource.
(10) T rue
For logging, we prefer the UNDO+REDO scheme, so that we can be more flexible in when to write out dirty data to disk.
⇒ Explain: True. UNDO plus REDO gives us more flexibility.
(11) T rue
One particular motivation for uncertain database research is that data may be inherently uncertain, e.g., weather forecast gives probabilistic predictions.
⇒ Explain: True. This reflects the real world, where many things are uncertain.
(Problem 2, cont.)
(4) Answer: There are two tuples < 1 , 3 , 2 , 2 , 2 > and < 1 , 3 , 2 , 4 , 1 > for schema (A,B,C,D,E).
For the above tables, what is the result of this query: σA< 2 (RonS)? Show the result table.
(5) Answer: There are many differences. For instance, uncertain database adopts the possible worlds semantics, where a probabilistic database is viewed as a set of possible instances (worlds) associated with their probabilities. Name one difference of query semantics in an uncertain database as compared to deterministic SQL queries.
(6) Answer: In UNDO-only logging, data is flushed immediately after a transaction finished, thus it requires less buffer to keep the data. For more details on comparing the two logging schemes, refer to page 869 of the textbook. Give one advantage of UNDO-only logging over REDO-only logging.
(7) Answer: 10^6 /1000 = 1000
Given a memory space of 1000 blocks, for hashing a relation of 10^6 blocks, what is the minimal size (as the number of blocks) of a bucket?
(8) Answer: Since the database may change and evolve, and in such situations Hints may do harm to the system performance. In tuning system performance, we should use Hints sparingly. Why?
Problem 3 (11 points) Query Languages
The following questions refer to the database schema below:
Professor (pid, pname, dept), Student(sid, sname, age), Course(cid, pid, cname, time, location), Grade(cid, sid, score).
(a) Write a query, in relational algebra, to return the names of professors who work in “CS” department and teach at least two courses. (4 points)
Solution:
(b) Write an SQL query, to return the name, and the average score of each student, ordered by the average score (descending) (Note: If one student is registered for two courses and the scores are 75 and 85 , respectively, his average score is 80 ). (4 points)
Solution: select sname, AVG(score) from Student, Result where Student.sid = Result.sid group by sname order by AVG(score) DESC
(c) The professor of CS411 would prefer to enhance the score of young students (younger than 20). Write a statement that scales these students’ score by 1.07 (3 points)
Solution: update Student, Result set score = score ∗ 1. 07 where Student.sid = Result.sid AND Student.age < 20 AND Result.cid = ‘CS411’