Final Examination Problems - Database Systems | CS 411, Exams of Deductive Database Systems

Material Type: Exam; Professor: Chang; Class: Database Systems; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Spring 2010;

Typology: Exams

2010/2011

Uploaded on 06/14/2011

koofers-user-zpe-1
koofers-user-zpe-1 🇺🇸

3

(2)

10 documents

1 / 30

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
NetID:
CS411 Database Systems
Spring 2010, Prof. Chang
Department of Computer Science
University of Illinois at Urbana-Champaign
Final Examination
May 14, 2010
Time Limit: 180 minutes
Print your name and NetID below. In addition, print your NetID in the upper right
corner of every page.
Name: NetID:
Including this cover page, this exam booklet contains 16 pages. Check if you have
missing pages.
The exam is closed book and closed notes. You are allowed to use scratch papers.
No calculators or other electronic devices are permitted. Any form of cheating on the
examination will result in a zero grade.
Please write your solutions in the spaces provided on the exam. You may use the blank
areas and backs of the exam pages for scratch work.
Please make your answers clear and succinct; you will lose credit for verbose, convo-
luted, or confusing answers. Simplicity does count!
Each problem has different weight, as listed below– So, plan your time accordingly.
You should look through the entire exam before getting started, to plan your strategy.
Problem 1 2 3 4 5 6 7 Total
Points 22 16 11 10 12 15 14 100
Score
Grader
1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e

Partial preview of the text

Download Final Examination Problems - Database Systems | CS 411 and more Exams Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Spring 2010, Prof. Chang

Department of Computer Science

University of Illinois at Urbana-Champaign

Final Examination

May 14, 2010

Time Limit: 180 minutes

  • Print your name and NetID below. In addition, print your NetID in the upper right

corner of every page.

Name: NetID:

  • Including this cover page, this exam booklet contains 16 pages. Check if you have

missing pages.

  • The exam is closed book and closed notes. You are allowed to use scratch papers.

No calculators or other electronic devices are permitted. Any form of cheating on the

examination will result in a zero grade.

  • Please write your solutions in the spaces provided on the exam. You may use the blank

areas and backs of the exam pages for scratch work.

  • Please make your answers clear and succinct; you will lose credit for verbose, convo-

luted, or confusing answers. Simplicity does count!

  • Each problem has different weight, as listed below– So, plan your time accordingly.

You should look through the entire exam before getting started, to plan your strategy.

Problem 1 2 3 4 5 6 7 Total

Points 22 16 11 10 12 15 14 100

Score

Grader

Problem 1 (22 points) Misc. Concepts

For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice, and provide an explanation to justify. You will get 2 points for each correct answer with correct explanations, and no penalty (of negative points) for wrong answers.

(1) An E-R diagram will translate uniquely to a relational schema.

(2) In an E-R diagram, an entity will translate to a table in the relation model, while a relationship will translate to a join between tables.

(3) We can consider relational algebra as a query language.

(4) The basic operators in relational algebra are π, σ, ρ, on, ∪, −.

(5) In SQL, we can only use aggregate functions where there is a Group-By clause.

(6) In rule-based optimization, a commonly used heuristic rule is to push projection down, to reduce the number of columns early on.

(7) In optimizing join queries, we choose to assume only left-deep join trees, because such trees are most efficient.

(8) When we program a database transaction, we can use the “Abort” command to rollback a transaction that cannot be successfully completed.

(9) Grouping (i.e., group-by) must be processed in a two-pass algorithm.

(10) For logging, we prefer the UNDO+REDO scheme, so that we can be more flexible in when to write out dirty data to disk.

(11) One particular motivation for uncertain database research is that data may be inherently uncertain, e.g., weather forecast gives probabilistic predictions.

(Problem 2, cont.)

(4) For the above tables, what is the result of this query: σA< 2 (RonS)? Show the result table.

(5) Name one difference of query semantics in an uncertain database as compared to deterministic SQL queries.

(6) Give one advantage of UNDO-only logging over REDO-only logging.

(7) Given a memory space of 1000 blocks, for hashing a relation of 10^6 blocks, what is the minimal size (as the number of blocks) of a bucket?

(8) In tuning system performance, we should use Hints sparingly. Why?

Problem 3 (11 points) Query Languages

The following questions refer to the database schema below:

Professor (pid, pname, dept), Student(sid, sname, age), Course(cid, pid, cname, time, location), Grade(cid, sid, score).

(a) Write a query, in relational algebra, to return the names of professors who work in “CS” department and teach at least two courses. (4 points)

(b) Write an SQL query, to return the name, and the average score of each student, ordered by the average score (descending) (Note: If one student is registered for two courses and the scores are 75 and 85 , respectively, his average score is 80 ). (4 points)

(c) The professor of CS411 would prefer to enhance the score of young students (younger than 20). Write a statement that scales these students’ score by 1.07 (3 points)

(b) Show the resulting tree after deleting key 39 from the original tree. (4 points)

(c) Show the resulting tree after inserting key 24 into the original tree. (4 points)

Problem 5 (12 points) Query Processing

Consider two relations R(x, y) and S (x, z) with the following statistics: T (R) = 5,000, B (R) = 500 (each block contains 10 tuples), V (R, x) = 100 (number of distinct values of attribute x in R), T (S) = 8,000, B (S) = 1,000 (each block contains 8 tuples), V (S, x) = 200 (number of distinct values of attribute x in S), V (S, z) = 50 (number of distinct values of attribute z in S) and z < 100. Assume the memory buffer has 101 blocks (M = 101)

(a) Estimate the number of tuples in R ./z< 50 S (3 points)

(b) Briefly describe the algorithm and compute the cost of join using a block nested-loop join. ( 5 points)

Problem 6 (15 points) Query Optimization

(a) Judge the following two rules to be true/false, if true, give a short proof. If false, give a counterexample. (6 points)

(a.1) Projection can be pushed below set union. (3 points)

(a.2) Duplicate elimination can be pushed below projection. (3 points)

(b) Consider the physical query plans for the following expression: (R(w, x) ./ S(x, y)) ./ U (y, z) We have the following assumptions:

(1) B(R) = 5, 000, B(S) = B(U ) = 12, 000 (2) The intermediate result R ./ S occupies k blocks for some k. (3) Both joins will be implemented as hash-joins, either one-pass or two-pass, depending on k. (4) The memory buffer has 101 blocks (M = 101)

Consider three cases regarding the range of k and compute the cost of the best physical query plan for each case and fill in the blanks in the table on the next page. When you choose the two-pass algorithm for the final join, make sure to specify the number of buckets in the fields of the third column as well. (9 points)

Problem 7 (14 points) Failure Recovery

Consider the following log sequence.

Log ID Log 1 〈START T 1 〉

2 〈T 1, A, 10〉

3 〈START T 2 〉

4 〈T 2, B, 15〉

5 〈T 2, C, 20〉

6 〈START T 3 〉

7 〈T 1, B, 25〉

8 〈COMMIT T 1 〉

9 〈T 3, C, 30〉

10 〈COMMIT T 2 〉

11 〈START T 4 〉

12 〈T 3, D, 35〉

13 〈T 4, A, 40〉

14 〈COMMIT T 3 〉

15 〈T 4, C, 45〉

16 〈COMMIT T 4 〉

17 〈START T 5 〉

Note: For the questions (e)-(g), assume the given log sequence is a REDO log.

(e) Please briefly explain the meaning of the record: 〈T 3, D, 35〉 (logID 12) (1 points)

(f) When is the earliest time for transaction T 3, T 4 that “dirty data” can be flushed onto disk (i.e. the time Output(X) for data X can be performed)? (2 points)

For T 3: Output(s) before logID For T 4: Output(s) before logID

(g) Suppose we set the checkpointing in the following way (the original IDs do not change):

Log ID Log 1 〈START T 1 〉

2 〈T 1, A, 10〉

3 〈START T 2 〉

4 〈T 2, B, 15〉

5 〈T 2, C, 20〉

6 〈START T 3 〉

7 〈T 1, B, 25〉

8 〈COMMIT T 1 〉

9 〈T 3, C, 30〉

−→ 〈START CKPT, T 2, T 3 〉

10 〈COMMIT T 2 〉

11 〈START T 4 〉

12 〈T 3, D, 35〉

13 〈T 4, A, 40〉

−→ 〈END CKPT〉

14 〈COMMIT T 3 〉

15 〈T 4, C, 45〉

16 〈COMMIT T 4 〉

17 〈START T 5 〉 Now the system crashes right after logID 14 (the end checkpoint has been written out to disk). Show which transactions/actions (e.g.: 〈T 1, A, 10〉) need to be redone in the correct order. (3 points)

Problem 1 (22 points) Misc. Concepts

For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice, and provide an explanation to justify. You will get 2 points for each correct answer with correct explanations, and no penalty (of negative points) for wrong answers.

(1) F alse An E-R diagram will translate uniquely to a relational schema.

⇒ Explain: It can be translated into different relational schemas, depending on different merging strategies, as well as different subclassing methods.

(2) F alse In an E-R diagram, an entity will translate to a table in the relation model, while a relationship will translate to a join between tables.

⇒ Explain: A relationship in ER diagram normally would also translate into a table in the relation model. For many-to-one relationships, they can be absorbed into the table for entity.

(3) T rue We can consider relational algebra as a query language.

⇒ Explain: It is a query language, although it is not declarative (SQL is declarative).

(4) F alse The basic operators in relational algebra are π, σ, ρ, on, ∪, −.

⇒ Explain: onis not a basic operator. Instead, Cartesian product × is a basic operator.

(5) F alse In SQL, we can only use aggregate functions where there is a Group-By clause.

⇒ Explain: We can use aggregate directly in Select clause without a Group-By clause.

(6) T rue In rule-based optimization, a commonly used heuristic rule is to push projection down, to reduce the number of columns early on.

⇒ Explain: It is true. Another heuristic is to push selection down to reduce the number of tuples.

(7) F alse

In optimizing join queries, we choose to assume only left-deep join trees, because such trees are most efficient.

⇒ Explain: Not necessarily. Bush trees can sometimes be more efficient. We use left-deep only for the easiness of plan selection.

(8) T rue When we program a database transaction, we can use the “Abort” command to rollback a transaction that cannot be successfully completed.

⇒ Explain: True. “Abort” would rollback the operations to make it as if the transaction did not take place.

(9) F alse Grouping (i.e., group-by) must be processed in a two-pass algorithm.

⇒ Explain: False. It could be processed in one-pass algorithm as well, if there is enough resource.

(10) T rue

For logging, we prefer the UNDO+REDO scheme, so that we can be more flexible in when to write out dirty data to disk.

⇒ Explain: True. UNDO plus REDO gives us more flexibility.

(11) T rue

One particular motivation for uncertain database research is that data may be inherently uncertain, e.g., weather forecast gives probabilistic predictions.

⇒ Explain: True. This reflects the real world, where many things are uncertain.

(Problem 2, cont.)

(4) Answer: There are two tuples < 1 , 3 , 2 , 2 , 2 > and < 1 , 3 , 2 , 4 , 1 > for schema (A,B,C,D,E).

For the above tables, what is the result of this query: σA< 2 (RonS)? Show the result table.

(5) Answer: There are many differences. For instance, uncertain database adopts the possible worlds semantics, where a probabilistic database is viewed as a set of possible instances (worlds) associated with their probabilities. Name one difference of query semantics in an uncertain database as compared to deterministic SQL queries.

(6) Answer: In UNDO-only logging, data is flushed immediately after a transaction finished, thus it requires less buffer to keep the data. For more details on comparing the two logging schemes, refer to page 869 of the textbook. Give one advantage of UNDO-only logging over REDO-only logging.

(7) Answer: 10^6 /1000 = 1000

Given a memory space of 1000 blocks, for hashing a relation of 10^6 blocks, what is the minimal size (as the number of blocks) of a bucket?

(8) Answer: Since the database may change and evolve, and in such situations Hints may do harm to the system performance. In tuning system performance, we should use Hints sparingly. Why?

Problem 3 (11 points) Query Languages

The following questions refer to the database schema below:

Professor (pid, pname, dept), Student(sid, sname, age), Course(cid, pid, cname, time, location), Grade(cid, sid, score).

(a) Write a query, in relational algebra, to return the names of professors who work in “CS” department and teach at least two courses. (4 points)

Solution:

(b) Write an SQL query, to return the name, and the average score of each student, ordered by the average score (descending) (Note: If one student is registered for two courses and the scores are 75 and 85 , respectively, his average score is 80 ). (4 points)

Solution: select sname, AVG(score) from Student, Result where Student.sid = Result.sid group by sname order by AVG(score) DESC

(c) The professor of CS411 would prefer to enhance the score of young students (younger than 20). Write a statement that scales these students’ score by 1.07 (3 points)

Solution: update Student, Result set score = score ∗ 1. 07 where Student.sid = Result.sid AND Student.age < 20 AND Result.cid = ‘CS411’