Download Final Exam - 4 Problems on Database Systems | CS 411 and more Exams Deductive Database Systems in PDF only on Docsity!
University of Illinois at Urbana-Champaign
Final Examination
CS411 Database Management Systems
Time Limit: 180 minutes
Exam Date: May. 8, 2008
- Closed notes; closed book; no sheet of formulas permitted.
- Please write your NetID on the upper left hand side of each page.
- Please write your answers directly on the exam sheet. The space we left for your answers
is often more than what you actually need. Please use the back side of the exam as
scratch paper.
- In case you find a question ambiguous, please write down your assumption and answer
the question accordingly.
- Answers to queries should not contain duplicates, for any query on this
exam. Do not forget to remove duplicates!
- You may use temporary relations, if you like, for any of the queries below. If you use
the same temporary relation for a second exam question, you must redefine the relation
in your answer to the second question. In other words, your answer to each question
should be self-contained.
Your Name:
Good Luck!
I. [T rue/False questions, 40 points] For each of the following statements, indicate whether it is TRUE or FALSE by circling your choice. You will get 1 point for each correct answer, -1 point for each incorrect answer, and 0 point for each answer left blank.
- True False According to the dependency theory, it is possible for a relation not to have a key.
- True False Under set semantics, the final result size of a series of joins is the same no matter what order you do the joins in.
- True False According to dependency theory, it is always bad if a relation has more than one key.
- True False You should be always able to come up with a lossless, dependency preserving, BCNF version of your applicationās schema.
- True False According to dependency theory, it is bad if the only key for a relation is all of its attributes together.
- True False If your schema is in 3NF but not in 4NF, then you probably need to revise it.
- True False If a decomposition is not lossless, then you should absolutely not use it for your application schema.
- True False If a decomposition does not preserve dependencies, then you should absolutely not use it for your application schema.
- True False One weakness of triggers is that they can only be activated after an update/insert/delete, not before.
- True False Hybrid hash joins are a refinement of hash joins that makes them go faster, but depends on there being enough memory available.
- True False One characteristic of nested-loop joins is that you can always use them, even if you have very little memory and the data is not sorted.
- True False For a read of a database page, typically the longest component is the transfer time for the page.
- True False You can find the best join order for a sequence of joins using a dynamic programming algorithm.
- True False Generally speaking, a query plan that generates few intermediate result tuples is preferable to one that generates a lot of intermediate result tuples.
- True False Given a choice between materializing the result of a join and pipelining it to the next operation, in general you should materialize it if you can.
- True False The primary reason that commercial DBMSs tend to use undo/redo logging is that recovery after a crash is faster.
- True False The bad thing about a non-quiescent checkpoint is that it takes much longer to finish than a quiescent checkpoint does.
- True False The bad thing about a quiescent checkpoint is that recovery using it is much slower than it would be if we had used a non-quiescent checkpoint.
- True False If you complete a quiescent checkpoint and the system crashes afterward, you will never need to read the part of the log before the checkpoint wrote āSTART CHECKPOINTā, no matter what kind of logging you use.
- True False To guard against the damage that can be caused by a disk crash, you can keep a log of transaction starts, writes, and commit operations.
- True False One disadvantage of allowing dirty reads is that you might get cascading rollback.
- True False If a transaction is about to commit and a schema-level integrity constraint is violated, then the trans- action will be aborted.
- True False Under most logging and recovery paradigms, a transaction T is not committed until the āT COMMITSā log record is written out to nonvolatile storage.
- True False Once a transaction commits, it will not be undone, even if a crash occurs very soon.
II. [Short Answer Questions, 60 points, each 5 points].
- What is the primary reason for taking checkpoints? Having checkpoints reduces the recovery time.
- How does the DBMS typically get rid of duplicates when answering a SQL āSELECT DISTINCTā query? They do it through both sorting and hashing relations.
- What does the D in ACID stand for? It stands for durability.
- Suppose you are writing a new web database application. What is the best approach to take to protect yourself against SQL injection attacks? The best approach is to monitor and parse the web requests for suspicious patterns before passing it to the database.
- What is the main reason that you (the DBA) would choose a B-tree index for an attribute rather than a hashed index (with linear or extensible hashing)? If you need to make an assumption to answer the question, then state your assumption. B-tree indices provide better response time for range queries. They also give better average and worst case performance when large number of tuples are inserted into the table periodically.
- Give a specific example of when you might be willing not to enforce serializability for a particular transaction. There is no need to enforce seriazability for transactions that just collect some statistics from database.
- Why do most DBMSs allow transactions to lock a page (as opposed to just individual tuples)? They lock a page in an attempt to prevent other transactions to re-arrange the records in the page.
III. Long Answer Questions, 170 points 1. Normalization, 20 points Consider relation Course(Number, Name, Area, Faculty) where each course is taught by just one faculty member all the time, each course has a unique number, and no two different courses have the same combi- nation of name and area. Provide a 3NF version of this relation and determine if your answer is in BCNF and why it is or it is not in BCNF. Available functional dependencies are:
- N umber ā N ame; N umber ā Area; N umber ā F aculty
- N ame, Area ā N umber; N ame, Area ā F aculty Therefore, the table already is in 3NF. It is also in BCNF because there is not any other functional depen- dencies in the tables except the ones involving super keys in their left hand side. Any other solution based on reasonable assumptions is acceptable.
- Security, 30 points, each part 15 points (a) Show how to carry out a cross-site request forgery attack. You need to give enough detail in your example to convince us that you know what you are talking about. An employee with the power to fire people from the company requests document http://attacksite.org/cuteKittensScreensaver.html and receives:
HTTP/1.1 200 OK Content?Length: 121 <imgsrc=http://cutecats.com/cat_of_the_week.gif> <imgsrc="http://mycompany.org/admin/terminate_employee.php?employee_id=WilliamGates">
When the employeeās browser requests the second image, the employeeās browser will cause William Gates to be fired:
GET /admin/terminate_employee.php?employee_id=WilliamGates HTTP/1. Host: mycompany.org Cookie: PHPSESSID=
Many people gave a CSS attack or some other legitimate non-CSRF attack as their solution, for which they received half credit.
(b) What is the best way to guard against such attacks, if you are writing a new database application? A database can be successfully attacked by an external CSRF attack even if the database lives behind a firewall and the applications that access it are only available inside the company/organization ā in other words, the database itself is not accessible from outside the company. For this reason, current guidelines for protecting against CSRF attacks recommend that (1) all DB applications that can be invoked with a GET/POST request (or indirectly as a side effect of such a request) be set up to require manual input from the user before any drastic action is carried out. For example, to counteract the example attack given above, a pop-up window might say, āPlease click here to confirm that you want to fire employee William Gates.ā Current guidelines also recommend that (2) on receipt of a request, the DB application code should check to make sure that the DB request is made with POST rather than with GET. Also, the DB application itself should be set up to use POST rather than GET. A third recommendation is that (3) each DB application front-end form (e.g., the interface that is used to fire employees) sent to a user include a fresh token, and have this token be included when the form is submitted. If an appropriate token is not included with the request, or if the token is no longer fresh (i.e., has timed out), then the user should be sent an error message and the request should be ignored. Many people gave a remedy for some other legitimate non-CSRF attack as their solution (regardless of how they answered part (a)), for which they received half credit.
- Query Optimization, 40 points Consider the following relations:
Movie(Title, Year, Rating, StudioName) Studio(Name, Country, Address)
Assume each movie is produced by just one studio, whose name is mentioned in the StudioName attribute of the Movie relation. Also, Title and Name are primary keys for Movie and Studio, respectively. The rating of a movie shows how good the movie is, and its range is [1,...,10]. The following statistics are available about the relations:
Movie Studio T(Movie) = 24000 T (Studio)= 1000 V(Movie,StudioName)= V(Movie,Rating)= The following query returns the movies with a rating of 10 produced in each country after 1990:
SELECT Country, Title FROM Movie, Studio WHERE Movie.StudioName = Studio.Name and Year > 1990 and Rating = 10
Suggest an optimized logical query plan for the above query. Then, estimate the size of each intermediate relation in your query plan. By intermediate relation we mean the relation created after each selection or join.
The optimized logical query plan for the above query is: ĻCountry,T itle
./N ame=StudioN ame
ĻStudioN ame,T itle
ĻY ear> 1990 ,Rating=
Movie
ĻN ame,Country
Studio
There are tow relations created after the selection and the join. For the relation created after the selec- tion we have: N 1 = (^2400010) ā 3 = 800 and for the one after the join: N 2 = 8001000 ā^1000 = 800
Now consider an undo log corresponding to the original schedule. Suppose that a crash occurs right after the log record with asterisks next to it is written to disk, and the recovery procedure is then run.
(d) What transactions get rolled back? Transactions T3 and T1 get rolled back.
(e) Does cascading rollback take place? Why or why not? As transaction T2 reads a value from T1, it gets rolled back as well.
(f) Which operations in the log get undone? All the write operations by transactions T1, T2, and T3 will be undone.
(g) At the time of the crash, which of the five data write operations are guaranteed to have taken place on disk? All the write operations by T2 are guaranteed to have taken place on disk.
(h) At the time of the crash, which of the five data write operations are guaranteed to not have taken place on disk? None. All of the write operation could have already taken place on the disk.