

Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Solutions to graded homework1 for cosc 6340 database systems course offered in spring 2003. The homework includes problems on implementing joins, relational database design, reasoning with dependencies, physical database design, and object similarity assessment. Each problem is followed by detailed explanations and computations.
Typology: Assignments
1 / 2
This page cannot be seen from the preview
Don't miss anything!


Dr. Christoph F. Eick
Available Points: 72 Due Dates : Problems 1-3 : Submit solutions electronically to [email protected] (deadline We., March 19, 9p; no late solutions accepted! ) and bring a hardcopy to the Th., March 20 class. Some solutions will be discussed in the March 20, 2003 class. Problems 4-5 : Tu., April 1, 2003 in class or during my office hours ( no late solutions accepted! )
R1(A, B, C) and R2(A, D). A is the primary key for R2 and R1.A is a foreign key (referring to R2.A) in R1 (R1[A] . R2[A]). R1 and R2 are stored as an unordered file (blocks are assumed to be 100% full); R1 contains 200000 tuples that are stored in 500 blocks, and R2 has 50000 tuples that are stored in 1000 blocks (that is, every R2.A value occurs at an average four time as a value of R1.A). Moreover, assume that only a small buffer size of 8 blocks is available. Based on these assumptions answer the following questions (indicate, if you assume in your computations that the output relation is written back to disk or not):
b. What is the cost for implementing the natural join using the block-nested loop join? [2] c. Now assume that either a hashed index on A of R1 or a hashed index on A of R2 is available (assume that there are no overflow pages). Compute the cost for using the index nested loops join using the index for R1.A and for using the index nested loops join for R2.A (2 computations][4] d. Is it possible to apply the hash-join in this case (explain your answer!)? [2] e. Which of the 4 proposed implementations is the best? [1] f. Now assume that the response time of the methods you evaluated is not satisfactory.
Assume a relation R(A, B, C, D, E) with the following functional dependencies is given: (1) A BC (2) B A (3) D A (4) E D a) What is (are) the candidate key(s) for R? [1] 1
b) Transform R into a relational schema that is in BCNF and which does not have any lost dependencies; if this is not possible decompose R into a schema that is in BCNF and has the fewest number of lost functional dependencies. Give the derivation tree for your solution and specify clearly, if and what functional dependencies are lost for a particular relational schema. [11]
Consider the following relation R(A,B,C,D,E) with the following dependencies is given ( denotes a functional dependency and denotes a multi-valued dependency): (1) A C (2) B DE a) Assume we decompose R into R1(A,B, C) and R2(C, D, E). Does this decomposition have the lossless join property --- is it possible to reconstruct R from R1 and R2 using a natural join? Give reasons for your answer! [6] b) What is (are) the candidate key(s) of R? [1] c) Does ED hold for R? Give reasons for your answer (either show that the dependency can be inferred from (1) and (2), or give a counter example of the contents of R that satisfies (1) and (2) but which violates ED)! [6]
Solve problem 20.6 (problem 16.8 second edition) of the textbook (pages 689/493)
Assume the following relation Students(ssn, age, gpa, avg_class_rank, Country, Continent) that contains students that were admitted in the year 2000 into our undergraduate program is given. You can assume that age is an integer; the maximum age is 50 the minimum age is 20, and the average age is 28 and the mean absolute deviation is 10. gpa denotes the UH COSC gpa; the average gpa is 2.8 and the mean absolute deviation is 0.6; the maximum gpa is 4.0 the minimum gpa is 0. Avg_class_rank has 5 values (4=top-5%, 3=top-15%, 2=top-25% 1=top_half, 0=bottom half) Assume that students from the same continent are more similar than students from different continents. a) Define a student similarity measure that considers gpa and class_rank of being of major importance, and age and country/continent of being of minor importance. [11] b) Using your (dis)similarity measure compute the (dis)similarity for the following 4 student tuples [3] : (111111111, 25, 2.8, 2, China, Asia) (222222222, 24, 3.7, 3, India, Asia) (333333333, 37, 2.9, 4, USA, America) (444444444, 24, 3.8, 4, USA, America) 2