3 Problems with Solution - Advanced Data Management - Exam 2 | CS 511, Exams of Deductive Database Systems

Material Type: Exam; Professor: Chang; Class: Advanced Data Management; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2009;

NetID:
CS511 Advanced Database Systems
Fall 2009, Prof. Chang
Department of Computer Science
University of Illinois at Urbana-Champaign
Midterm Examination 2
November 18, 2009
Time Limit: 75 minutes
Print your name and NetID below. In addition, print your NetID in the upper right
corner of every page.
Name: NetID:
Including this cover page, this exam booklet contains 7 pages. Check if you have
missing pages.
The exam is open book and open notes (any and all books/notes). Scientific calculators
of any kind are allowed. No other electronic devices are permitted. Any form of
cheating on the examination will result in a zero grade.
Please write your solutions in the spaces provided on the exam. You may use the blank
areas and backs of the exam pages for additional space or scratch work.
Please make your answers clear and succinct; you will lose credit for verbose,
convoluted, or confusing answers. Simplicity does count!
Each problem has a different weight. You should look through the entire exam before
getting started, to plan your strategy.
Problems that are related to homework or study-guide problems (e.g., in terms of
concepts covered) are marked with [HW].
Problem   1    2    3    Total
Points    52   20   28   100
Score
Grader


Problem 1 (52 points) Misc. Concepts

You will get 4 points for each correct answer with a correct explanation. There is no penalty (negative points) for wrong answers.

(1) True [HW] Predicate calculus is higher-level than relational algebra. ⇒ Explain: Calculus is declarative: one only needs to describe the properties the result must satisfy, without worrying about how to compute the query (e.g., which operators to use and in what order).

(2) False [HW] When deleting a node from an R-tree, reinsertion is chosen as the way to deal with “orphaned entries” because merging (as in a B-tree) is infeasible for an R-tree. ⇒ Explain: Merging is not infeasible. Reinsertion is chosen for its merits: (a) it is easier to implement, and (b) it incrementally refines the tree structure, which prevents deterioration.

(3) False We can use an R-tree to index multiple-attribute data items like (salary, age), but the indexing will not be effective. ⇒ Explain: The R-tree is well suited to indexing multi-dimensional data. It can effectively index multiple-attribute data items like (salary, age) by checking the overlap of rectangles. Example queries include point queries (salary of 80k at age 25) and range queries (salary between 80k and 90k, and age between 30 and 35).

(4) False [HW] If a transaction releases its read lock before the end of the transaction, there is a danger of cascading rollback. ⇒ Explain: Strict locking (holding exclusive locks until the end of the transaction) is what prevents cascading rollback. It is not necessary to hold read locks until the end of the transaction. Alternative answer: If you answer True and explain that cascading rollback is possible when other transactions do not hold their exclusive locks until the end of the transaction, you also get full credit.

(5) False [HW] Precision and recall, the two major IR metrics, were coined in the SMART project in the 1960s. ⇒ Explain: They were first proposed by Kent et al. in the 1950s, and established as a formal procedure in the Cranfield test in the 1960s.

(6) True [HW] In the discrimination value model, the value of an index term is based on its “discrimination value”, which is predicted by its IDF. ⇒ Explain: Fig. 7 of the paper “A Vector Space Model for Automatic Indexing” indicates that the “discrimination value” depends on document frequency. Terms with medium document frequency are those with good “discrimination values”. Alternative answer: If you answer False and explain that “discrimination values” are not always positively related to IDF, you also get full credit.

This is clearly not desirable (if we see the top 10 results having exactly the same content). Google, for instance, tries to group such identical pages into one result.
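To illustrate the rectangle-overlap check mentioned in item (3), here is a minimal Python sketch. It is not taken from the exam: the `Rect` class, the `point` helper, and the sample bounding rectangles are hypothetical names and values chosen only for illustration. It shows how a point query and a range query on (salary, age) can both be reduced to one overlap test against a node's bounding rectangle.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle over (salary, age); a hypothetical helper type."""
    lo: Tuple[float, float]  # (min salary, min age)
    hi: Tuple[float, float]  # (max salary, max age)

    def overlaps(self, other: "Rect") -> bool:
        # Two rectangles overlap iff their intervals overlap in every dimension.
        return all(
            s_lo <= o_hi and o_lo <= s_hi
            for s_lo, s_hi, o_lo, o_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def point(salary: float, age: float) -> Rect:
    """A point query is a degenerate rectangle with lo == hi."""
    return Rect((salary, age), (salary, age))


# Range query from item (3): salary in [80k, 90k], age in [30, 35].
range_query = Rect((80_000, 30), (90_000, 35))

# Assumed bounding rectangles of two tree nodes (illustrative values only).
node_near = Rect((70_000, 25), (85_000, 32))
node_far = Rect((20_000, 50), (40_000, 60))

# A search descends only into nodes whose rectangle overlaps the query.
visit_near = node_near.overlaps(range_query)
visit_far = node_far.overlaps(range_query)
point_hit = node_near.overlaps(point(80_000, 25))
```

In this sketch a search would descend into `node_near` and prune `node_far`, which is what makes the index effective for multi-attribute data.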

Problem 2 (20 points) PageRank [HW]

This problem will exercise your insight for the notion of PageRank. Consider each of the following graphs representing the Web, where nodes represent pages and directed edges hyperlinks. For simplicity, let’s use the same simplified PageRank definition as given in class.

Our question will be based on the following graph, which we call Circular, as the starting point, upon which we will make some changes.

[Figure: the Circular graph on nodes a, b, c, d, e, f; the edges are not recoverable from the extracted text.]

Part 1 (8 points)

For the following Web graph, which is a slight change to Circular, speculate what the relative PageRank of each page would be by identifying the pages with non-zero PageRank and their rank ratios, and explain why. Note that we ask you only to speculate intuitively, without performing the iterative fixpoint computation. If there is no clear intuition to speculate from, state so and explain why.

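[Figure: the modified Circular graph on nodes a, b, c, d, e, f; the edges are not recoverable from the extracted text.]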

Answer:

Node d will have the lowest PageRank (which will converge to 0), since it has no incoming links.

Node e will have a PageRank close to 0 as well, since it has only one incoming link, from node d, whose score converges to 0.

Given this, the remaining four nodes a, b, c, f form a cycle and will eventually have the same PageRank score, each holding 0.25 of the total.
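The intuition in this answer can be checked numerically with a small Python sketch. Two things here are assumptions and not part of the exam: the edge list (a→b→c→f→a, d→e, e→f) is only one graph consistent with the answer, since the figure itself is not recoverable, and the "lazy" update that averages the old and new ranks is added so that the iteration settles instead of rotating rank around the cycle forever. The lazy update has the same fixed point as the plain simplified update.

```python
def simplified_pagerank(edges, iterations=200):
    """Simplified PageRank (no damping, no teleport) with a lazy update.

    Assumes every node has at least one outgoing link, so no rank is lost.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}  # start from a uniform rank
    for _ in range(iterations):
        pushed = {n: 0.0 for n in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                # Each page splits its rank evenly among its out-links.
                pushed[dst] += rank[src] / len(targets)
        # Lazy step (an assumption of this sketch): keep half of the old rank.
        rank = {n: 0.5 * rank[n] + 0.5 * pushed[n] for n in nodes}
    return rank


# Hypothetical edge list consistent with the answer above.
ASSUMED_EDGES = [
    ("a", "b"), ("b", "c"), ("c", "f"), ("f", "a"),  # the four-node cycle
    ("d", "e"),                                      # d has no incoming link
    ("e", "f"),                                      # e is fed only by d
]

ranks = simplified_pagerank(ASSUMED_EDGES)
```

Under these assumptions, the ranks of d and e decay toward 0 while a, b, c, and f each approach 0.25, matching the speculation.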

Problem 3 (28 points) Indexing

Part 1 (10 points)

The R-tree is said to generalize the B-tree in several design concepts. Identify two such major concepts that exist in both types of index, and explain how the R-tree generalizes the B-tree.

Answer:

This question is open to your observation and thoughts; we do not expect a standard set of answers. Rather, we read your arguments and grade on how clear, concrete, and convincing they are.

Part 2 (18 points)

Let’s continue such generalization for different types of data. Consider data type S as a set of integers, e.g., s1 = {1, 2, 6, 8}, s2 = {-1, 4, 15}, s3 = { 20 }, s4 = {48, 60, 102}.

Sketch, concisely, your design of an index tree for such data, by further generalizing the concepts of R-tree. Describe your design clearly—what types of queries are reasonable to support, what each node means, how to split a node, and how to perform search for a query.

Answer:

This question is open to your observation and thoughts; we do not expect a standard set of answers. Rather, we read your arguments and grade on how clear, concrete, and convincing they are.