


Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Problems related to database systems, including merge-sort for large relations, pointer swizzling for data storage, data representation for student records, indexing on sequential files, and b+tree operations. Problems involve calculating minimum memory size, expected pointer following cost, record layout, minimal number of tuples for an index, and b+tree modifications.
Typology: Assignments
1 / 4
This page cannot be seen from the preview
Don't miss anything!



Note: Print your name and NetID in the upper right corner of every page of your submission.
Suppose we have a relation with 15000 tuples and each tuple requires 10 bytes. We have a machine whose main memory and disk-block size (80 bytes) are sufficient to sort the 15000 tuples using TPMMS (Two-Phase, Multiway Merge-Sort). What is the minimum size of the main memory?
Suppose that the important actions related to data storage take the following times, in some arbitrary time units:
Suppose we design a pointer-swizzling control scheme like the following. At the beginning, we automatically swizzle 20% of the pointers and leave the rest unswizzled. Once a pointer is followed, we swizzle it by probability 0.5. If an unswizzled pointer has been followed twice, we swizzle it. Suppose there are 100 pointers in our data. The number of times that they are followed by a program is distributed according to the following histogram.
times of being followed 0 1 2 3 number of pointers 10 40 30 20
What’s the expected cost of this program in terms of pointer following?
A student record consists of the following fixed-length fields: date of birth and SSN, each 8 bytes long. It also has the following variable-length fields: name and office. The records are augmented by an additional repeating field that represents exams. Each exam requires a date (12 bytes) and an integer result (4 bytes) of the exam. Pointers within a record require 4 bytes, and the record length is a 4-byte integer. You may assume that no alignment of fields is required.
(a) Show the layout of students records if:
(1) The variable-length fields and repeating exams are kept within the record itself.
(2) The variable-length fields and repeating exams are stored outside of the record, with pointers to them in the record.
(b) The exam results of a student has a probability p of being inqueried. There are n test results per student in average. The average lengths of name and office are 8 bytes and 12 bytes, respectively. Suppose the cost function of the above scheme (1) is len, where len is the record length. The cost of scheme (2) is len + 100 × p, for penalizing accessing exam results on seperate blocks. What is the minimal value of n as a function of p, so that the above scheme (2) is better than scheme (1).
Given a relation of 200,000 tuples. Suppose each block could hold 10 tuples or 20 key-pointer pairs. The index is built on the key field of the relation (thus no duplicate search keys) and the file is sorted according to the key. Answer the following questions:
a) How many blocks do we need for a dense index of this relation?
b) How many blocks do we need for a sparse index of this relation?
c) How many blocks do we need for a 2-level index of this relation (given that the first level is dense)?
d) Suppose we use a 3-level index whose first level is sparse. If at most 12000 blocks can be used to store the index, what is the maximal number of tuples that the relation can have?
Execute the following operations on Fig. 13.23 (page 635 of the textbook). Show the detailed steps, using Example 13.23 and Example 13.24 as examples.
(a) Lookup the record with key 29.
(b) Lookup the record with key 38.
(c) Lookup all records in the range 25 to 38.
(d) Lookup all records with keys less than 25.
Consider the B+ tree index of degree 2 shown in the figure below. For each problem below, you only need to show the final resulting B+ Tree.
a) Show the tree that would result from inserting a data entry with key 40 into this tree.
b) Show the B+ tree that would result from inserting a data entry with key 30 into the original tree.
c) Show the B+ tree that would result from deleting the data entry with key 45 from the original tree.
documents that contain the keyword ”database”, all the documents that contain both ”in- formation” and ”retrieval”, and so on. Indexing documents is essential in processing IR queries efficiently.
The following questions (a) and (b) require you to do some research on the textbook, liter- ature and Web.
a) What kind of index structure is often used for documents?
b) Briefly summarize how such index is utilized to index documents and why it is helpful in IR queries.
The following question require you to think and compare.
c) Why general indexing structures such as B+tree and hash table are inappropriate for indexing documents?