Database Systems Homework 3: Problem Set, Assignments of Deductive Database Systems

Problems related to database systems, including merge-sort for large relations, pointer swizzling for data storage, data representation for student records, indexing on sequential files, and b+tree operations. Problems involve calculating minimum memory size, expected pointer following cost, record layout, minimal number of tuples for an index, and b+tree modifications.

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

koofers-user-ctn-1
koofers-user-ctn-1 🇺🇸

9 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CS411 Database Systems
Fall 2005
HW#3
Due: 11:30am CST, 11/18/05
Note: Print your name and NetID in the upper right corner of every page of your submission.
Problem 1 Merge-Sort
Suppose we have a relation with 15000 tuples and each tuple requires 10 bytes. We have a
machine whose main memory and disk-block size (80 bytes) are sufficient to sort the 15000
tuples using TPMMS (Two-Phase, Multiway Merge-Sort). What is the minimum size of the
main memory?
Problem 2 Pointer Swizzling
Suppose that the important actions related to data storage take the following times, in some
arbitrary time units:
On-demand swizzling of a pointer: 30;
Automatic swizzling of pointers: 20 per pointer;
Following a swizzled pointer: 1;
Following a unswizzled pointer: 10.
Suppose we design a pointer-swizzling control scheme like the following. At the beginning,
we automatically swizzle 20% of the pointers and leave the rest unswizzled. Once a pointer
is followed, we swizzle it by probability 0.5. If an unswizzled pointer has been followed twice,
we swizzle it. Suppose there are 100 pointers in our data. The number of times that they
are followed by a program is distributed according to the following histogram.
times of being followed 0 1 2 3
number of pointers 10 40 30 20
What’s the expected cost of this program in terms of pointer following?
Problem 3 Data Representation
A student record consists of the following fixed-length fields: date of birth and SSN, each
8 bytes long. It also has the following variable-length fields: name and office. The records
are augmented by an additional repeating field that represents exams. Each exam requires a
date (12 bytes) and an integer result (4 bytes) of the exam. Pointers within a record require
4 bytes, and the record length is a 4-byte integer. You may assume that no alignment of
fields is required.
(a) Show the layout of students records if:
1
pf3
pf4

Partial preview of the text

Download Database Systems Homework 3: Problem Set and more Assignments Deductive Database Systems in PDF only on Docsity!

CS411 Database Systems

Fall 2005

HW#

Due: 11:30am CST, 11/18/

Note: Print your name and NetID in the upper right corner of every page of your submission.

Problem 1 Merge-Sort

Suppose we have a relation with 15000 tuples and each tuple requires 10 bytes. We have a machine whose main memory and disk-block size (80 bytes) are sufficient to sort the 15000 tuples using TPMMS (Two-Phase, Multiway Merge-Sort). What is the minimum size of the main memory?

Problem 2 Pointer Swizzling

Suppose that the important actions related to data storage take the following times, in some arbitrary time units:

  • On-demand swizzling of a pointer: 30;
  • Automatic swizzling of pointers: 20 per pointer;
  • Following a swizzled pointer: 1;
  • Following a unswizzled pointer: 10.

Suppose we design a pointer-swizzling control scheme like the following. At the beginning, we automatically swizzle 20% of the pointers and leave the rest unswizzled. Once a pointer is followed, we swizzle it by probability 0.5. If an unswizzled pointer has been followed twice, we swizzle it. Suppose there are 100 pointers in our data. The number of times that they are followed by a program is distributed according to the following histogram.

times of being followed 0 1 2 3 number of pointers 10 40 30 20

What’s the expected cost of this program in terms of pointer following?

Problem 3 Data Representation

A student record consists of the following fixed-length fields: date of birth and SSN, each 8 bytes long. It also has the following variable-length fields: name and office. The records are augmented by an additional repeating field that represents exams. Each exam requires a date (12 bytes) and an integer result (4 bytes) of the exam. Pointers within a record require 4 bytes, and the record length is a 4-byte integer. You may assume that no alignment of fields is required.

(a) Show the layout of students records if:

(1) The variable-length fields and repeating exams are kept within the record itself.

(2) The variable-length fields and repeating exams are stored outside of the record, with pointers to them in the record.

(b) The exam results of a student has a probability p of being inqueried. There are n test results per student in average. The average lengths of name and office are 8 bytes and 12 bytes, respectively. Suppose the cost function of the above scheme (1) is len, where len is the record length. The cost of scheme (2) is len + 100 × p, for penalizing accessing exam results on seperate blocks. What is the minimal value of n as a function of p, so that the above scheme (2) is better than scheme (1).

Problem 4 Indexes on Sequential Files

Given a relation of 200,000 tuples. Suppose each block could hold 10 tuples or 20 key-pointer pairs. The index is built on the key field of the relation (thus no duplicate search keys) and the file is sorted according to the key. Answer the following questions:

a) How many blocks do we need for a dense index of this relation?

b) How many blocks do we need for a sparse index of this relation?

c) How many blocks do we need for a 2-level index of this relation (given that the first level is dense)?

d) Suppose we use a 3-level index whose first level is sparse. If at most 12000 blocks can be used to store the index, what is the maximal number of tuples that the relation can have?

Problem 5 B+tree

Execute the following operations on Fig. 13.23 (page 635 of the textbook). Show the detailed steps, using Example 13.23 and Example 13.24 as examples.

(a) Lookup the record with key 29.

(b) Lookup the record with key 38.

(c) Lookup all records in the range 25 to 38.

(d) Lookup all records with keys less than 25.

Problem 6 B+tree

Consider the B+ tree index of degree 2 shown in the figure below. For each problem below, you only need to show the final resulting B+ Tree.

a) Show the tree that would result from inserting a data entry with key 40 into this tree.

b) Show the B+ tree that would result from inserting a data entry with key 30 into the original tree.

c) Show the B+ tree that would result from deleting the data entry with key 45 from the original tree.

documents that contain the keyword ”database”, all the documents that contain both ”in- formation” and ”retrieval”, and so on. Indexing documents is essential in processing IR queries efficiently.

The following questions (a) and (b) require you to do some research on the textbook, liter- ature and Web.

a) What kind of index structure is often used for documents?

b) Briefly summarize how such index is utilized to index documents and why it is helpful in IR queries.

The following question require you to think and compare.

c) Why general indexing structures such as B+tree and hash table are inappropriate for indexing documents?