Database Systems: Homework #2 for CPS 216 Spring 2005, Study notes of Database Management Systems (DBMS)

This homework #2 for cps 216 spring 2005 covers various topics in database systems, including b+-tree implementation, generalized search trees (gist), extensible hashing, linear hashing, partitioned hashing, and join operations. It includes problems on inserting and deleting keys in a b+-tree, implementing gist for range queries, hash indexing using extensible hashing and linear hashing, partitioned hashing for multi-attribute search keys, and join operations such as block-based nested-loop join, sort-merge join, and hash join.

Typology: Study notes

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 3

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CPS 216 Spring 2005
Homework #2
Assigned: Tuesday, February 15
Due: Thursday, March 3
Problem 1.
Consider the following B+-tree with a maximum fan-out of 4.
50
20
60 70 8 0
5 10
20 45
50 51 5 5
60 65
70 71 7 5
80 85
For all questions below, assume that you always start with the tree shown above—not the
result tree you get for the previous question.
(a) Show the result tree after inserting 49.
(b) Show the result tree after deleting 50.
(c) Show the result tree after inserting 72.
(d) Show the result tree after deleting 5.
(e) What is the minimum number of keys you must delete for this tree to shrink down to
two levels? Show the sequence of deletions.
Problem 2.
The following question is based on the paper “Generalized Search Trees for Database
Systems,” by Hellerstein et al. Suppose we want to use GiST to index ranges of the form [x, y),
where x and y are integers. These ranges may overlap with each other. The queries are of the
form “find all ranges that overlaps with [a, b).” Discuss briefly how you would implement the
six basic methods required by GiST, and whether you should set IsOrdered to true and define
Compare. Highlight the differences between your implementation and the B+-tree
implementation described in the paper.
Problem 3.
Suppose keys are hashed to 4-bit sequences, and each block can hold three records. We start
with a hash table with two empty blocks (corresponding to 0 and 1), and insert 16 records
with keys 0000, 0001, 0010, …, 1111, in order. Show the final state of the index:
(a) If the index is based on extensible hashing.
(b) If the index is based on linear hashing, with a capacity threshold of 100%. Here, we
define capacity to be (actual number of records indexed) / (maximum number of
records that can be held by primary buckets).
pf3

Partial preview of the text

Download Database Systems: Homework #2 for CPS 216 Spring 2005 and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

CPS 216 Spring 2005 Homework # Assigned: Tuesday, February 15 Due: Thursday, March 3

Problem 1.

Consider the following B+-tree with a maximum fan-out of 4.

50

20 60 70 8 0

5 10 20 45 50 51 5 5 60 65 70 71 7 5 80 85

For all questions below, assume that you always start with the tree shown above— not the result tree you get for the previous question.

(a) Show the result tree after inserting 49. (b) Show the result tree after deleting 50. (c) Show the result tree after inserting 72. (d) Show the result tree after deleting 5. (e) What is the minimum number of keys you must delete for this tree to shrink down to two levels? Show the sequence of deletions.

Problem 2.

The following question is based on the paper “Generalized Search Trees for Database Systems,” by Hellerstein et al. Suppose we want to use GiST to index ranges of the form [ x , y ), where x and y are integers. These ranges may overlap with each other. The queries are of the form “find all ranges that overlaps with [ a , b ).” Discuss briefly how you would implement the six basic methods required by GiST, and whether you should set IsOrdered to true and define Compare. Highlight the differences between your implementation and the B+-tree implementation described in the paper.

Problem 3.

Suppose keys are hashed to 4-bit sequences, and each block can hold three records. We start with a hash table with two empty blocks (corresponding to 0 and 1), and insert 16 records with keys 0000, 0001, 0010, …, 1111, in order. Show the final state of the index:

(a) If the index is based on extensible hashing. (b) If the index is based on linear hashing, with a capacity threshold of 100%. Here, we define capacity to be (actual number of records indexed) / (maximum number of records that can be held by primary buckets).

Problem 4.

To build a hash index for a multi-attribute search key, we can use an approach called partitioned hashing. The partitioned hash function is really a list of hash functions, one for each attribute in the search key. Suppose that we wish to build a partitioned hash index on R ( A , B ) with 2 n^ buckets numbered 0 to 2 n^ – 1. In this case, the partitioned hash function consists of two hash functions hA and hB. Hash function hA takes a value of A as input and produces a result with n (^) A bits, and hB takes a value of B as input and produces a result with ( nn (^) A ) bits. The two results are concatenated together to produce the result of the partitioned hash function, which is then used to index the buckets. To locate records with A = a and B = b , we simply go directly to the bucket numbered hA ( a ) hB ( b ) (in binary).

(a) Which buckets do we have to examine in order to locate all records with A = a? (b) Suppose we are given a query mix. Each query in this mix will either ask for records with a given value of A , or it will ask for records with a given value of B (but never both). With probability p , the value of A will be specified. Give a formula in terms of n , n (^) A , and p for the expected number of buckets that must be examined to answer a random query from the mix. (c) Find the value of n (^) A , as a function of n and p , that minimizes the expected number of buckets examined per query.

Problem 5.

Consider the join R  (^) R. A = S. B S , given the following information about the tables to be joined. The cost metric is the number of disk I/O’s and the cost of writing out the result should be uniformly ignored.

  • R contains 10,000 rows and has 10 rows per page.
  • S contains 2,000 rows and also has 10 rows per page.
  • S. B is a key of S.
  • Both tables are stored compactly on disk in no particular order.
  • No indexes are available.
  • 52 memory blocks are available for query processing.

(a) What is the expected cost of joining R and S using a block-based nested-loop join, with R as the outer table? (b) What is the expected cost of joining R and S using a block-based nested-loop join, with S as the outer table? (c) What is the expected cost of joining R and S using a sort-merge join? What is the minimum number of memory blocks required for this cost to remain unchanged? (d) What is the expected cost of joining R and S using a hash join? What is the minimum number of memory blocks required for this cost to remain unchanged?