Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Database Systems: Homework #2 for CPS 216 Spring 2005, Study notes of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

This homework #2 for cps 216 spring 2005 covers various topics in database systems, including b+-tree implementation, generalized search trees (gist), extensible hashing, linear hashing, partitioned hashing, and join operations. It includes problems on inserting and deleting keys in a b+-tree, implementing gist for range queries, hash indexing using extensible hashing and linear hashing, partitioned hashing for multi-attribute search keys, and join operations such as block-based nested-loop join, sort-merge join, and hash join.

Typology: Study notes

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸

4.7

(24)

372 documents

1 / 3

This page cannot be seen from the preview

Don't miss anything!

CPS 216 Spring 2005

Homework #2

Assigned: Tuesday, February 15

Due: Thursday, March 3

Problem 1.

Consider the following B+-tree with a maximum fan-out of 4.

50

20

60 70 8 0

5 10

20 45

50 51 5 5

60 65

70 71 7 5

80 85

For all questions below, assume that you always start with the tree shown above—not the

result tree you get for the previous question.

(a) Show the result tree after inserting 49.

(b) Show the result tree after deleting 50.

(c) Show the result tree after inserting 72.

(d) Show the result tree after deleting 5.

(e) What is the minimum number of keys you must delete for this tree to shrink down to

two levels? Show the sequence of deletions.

Problem 2.

The following question is based on the paper “Generalized Search Trees for Database

Systems,” by Hellerstein et al. Suppose we want to use GiST to index ranges of the form [x, y),

where x and y are integers. These ranges may overlap with each other. The queries are of the

form “find all ranges that overlaps with [a, b).” Discuss briefly how you would implement the

six basic methods required by GiST, and whether you should set IsOrdered to true and define

Compare. Highlight the differences between your implementation and the B+-tree

implementation described in the paper.

Problem 3.

Suppose keys are hashed to 4-bit sequences, and each block can hold three records. We start

with a hash table with two empty blocks (corresponding to 0 and 1), and insert 16 records

with keys 0000, 0001, 0010, …, 1111, in order. Show the final state of the index:

(a) If the index is based on extensible hashing.

(b) If the index is based on linear hashing, with a capacity threshold of 100%. Here, we

define capacity to be (actual number of records indexed) / (maximum number of

records that can be held by primary buckets).

Discover Study notes of Database Management Systems (DBMS) Duke University

Partial preview of the text

Download Database Systems: Homework #2 for CPS 216 Spring 2005 and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

CPS 216 Spring 2005 Homework # Assigned: Tuesday, February 15 Due: Thursday, March 3

Problem 1.

Consider the following B+-tree with a maximum fan-out of 4.

50

20 60 70 8 0

5 10 20 45 50 51 5 5 60 65 70 71 7 5 80 85

For all questions below, assume that you always start with the tree shown above— not the result tree you get for the previous question.

(a) Show the result tree after inserting 49. (b) Show the result tree after deleting 50. (c) Show the result tree after inserting 72. (d) Show the result tree after deleting 5. (e) What is the minimum number of keys you must delete for this tree to shrink down to two levels? Show the sequence of deletions.

Problem 2.

The following question is based on the paper “Generalized Search Trees for Database Systems,” by Hellerstein et al. Suppose we want to use GiST to index ranges of the form [ x , y ), where x and y are integers. These ranges may overlap with each other. The queries are of the form “find all ranges that overlaps with [ a , b ).” Discuss briefly how you would implement the six basic methods required by GiST, and whether you should set IsOrdered to true and define Compare. Highlight the differences between your implementation and the B+-tree implementation described in the paper.

Problem 3.

Suppose keys are hashed to 4-bit sequences, and each block can hold three records. We start with a hash table with two empty blocks (corresponding to 0 and 1), and insert 16 records with keys 0000, 0001, 0010, …, 1111, in order. Show the final state of the index:

(a) If the index is based on extensible hashing. (b) If the index is based on linear hashing, with a capacity threshold of 100%. Here, we define capacity to be (actual number of records indexed) / (maximum number of records that can be held by primary buckets).

Problem 4.

To build a hash index for a multi-attribute search key, we can use an approach called partitioned hashing. The partitioned hash function is really a list of hash functions, one for each attribute in the search key. Suppose that we wish to build a partitioned hash index on R ( A , B ) with 2 n^ buckets numbered 0 to 2 n^ – 1. In this case, the partitioned hash function consists of two hash functions hA and hB. Hash function hA takes a value of A as input and produces a result with n (^) A bits, and hB takes a value of B as input and produces a result with ( n – n (^) A ) bits. The two results are concatenated together to produce the result of the partitioned hash function, which is then used to index the buckets. To locate records with A = a and B = b , we simply go directly to the bucket numbered hA ( a ) hB ( b ) (in binary).

(a) Which buckets do we have to examine in order to locate all records with A = a? (b) Suppose we are given a query mix. Each query in this mix will either ask for records with a given value of A , or it will ask for records with a given value of B (but never both). With probability p , the value of A will be specified. Give a formula in terms of n , n (^) A , and p for the expected number of buckets that must be examined to answer a random query from the mix. (c) Find the value of n (^) A , as a function of n and p , that minimizes the expected number of buckets examined per query.

Problem 5.

Consider the join R (^) R. A = S. B S , given the following information about the tables to be joined. The cost metric is the number of disk I/O’s and the cost of writing out the result should be uniformly ignored.

R contains 10,000 rows and has 10 rows per page.
S contains 2,000 rows and also has 10 rows per page.
S. B is a key of S.
Both tables are stored compactly on disk in no particular order.
No indexes are available.
52 memory blocks are available for query processing.

(a) What is the expected cost of joining R and S using a block-based nested-loop join, with R as the outer table? (b) What is the expected cost of joining R and S using a block-based nested-loop join, with S as the outer table? (c) What is the expected cost of joining R and S using a sort-merge join? What is the minimum number of memory blocks required for this cost to remain unchanged? (d) What is the expected cost of joining R and S using a hash join? What is the minimum number of memory blocks required for this cost to remain unchanged?

Database Systems: Homework #2 for CPS 216 Spring 2005, Study notes of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Database Systems: Homework #2 for CPS 216 Spring 2005 and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!