Advanced Database Systems-Lecture 09 Slides-Computer Science, Slides of Database Management Systems (DBMS)

This course covers advanced database management system design principles and techniques. Keywords: Indexing, Advanced Database Systems, Static Hashing, Extensible Hashing, Pros, Cons, Linear Hashing, Hashing Versus B-trees, Rule of Thumb, Split, Handles Growing Files, No Full Reorganization

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold

Indexing: Part III
CPS 216
Advanced Database Systems

Announcements (February 15)

• Homework #1 graded
  ◦ Verify your grades on Blackboard
• Homework #2 assigned today
  ◦ Due in 2½ weeks
• Reading assignments for this and next week
  ◦ “The” query processing survey by Graefe
  ◦ Due next Wednesday
• Midterm and course project proposal in 3½ weeks

Static hashing

[Figure: a hash function h maps a key k to a bucket number h(k) = i, one of buckets 0 through N-1. Bucket i holds the keys ki1, ki2, ki3, ... with records or record pointers, and chains to overflow blocks.]

What if a bucket is full?

Does it make sense to use a hash-based index as a sparse index on a sorted table?
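
The static hashing picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Block`, `StaticHashIndex`, and `BLOCK_CAPACITY`, the modulo hash function, and the capacity of two keys per block are all hypothetical and do not come from the slides. A full bucket is handled by chaining an overflow block, and `lookup` counts how many blocks it has to read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BLOCK_CAPACITY = 2  # hypothetical: number of keys that fit in one block


@dataclass
class Block:
    keys: List[int] = field(default_factory=list)
    overflow: Optional["Block"] = None  # next block in the overflow chain


class StaticHashIndex:
    def __init__(self, num_buckets: int):
        self.num_buckets = num_buckets  # N stays fixed for the life of the index
        self.buckets = [Block() for _ in range(num_buckets)]

    def _h(self, key: int) -> int:
        return key % self.num_buckets  # assumed hash function, for illustration only

    def insert(self, key: int) -> None:
        block = self.buckets[self._h(key)]
        # Walk the chain until a block with free space is found,
        # appending a new overflow block when the chain runs out.
        while len(block.keys) >= BLOCK_CAPACITY:
            if block.overflow is None:
                block.overflow = Block()
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key: int) -> Tuple[bool, int]:
        """Return (found, number of blocks read)."""
        block = self.buckets[self._h(key)]
        reads = 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads


index = StaticHashIndex(num_buckets=4)
for k in (1, 5, 9):  # all three keys hash to bucket 1
    index.insert(k)
print(index.lookup(1))  # → (True, 1)
print(index.lookup(9))  # → (True, 2)
```

The second lookup needs two block reads because key 9 landed in the overflow block of bucket 1.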

Performance of static hashing

• Depends on the quality of the hash function!
  ◦ Best (hopefully average) case: one I/O!
  ◦ Worst case: all keys hashed into one bucket!
  ◦ See Knuth vol. 3 for good hash functions
• Rule of thumb: keep utilization at 50%-80%
• How do we cope with growth?
  ◦ Extensible hashing
  ◦ Linear hashing
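
As a rough worked example of the utilization rule of thumb, the sketch below sizes a static hash file. It assumes that utilization means the number of records divided by the total capacity of the primary buckets; that definition, the function name `buckets_needed`, and the sample figures (10,000 records, 50 records per block) are illustrative assumptions, not values from the slides.

```python
def buckets_needed(num_records: int, records_per_block: int, utilization_pct: int) -> int:
    """Smallest N such that num_records / (N * records_per_block) <= utilization_pct / 100."""
    denom = records_per_block * utilization_pct
    return (num_records * 100 + denom - 1) // denom  # integer ceiling division


print(buckets_needed(10_000, 50, 80))  # → 250
print(buckets_needed(10_000, 50, 50))  # → 400
```

Under these assumptions, staying inside the 50%-80% band means allocating between 250 and 400 primary buckets up front, which is exactly the decision a static scheme cannot revise as the file grows.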

Extensible hashing (TODS 1979)

• Idea 1: use i bits of the hash function's output and dynamically increase i as needed
• Problem: ++i = double the number of buckets!
• Idea 2: use a directory
  ◦ Just double the directory size
  ◦ Many directory entries can point to the same bucket
  ◦ Only split overflowed buckets

“One more level of indirection solves everything!”

[Figure: the hash value h(k) = 01101011, with i of its bits in use.]

Extensible hashing example (slide 1)

• Insert k with h(k) = 0101
• Bucket too full?
  ◦ ++local depth, split bucket, and ++global depth (double the directory size) if necessary
  ◦ Allowing some overflow is fine too

[Figure: a directory pointing to buckets. Each bucket is labeled with its local depth; the directory is labeled with the global depth (always the max of local depths).]
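
The insert procedure above (++local depth, split the bucket, and double the directory only if necessary) might look roughly like the following Python sketch. Several choices here are assumptions rather than anything the slides specify: the directory is indexed by the low-order bits of the hash value, each bucket holds two keys, hash values are distinct, and the names `Bucket`, `ExtensibleHash`, and `MAX_DEPTH` are hypothetical.

```python
MAX_DEPTH = 16  # hypothetical guard so repeated identical hash values cannot loop forever


class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []


class ExtensibleHash:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity          # keys per bucket (assumed)
        self.global_depth = 0
        self.directory = [Bucket(0)]      # 2**global_depth entries

    def _slot(self, h: int) -> int:
        return h & ((1 << self.global_depth) - 1)  # low-order global_depth bits (assumed)

    def lookup(self, h: int) -> bool:
        return h in self.directory[self._slot(h)].keys

    def insert(self, h: int) -> None:
        while True:
            bucket = self.directory[self._slot(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.local_depth == self.global_depth:
                if self.global_depth >= MAX_DEPTH:
                    raise OverflowError("too many keys share the same hash bits")
                # Double the directory: entry j and entry j + 2**d share a bucket.
                self.directory = self.directory + self.directory
                self.global_depth += 1
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)  # the bit that now tells the two apart
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (sibling if k & new_bit else bucket).keys.append(k)
        # Repoint only the directory entries whose new bit is set.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & new_bit:
                self.directory[slot] = sibling


table = ExtensibleHash()
for h in (0b0000, 0b1010, 0b0101, 0b1111, 0b0001):
    table.insert(h)
print(table.global_depth, len(table.directory))  # → 2 4
print(table.directory[0] is table.directory[2])  # → True
```

After these five inserts, directory entries 0 and 2 still point to the same bucket of local depth 1, which is the "many directory entries can point to the same bucket" situation. Inserting 0b1100 next would split that bucket without doubling the directory.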

Summary of extensible hashing

• Pros
  ◦ Handles growing files
  ◦ No full reorganization
• Cons

Linear hashing (VLDB 1980)

• Grow only when utilization exceeds a given threshold
• No extra indirection
  ◦ Some extra math to figure out the right bucket

[Figure: insert 0101. Threshold exceeded; grow! Keys shown: 0000, 1010. Number of bits in use: $i = \lceil \log_2 n \rceil = 1$. Number of primary buckets: $n = 2$.]

Linear hashing example (slide 2)

• Grows linearly (hence the name)
• Always split the $(n - 2^{\lfloor \log_2 n \rfloor})$-th bucket (0-based index)
  ◦ Intuitively, the first bucket with the lowest depth
  ◦ Not necessarily the bucket being inserted into!

[Figure: insert 0001, then insert 1100. Threshold exceeded; grow! Keys shown: 0000, 1111, 0101. State: i = 2, n = 3.]
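
To make the split rule concrete, here is a small sketch of the two formulas used in the linear hashing slides: the number of bits in use, $\lceil \log_2 n \rceil$, and the index of the bucket to split, $n - 2^{\lfloor \log_2 n \rfloor}$. The function names `bits_in_use` and `split_index` are hypothetical; integer `bit_length` is used in place of floating-point logarithms to avoid rounding issues.

```python
def bits_in_use(n: int) -> int:
    """ceil(log2 n) for n >= 1 primary buckets."""
    return (n - 1).bit_length()


def split_index(n: int) -> int:
    """n - 2**floor(log2 n): the 0-based bucket to split when there are n buckets."""
    return n - (1 << (n.bit_length() - 1))


print([split_index(n) for n in range(2, 9)])  # → [0, 1, 0, 1, 2, 3, 0]
print([bits_in_use(n) for n in range(2, 9)])  # → [1, 2, 2, 3, 3, 3, 3]
```

The split index sweeps through the buckets in order and wraps back to 0 each time the bucket count reaches a power of two, independent of which bucket received the insert.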

Linear hashing example (slide 3)

[Figure: insert 1110. Threshold exceeded; grow! State: i = 2, n = 4.]

Linear hashing example (slide 4)

• Look up 1110
  ◦ Bucket 110 (6-th bucket) is not here
  ◦ Then look in the $(6 - 2^{\lfloor \log_2 n \rfloor})$-th bucket (= 2nd)

[Figure: state i = 3, n = 5.]
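
The lookup rule above can be folded into a compact linear hashing sketch. It is illustrative only: the class name `LinearHash`, the capacity of two keys per bucket, the 75% utilization threshold, and the use of Python lists in place of overflow chains are all assumptions, and the insert sequence is chosen for this example rather than copied from the slide figures. Bucket addresses use the last i bits of the hash value, falling back by $2^{\lfloor \log_2 n \rfloor}$ when the addressed bucket does not exist yet.

```python
class LinearHash:
    def __init__(self, capacity: int = 2, threshold: float = 0.75):
        self.capacity = capacity    # keys per primary bucket (assumed)
        self.threshold = threshold  # utilization that triggers growth (assumed)
        self.buckets = [[], []]     # start with n = 2; a list stands in for an overflow chain
        self.count = 0

    def _address(self, h: int) -> int:
        n = len(self.buckets)
        i = (n - 1).bit_length()             # bits in use: ceil(log2 n)
        b = h & ((1 << i) - 1)               # last i bits of the hash value
        if b >= n:                           # that bucket is "not here" yet
            b -= 1 << (n.bit_length() - 1)   # subtract 2**floor(log2 n)
        return b

    def lookup(self, h: int) -> bool:
        return h in self.buckets[self._address(h)]

    def insert(self, h: int) -> None:
        self.buckets[self._address(h)].append(h)
        self.count += 1
        if self.count > self.threshold * self.capacity * len(self.buckets):
            self._grow()

    def _grow(self) -> None:
        n = len(self.buckets)
        s = n - (1 << (n.bit_length() - 1))  # bucket to split, not necessarily the full one
        self.buckets.append([])              # new bucket gets index n
        old, self.buckets[s] = self.buckets[s], []
        for h in old:                        # redistribute between bucket s and bucket n
            self.buckets[self._address(h)].append(h)


lh = LinearHash()
for h in (0b0000, 0b1010, 0b0101, 0b0001, 0b1100, 0b1111, 0b1110):
    lh.insert(h)
print(len(lh.buckets))      # → 5
print(lh._address(0b1110))  # → 2
print(lh.lookup(0b1110))    # → True
```

With five buckets and three bits in use, hash value 1110 points at bucket 110 (index 6), which does not exist, so the lookup falls back to bucket 6 - 4 = 2, matching the arithmetic on the slide.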

Summary of linear hashing

• Pros
  ◦ Handles growing files
  ◦ No full reorganization
• Cons