Hash Tables - Data Structures - Lecture Notes, Study notes of Data Structures and Algorithms


Typology: Study notes

2012/2013

Uploaded on 04/30/2013

1. BST, AVL trees, and hash tables can all be used to implement a dictionary ADT.

Dictionary Successful Search Comparisons with 10,000 integer items (Time in seconds)

                          Items added in         Items added in         Order did not matter
                          sorted order           random order           (Hash table sizes 2^15 = 32K)
                          BST       AVL Tree     BST       AVL Tree     Open Addr.     Closed Addr.
                                                                        (Quadratic)    (Chaining)
Total add/put time        47.785    0.205        0.119     0.195        0.064          0.074
Total search time         38.100    0.060        0.079     0.062        0.044          0.039
Height of resulting tree  9,999     13           30        15           NA             NA

a) The puts of these 10,000 randomly ordered items into the BST took 0.119 seconds and 0.179 seconds into the AVL tree. Why did the BST puts take less time even though the final height was 30 vs. a final AVL tree height of 15?

b) With a very, very poor hash function or a very, very bad choice of keys, all keys could hash to the same home address.
What would be the worst-case big-oh of open-address hashing with quadratic probing?
What would be the worst-case big-oh of chaining using a linked list at each home address?
What would be the worst-case big-oh of chaining using an AVL tree at each home address?

[Figure: hash table with home addresses 0, 1, 2, 3, 4, 5, ...; an AVL tree containing all "n" items in the hash table.]
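To make the scenario in part b) concrete, the following is a minimal sketch of chaining in which every key is forced to the same home address. The class name `ChainedHashTable` and the functions `constant_hash` and `identity_hash` are hypothetical names chosen for illustration, and Python lists stand in for the linked lists at each home address. The sketch only counts key comparisons for one successful search; it does not reproduce the timings in the table.

```python
class ChainedHashTable:
    """Minimal chained hash table that can count key comparisons."""

    def __init__(self, size, hash_function):
        # One chain (a Python list standing in for a linked list) per home address.
        self.buckets = [[] for _ in range(size)]
        self.hash_function = hash_function

    def _chain(self, key):
        return self.buckets[self.hash_function(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # replace the value of an existing key
                return
        chain.append((key, value))

    def search_comparisons(self, key):
        """Return how many keys are compared before the search stops."""
        comparisons = 0
        for existing_key, _ in self._chain(key):
            comparisons += 1
            if existing_key == key:
                break
        return comparisons


def constant_hash(key):
    """A very, very poor hash function: every key gets home address 0."""
    return 0


def identity_hash(key):
    """A reasonable hash function for distinct small integers."""
    return key


n = 1000
bad_table = ChainedHashTable(2**15, constant_hash)
good_table = ChainedHashTable(2**15, identity_hash)
for k in range(n):
    bad_table.put(k, str(k))
    good_table.put(k, str(k))

# Search for the key that was added last.
bad_comparisons = bad_table.search_comparisons(n - 1)
good_comparisons = good_table.search_comparisons(n - 1)
print(bad_comparisons, good_comparisons)
```

With the constant hash function, the search walks the single chain that holds all n items, while the identity hash function finds the key after one comparison.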
2. The data structures we have discussed so far are all in-memory, i.e., data is stored in main/RAM memory. Data can also be stored on secondary storage in a file (e.g., moiveData.txt file). Currently, most secondary storage consists of hard disks.

a) Complete the following table comparing main/RAM memory vs. hard-disk:

Criteria                             Main/RAM memory    Hard-disk Drive    Solid-State Drive
Size on a typical desktop computer   ________           ________           ________
Average access time                  ________           ________           ________

b) Which criterion seems to be the most important difference between the main and secondary memories?
Data Structures (810:052) Lecture 25 Name:_________________


[Figure: physical view of a disk showing R/W heads, surface #, track #, and sector # (sectors 0-7 on each track), with the linear block numbers 0-19 marked on the sectors. Blocks 8-15 are on surface 1 (on the bottom of the disk).]

[Figure: Logical View of Disk as Linear Collection of Blocks. Mapping from (track #, surface #, sector #) to linear block #: (0,0,0) to block 0, (0,0,1) to block 1, (0,0,2) to block 2, and so on through all of cylinder 0, then all of cylinder 1. Bits of linear block #: track # | surface # | sector #.]
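The mapping between (track #, surface #, sector #) and a linear block # can be sketched as follows. This is a sketch under stated assumptions: 8 sectors per track matches the sector numbers 0-7 in the figure, but the number of surfaces per cylinder is not recoverable from the notes, so `SURFACES_PER_CYLINDER = 4` is an assumed value, and the function names are hypothetical.

```python
SECTORS_PER_TRACK = 8        # sectors are numbered 0-7 in the figure
SURFACES_PER_CYLINDER = 4    # assumption: not stated in the notes


def to_linear_block(track, surface, sector):
    """Map (track #, surface #, sector #) to a linear block #.

    The sector # varies fastest, then the surface #, then the track #, so all
    blocks of one cylinder are numbered before the next cylinder starts.
    """
    return (track * SURFACES_PER_CYLINDER + surface) * SECTORS_PER_TRACK + sector


def from_linear_block(block):
    """Inverse mapping: linear block # back to (track #, surface #, sector #)."""
    track_and_surface, sector = divmod(block, SECTORS_PER_TRACK)
    track, surface = divmod(track_and_surface, SURFACES_PER_CYLINDER)
    return track, surface, sector


print(to_linear_block(0, 0, 2))   # third sector of track 0 on surface 0
print(to_linear_block(0, 1, 0))   # first sector of track 0 on surface 1
print(from_linear_block(8))
```

Because both counts are powers of two here, the same mapping can be read as packing the three numbers into adjacent bit fields of the block number, which is what the "bits of linear block #" label describes.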

  1. Disk-access time = (seek time) + (rotational delay) + (data transfer time). a) How is each component of the disk-access time affected by increasing the disk's RPM (revolutions per minute)?

b) If we want fast access to a collection of sectors, where can we place them to minimize seek time and rotational delay?
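The disk-access time formula can be turned into a small calculation. This is a sketch, not a model of any particular drive: it assumes the average rotational delay is half a revolution and that transferring one sector takes the time for that sector to pass under the head. The function names, the 9 ms seek time, and the 8 sectors per track are hypothetical values chosen for illustration.

```python
def ms_per_revolution(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_delay_ms(rpm):
    """Assumption: on average the target sector is half a revolution away."""
    return 0.5 * ms_per_revolution(rpm)


def transfer_time_ms(rpm, sectors_per_track, sectors_to_read=1):
    """Time for the requested sectors to pass under the R/W head."""
    return ms_per_revolution(rpm) * sectors_to_read / sectors_per_track


def disk_access_time_ms(seek_ms, rpm, sectors_per_track, sectors_to_read=1):
    """Disk-access time = seek time + rotational delay + data transfer time."""
    return (seek_ms
            + average_rotational_delay_ms(rpm)
            + transfer_time_ms(rpm, sectors_per_track, sectors_to_read))


# Hypothetical drive: 9 ms seek time, 8 sectors per track.
for rpm in (5400, 7200, 15000):
    total = disk_access_time_ms(seek_ms=9.0, rpm=rpm, sectors_per_track=8)
    print(rpm, round(average_rotational_delay_ms(rpm), 3), round(total, 3))
```

In this sketch the seek time is passed in as a fixed input, so only the two terms that depend on the revolution time change from one RPM value to the next.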

  1. File descriptor blocks: a list of blocks that hold the addresses of the physical locations of the data blocks.

[Figure: file system meta-data for a file; file descriptor block(s), each with a pointer to the next file descriptor block; and the 0th, 1st, 2nd, and 3rd data blocks in the file.]

a) What types of file access are supported efficiently?

b) How easy is it for the file to grow in size?

  1. To implement "random-access of a record by key" in a file, how might we use hashing?
  2. To implement "random-access of a record by key" in a file, why would an AVL tree not work well?
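One possible way to think about hashing in a file is to hash the key to a slot number and turn that slot number into a byte offset. The sketch below is an assumption-laden illustration, not the approach the course necessarily intends: it uses fixed-size records, non-negative integer keys, linear probing for collisions, and an in-memory `io.BytesIO` object standing in for a real file. The record layout and all names (`RECORD_FORMAT`, `NUM_SLOTS`, `put`, `get`) are hypothetical.

```python
import io
import struct

RECORD_FORMAT = "<i20s"                        # hypothetical: 4-byte key, 20-byte name
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # fixed size, so offsets are computable
NUM_SLOTS = 101                                # hypothetical number of record slots
EMPTY_KEY = -1                                 # marks an unused slot (keys assumed >= 0)


def create_record_file():
    """Return a file-like object pre-filled with empty record slots."""
    f = io.BytesIO()
    f.write(struct.pack(RECORD_FORMAT, EMPTY_KEY, b"") * NUM_SLOTS)
    return f


def home_offset(key, probe=0):
    """Byte offset of the slot examined on the given probe for this key."""
    return ((key + probe) % NUM_SLOTS) * RECORD_SIZE


def put(f, key, name):
    """Store the record in the first empty (or matching) slot, probing linearly."""
    for probe in range(NUM_SLOTS):
        offset = home_offset(key, probe)
        f.seek(offset)
        stored_key, _ = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
        if stored_key in (EMPTY_KEY, key):
            f.seek(offset)
            f.write(struct.pack(RECORD_FORMAT, key, name.encode("ascii")))
            return
    raise RuntimeError("record file is full")


def get(f, key):
    """Return the name stored for key, or None if the key is absent."""
    for probe in range(NUM_SLOTS):
        f.seek(home_offset(key, probe))
        stored_key, raw = struct.unpack(RECORD_FORMAT, f.read(RECORD_SIZE))
        if stored_key == key:
            return raw.rstrip(b"\0").decode("ascii")
        if stored_key == EMPTY_KEY:
            return None
    return None


records = create_record_file()
put(records, 7, "Jaws")
put(records, 108, "Alien")   # 108 % 101 == 7, so this collides with key 7
print(get(records, 7), get(records, 108), get(records, 209))
```

When there is no collision, a lookup needs a single seek and a single read at a computed offset, rather than following a chain of pointers through the file.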
  1. A B+ Tree is a multi-way tree (typically on the order of 100s of children per node) used primarily as a file-index structure to allow fast search (as well as insertions and deletions) for a target key on disk. Two types of pages (B+ tree "nodes") exist:

• Data pages: these always appear as leaves on the same level of a B+ tree (usually linked as a doubly-linked list too).
• Index pages: the root and other interior nodes above the data page leaves. Index nodes contain some minimum and maximum number of keys and pointers based on the B+ tree's branching factor (b) and fill factor. A 50% fill factor would be the minimum for any B+ tree. All index pages must have ⌈b/2⌉ ≤ # children ≤ b, except the root, which must have at least two children.

Consider a B+ tree example with b = 5.

a) How would you find 88?

b) The insert algorithm for a B+ tree is summarized in the table below. Where would you insert 50, 100, 105, 110, 180, 200, 210?

Situation                                    Insertion Algorithm
Data Page Full?   Parent Index Page Full?

No                No                         Place record in sorted position in the appropriate data page.

Yes               No                         1. Split data page with records < middle key going in left data page
                                                and records ≥ middle key going in right data page.
                                             2. Place middle key in index page in sorted order with the pointer
                                                immediately to its left pointing to the left data page and the
                                                pointer immediately to its right pointing to the right data page.

Yes               Yes                        1. Split data page with records < middle key going in left data page
                                                and records ≥ middle key going in right data page.
                                             2. Adding middle key to parent index page causes it to split with
                                                keys < middle key going into the left index page, keys > middle key
                                                going in right index page, and the middle key inserted into the next
                                                higher level index page. If the next higher index page is full,
                                                continue splitting index pages up the B+ tree as necessary.
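The data-page split step and the bound on index-page children can be sketched in a few lines. This is a minimal illustration under assumptions: records are represented by their keys only, the overfull page is passed in already sorted, and the function names `split_data_page` and `min_children` are hypothetical. The keys in the usage lines are made up and are not taken from the lecture's example tree.

```python
import math


def min_children(b):
    """Minimum number of children of a non-root index page: ceil(b / 2)."""
    return math.ceil(b / 2)


def split_data_page(sorted_keys):
    """Split an overfull data page around its middle key.

    Records < middle key go in the left data page and records >= middle key go
    in the right data page; the middle key is returned so the caller can place
    it in the parent index page.
    """
    middle_key = sorted_keys[len(sorted_keys) // 2]
    left_page = [k for k in sorted_keys if k < middle_key]
    right_page = [k for k in sorted_keys if k >= middle_key]
    return left_page, middle_key, right_page


# Hypothetical overfull data page.
left, middle, right = split_data_page([10, 20, 30, 40, 50])
print(left, middle, right)
print(min_children(5))
```

Note that the middle key stays in the right data page as well as going up to the index page, which matches the "records ≥ middle key" wording for data-page splits; index-page splits differ in that the middle key moves up and is not kept in either half.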