Data Structures: Hash Tables and Skip Lists, Slides of Computer Science

An overview of hash tables and skip lists, two essential data structures used in computer science. It covers the basic ideas, operations, and performance analysis of these structures. Students can use this document as study notes, summaries, or as a reference for assignments or exams.

Typology: Slides

2012/2013

Uploaded on 03/23/2013

dhruv
dhruv 🇮🇳

4.3

(12)

194 documents

1 / 26

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Algorithms
Hash Tables
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a

Partial preview of the text

Download Data Structures: Hash Tables and Skip Lists and more Slides Computer Science in PDF only on Docsity!

Algorithms

Hash Tables

Homework 4

● Programming assignment:

■ Implement Skip Lists

■ Evaluate performance

■ Extra credit: compare performance to randomly

built BST

■ Due Mar 9 (Fri before Spring Break)

■ Will post later today

Summary: Skip Lists

● O(1) expected time for most operations

● O(lg n) expected time for insert

● O(n^2 ) time worst case

■ But random, so no particular order of insertion

evokes worst-case behavior

● O(n) expected storage requirements

● Easy to code

Review: Hash Tables

● Hash table:

■ Given a table T and a record x, with key (=

symbol) and satellite data, we need to support:

○ Insert (T, x)
○ Delete (T, x)
○ Search(T, x)

■ We want these to be fast, but don’t care about

sorting the records

■ In this discussion we consider all keys to be

(possibly large) natural numbers

Review: The Problem With

Direct Addressing

● Direct addressing works well when the range

m of keys is relatively small

● But what if the keys are 32-bit integers?

■ Problem 1: direct-address table will have

2 32 entries, more than 4 billion

■ Problem 2: even if memory is not an issue, the

time to initialize the elements to NULL may be

● Solution: map keys to smaller range 0..m-

● This mapping is called a hash function

Hash Functions

● Next problem: collision

T

m - 1

h(k 1 ) h(k 4 )

h(k 2 ) = h(k 5 )

h(k 3 )

k (^4)

k 2 k (^3)

k (^1)

k (^5)

U

(universe of keys)

K

(actual keys)

Open Addressing

● Basic idea (details in Section 12.4):

■ To insert: if slot is full, try another slot, …, until an

open slot is found (probing)

■ To search, follow same sequence of probes as

would be used when inserting the element

○ If reach element with correct key, return it
○ If reach a NULL pointer, element is not in table

● Good for fixed sets (adding but no deletion)

■ Example: spell checking

● Table needn’t be much bigger than n

Chaining

● Chaining puts elements that hash to the same

slot in a linked list:

T

k 4

k 2 k^3

k (^1)

k (^5)

U

(universe of keys)

K

(actual keys)

k (^6) k (^8)

k (^7)

k 1 k 4 ——

k 5 k 2

k 3 k 8 k 6 ——

k 7 ——

Chaining

T

k 4

k 2 k^3

k (^1)

k (^5)

U

(universe of keys)

K

(actual keys)

k (^6) k (^8)

k (^7)

k 1 k 4 ——

k 5 k 2

k 3 k 8 k 6 ——

k 7 ——

● How do we delete an element?

■ Do we need a doubly-linked list for efficient delete?

Chaining

● How do we search for a element with a

given key?

T

k 4

k 2 k^3

k (^1)

k (^5)

U

(universe of keys)

K

(actual keys)

k (^6) k (^8)

k (^7)

k 1 k 4 ——

k 5 k 2

k 3 k 8 k 6 ——

k 7 ——

Analysis of Chaining

● Assume simple uniform hashing: each key in

table is equally likely to be hashed to any slot

● Given n keys and m slots in the table, the

load factor α = n/m = average # keys per slot

● What will be the average cost of an

unsuccessful search for a key? A: O(1+ α)

Analysis of Chaining

● Assume simple uniform hashing: each key in

table is equally likely to be hashed to any slot

● Given n keys and m slots in the table, the

load factor α = n/m = average # keys per slot

● What will be the average cost of an

unsuccessful search for a key? A: O(1+ α)

● What will be the average cost of a successful

search?

Analysis of Chaining Continued

● So the cost of searching = O(1 + α)

● If the number of keys n is proportional to the

number of slots in the table, what is α?

● A: α = O(1)

■ In other words, we can make the expected cost of

searching constant if we make α constant

Choosing A Hash Function

● Clearly choosing the hash function well is

crucial

■ What will a worst-case hash function do?

■ What will be the time to search in this case?

● What are desirable features of the hash

function?

■ Should distribute keys uniformly into slots

■ Should not depend on patterns in the data