Standard ADTs - Object Oriented Programming and Data Structures - Lecture Slides, Slides of Object Oriented Programming

Lecture from Object Oriented Programming and Data Structures course with following key points: Standard Adts, Abstract Data Types, Priority Queues, Dictionaries, Data Structure, Array Implementation of Stack, Queue Implementations, Hashing, Table Doubling, Universal Hashing

Typology: Slides

2013/2014

Uploaded on 01/29/2014

sundar
sundar 🇮🇳

4.7

(9)

104 documents

1 / 33

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Standard ADTs
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21

Partial preview of the text

Download Standard ADTs - Object Oriented Programming and Data Structures - Lecture Slides and more Slides Object Oriented Programming in PDF only on Docsity!

Standard ADTs

Abstract Data Types (ADTs)

 A method for achieving

abstraction for data

structures and algorithms

 ADT = model + operations

 Describes what each

operation does, but not how

it does it

 An ADT is independent of its

implementation

 In Java, an interface corresponds

well to an ADT

 The interface describes the operations, but says nothing at all about how they are implemented

 Example: Stack interface/ADT

public interface Stack { public void push(Object x); public Object pop(); public Object peek(); public boolean isEmpty(); public void clear(); }

Sets

 ADT Set

 Operations: void insert(Object element); boolean contains(Object element); void remove(Object element); boolean isEmpty(); void clear(); for(Object o: mySet) { ... }

 Where used:

 Wide use within other algorithms

 Note: no duplicates allowed

 A “set” with duplicates is sometimes called a multiset or bag

A set makes no promises about ordering, but you can still iterate over it.

Dictionaries

 ADT Dictionary (aka Map)

 Operations:  void insert(Object key, Object value);void update(Object key, Object value);Object find(Object key);void remove(Object key);boolean isEmpty();void clear();

 Think of: key = word; value = definition

 Where used:

 Symbol tables  Wide use within other algorithms

A HashMap is a particular implementation of the Map interface

From interface to implementation

  • Given that we want to support some interface, the designer still faces

a choice

  • What will be the best way to implement this interface for my expected type of use?
  • Choice of implementation can reflect many considerations
  • Major factors we think about
  • Speed for typical use case
  • Storage space required

Array Implementation of Stack

class ArrayStack implements Stack { private Object[] array; //Array that holds the Stack private int index = 0; //First empty slot in Stack public ArrayStack(int maxSize) { array = new Object[maxSize]; } public void push(Object x) { array[index++] = x; } public Object pop() { return array[--index]; } public Object peek() { return array[index-1]; } public boolean isEmpty() { return index == 0; } public void clear() { index = 0; } } max- 1 3 2 1 0 4 index O(1) worst- case time for each operation Question: What can go wrong?

…. What if maxSize is too small?

Queue Implementations

  • Possible implementations (^)  Recall: operations are add, poll, peek,…  For linked-list  All operations are O(1)  For array with head at A[0]  poll takes time O(n)  Other ops are O(1)  Can overflow  For array with wraparound  All operations are O(1)  Can overflow Linked List head (^) last Array with wraparound (can overflow) head last Array with head always at A[0] (poll( ) becomes expensive) (can overflow) last

A Queue From 2 Stacks

  • Add pushes onto stack A
  • Poll pops from stack B
  • If B is empty, move all elements from stack A to stack B
  • Some individual operations are costly, but still O(1) time per

operations over the long run

Goal: Design a Dictionary (aka Map )

  • Operations
    • void insert(key, value)
    • void update(key, value)
    • Object find(key)
    • void remove(key)
    • boolean isEmpty()
    • void clear()

Array implementation: Using an

array of (key,value) pairs

Unsorted Sorted

insert O(1) O(n)

update O(n) O(log n)

find O(n) O(log n)

remove O(n) O(n)

n is the number of items currently held in the dictionary

Hashing

  • Idea: compute an array index via a hash function h
  • U is the universe of keys
  • h: U → [0,…,m-1] where m = hash table size
  • Usually |U| is much bigger than m, so collisions are possible (two elements with the same hash code)
  • h should
    • be easy to compute
    • avoid collisions
    • have roughly equal probability for each table position

Typical situation:

U = all legal identifiers

Typical hash function:

h converts each letter to a number, then compute a function of these numbers Best hash functions are highly random This is connected to cryptography We’ll return to this in a few minutes

Analysis for Hashing with Chaining

  • Analyzed in terms of load factor λ = n/m = (items in table)/(table size)
  • We count the expected number of probes (key comparisons)
  • Goal: Determine expected number of probes for an unsuccessful search

 Expected number of probes for

an unsuccessful search =

average number of items per

table position = n/m = λ

 Expected number of probes for

a successful search = 1 + λ =

O(λ)

 Worst case is O(n)

Table Doubling

  • We know each operation takes time O(λ) where λ λ =n/m
  • So it gets worse as n gets large relative to m
  • Table Doubling:
  • Set a bound for λ (call it λ 0 )
  • Whenever λ reaches this bound:
    • Create a new table twice as big
    • Then rehash all the data
  • As before, operations usually take time O(1)
    • But sometimes we copy the whole table

Analysis of Table Doubling, Cont’d

  • Total number of insert operations needed to reach current table = copying work + initial insertions of items = 2n + n = 3n inserts
  • Each insert takes expected time O(λ 0 ) or O(1), so total expected time to build entire table is O(n)
  • Thus, expected time per operation is O(1)

 Disadvantages of table

doubling:

 Worst-case insertion time of O(n) is definitely achieved (but rarely)  Thus, not appropriate for time critical operations

Concept: “hash” codes

  • Definition: a hash code is the output of a function that takes some

input and maps it to a pseudo-random number (a hash )

  • Input could be a big object like a string or an Animal or some other complex thing
  • Same input always gives same out
  • Idea is that hashCode for distinct objects will have a very low likelihood of collisions
  • Used to create index data structures for finding an object given its

hash code