RAGHU INSTITUTE OF TECHNOLOGY
AUTONOMOUS
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
II BTECH II SEM
Advanced Data Structures
Unit-2
Prepared By
Dr.V.Sangeetha
&
Mr. N.UdayKumar

Contents:

• HASHING
  • Static Hashing: Hash Table, Hash Functions
  • Secure Hash Function
  • Overflow Handling
  • Theoretical Evaluation of Overflow Techniques
  • Dynamic Hashing
    • Dynamic Hashing Using Directories
    • Directoryless Dynamic Hashing

References:

  • Data Structures: A Pseudocode Approach, Richard F. Gilberg, Behrouz A. Forouzan, Cengage.
  • Fundamentals of Data Structures in C, 2nd ed., Horowitz, Sahni, Anderson-Freed.

Symbol Table

Definition: a set of name-attribute pairs

Operations:
• Determine if a particular name is in the table
• Retrieve the attributes of the name
• Modify the attributes of that name
• Insert a new name and its attributes
• Delete a name and its attributes
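The five symbol-table operations map directly onto a hash table. The sketch below is a minimal illustration that assumes Python and stores entries in the built-in `dict` (itself a hash table); the class `SymbolTable` and its method names are hypothetical, chosen only to mirror the list of operations.

```python
from typing import Any, Dict, Optional


class SymbolTable:
    """Hypothetical symbol table: maps each name to a dict of attributes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Any]] = {}

    def contains(self, name: str) -> bool:
        # Determine if a particular name is in the table.
        return name in self._entries

    def retrieve(self, name: str) -> Optional[Dict[str, Any]]:
        # Retrieve (a copy of) the attributes of the name, or None if absent.
        attrs = self._entries.get(name)
        return dict(attrs) if attrs is not None else None

    def modify(self, name: str, changes: Dict[str, Any]) -> None:
        # Modify the attributes of an existing name; raises KeyError if absent.
        self._entries[name].update(changes)

    def insert(self, name: str, attrs: Dict[str, Any]) -> None:
        # Insert a new name and its attributes (overwrites an existing entry).
        self._entries[name] = dict(attrs)

    def delete(self, name: str) -> None:
        # Delete a name and its attributes; raises KeyError if absent.
        del self._entries[name]


table = SymbolTable()
table.insert("count", {"type": "int", "scope": "global"})
table.modify("count", {"scope": "local"})
print(table.retrieve("count"))  # {'type': 'int', 'scope': 'local'}
table.delete("count")
print(table.contains("count"))  # False
```

Because every operation goes through one hash lookup on the name, each runs in expected constant time.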


Search vs. Hashing

• Search tree methods: key comparisons
• Hashing methods: hash functions

Types of hashing:
• Static hashing
• Dynamic hashing

Static hashing is a form of the hashing problem in which users perform lookups on a finalized dictionary set: all objects in the dictionary are final and do not change.
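As an illustration of static hashing, the sketch below builds a table once from a fixed key set and afterwards supports lookups only. It assumes integer keys, a table size fixed at construction, the remainder `key % num_buckets` as the hash function, and chaining (a short list per bucket) to hold keys that collide; the class name `StaticHashTable` is hypothetical.

```python
from typing import Iterable, List


class StaticHashTable:
    """Hash table built once from a finalized key set; supports lookups only."""

    def __init__(self, keys: Iterable[int], num_buckets: int) -> None:
        self._m = num_buckets  # table size is fixed at construction time
        self._buckets: List[List[int]] = [[] for _ in range(num_buckets)]
        for key in keys:
            # Keys that hash to the same bucket share a short list (chaining).
            self._buckets[self._hash(key)].append(key)

    def _hash(self, key: int) -> int:
        # Assumed hash function: remainder modulo the table size.
        return key % self._m

    def lookup(self, key: int) -> bool:
        # Only the single bucket the key hashes to has to be searched.
        return key in self._buckets[self._hash(key)]


table = StaticHashTable([12, 25, 31, 40, 56], num_buckets=7)
print(table.lookup(25), table.lookup(26))  # True False
```

Since the key set never changes, the table size can be chosen once for that set and never needs to grow.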


HASH FUNCTIONS:

The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found:

index = f(key, array_size)

Often this is done in two steps:

hash = hashfunc(key)
index = hash % array_size

In this method, the hash is independent of the array size, and it is then reduced to an index (a number between 0 and array_size − 1) using the modulo operator (%). When the array size is a power of two, the remainder operation reduces to bit masking, which improves speed but can magnify the problems caused by a poor hash function.
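The two-step scheme can be sketched as follows. The string hash `hashfunc` here is a hypothetical polynomial hash (base 31, truncated to 32 bits), used only for illustration; the sketch also checks that for a power-of-two array size the remainder gives the same index as masking with `array_size - 1`.

```python
def hashfunc(key: str) -> int:
    """Hypothetical string hash: polynomial rolling hash with base 31."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def index_for(key: str, array_size: int) -> int:
    # Step 1: hash independently of the table size; step 2: reduce to 0..array_size-1.
    return hashfunc(key) % array_size


def index_for_pow2(key: str, array_size: int) -> int:
    # For a power-of-two array size, the remainder equals a bit mask of the low bits.
    assert array_size > 0 and array_size & (array_size - 1) == 0
    return hashfunc(key) & (array_size - 1)


for word in ["apple", "banana", "cherry"]:
    assert index_for(word, 16) == index_for_pow2(word, 16)
    print(word, index_for(word, 16), index_for(word, 17))
```

The mask version is cheaper, but it keeps only the low bits of the hash, so any weakness in those bits shows up directly as collisions.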

A good hash function and implementation algorithm are essential for good hash table performance, but they may be difficult to achieve.

A basic requirement is that the function should provide a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them.
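One way to see the effect of a non-uniform distribution is to count collisions, i.e. keys that land in an already-occupied bucket. The sketch below assumes Python; `first_char_hash` is a deliberately poor hypothetical hash that looks only at the first character, and `poly_hash` is a hypothetical polynomial hash over all characters for comparison.

```python
from collections import Counter
from typing import Callable, Iterable


def count_collisions(keys: Iterable[str], hashfunc: Callable[[str], int], table_size: int) -> int:
    """Count keys that land in an already-occupied bucket."""
    loads = Counter(hashfunc(k) % table_size for k in keys)
    return sum(load - 1 for load in loads.values())  # every counted bucket has load >= 1


def first_char_hash(key: str) -> int:
    # Deliberately poor hash: only the first character matters.
    return ord(key[0])


def poly_hash(key: str) -> int:
    # Polynomial hash over all characters (base 31, 32-bit).
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


words = ["apple", "apricot", "avocado", "banana", "blueberry"]
print(count_collisions(words, first_char_hash, 11))  # 3
print(count_collisions(words, poly_hash, 11))
```

With the first-character hash, all three "a" words share one bucket and both "b" words share another, giving 3 collisions regardless of how the rest of each key differs.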


Division method:

One common method of determining a hash key is the division method of hashing. The formula used is:

hash key = key % number of slots in the table

The division method is generally a reasonable strategy, unless the key happens to have some undesirable properties. The number of slots $m$ should not be:

• a power of 2, since if $m = 2^p$, then $h(k)$ is just the $p$ lowest-order bits of $k$;
• a power of 10, since then the hash function does not depend on all the decimal digits of $k$;
• $2^p - 1$ if $k$ is a character string interpreted in radix $2^p$, since two strings that are identical except for a transposition of two adjacent characters will hash to the same value.
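A small sketch of the division method and of the power-of-2 pitfall, assuming integer keys. With $m = 16 = 2^4$ only the 4 lowest-order bits of $k$ survive, so keys that differ only in higher bits all collide; the alternative size 13 is an arbitrary prime picked for illustration.

```python
def division_hash(key: int, num_slots: int) -> int:
    # hash key = key % number of slots in the table
    return key % num_slots


keys = [16, 32, 48, 64]  # these keys differ only above the 4 low-order bits
print([division_hash(k, 16) for k in keys])  # [0, 0, 0, 0]
print([division_hash(k, 13) for k in keys])  # [3, 6, 9, 12]
```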

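Folding

• Partition the identifier x into several parts.
• All parts except the last one have the same length.
• Add the parts together to obtain the hash address.

Two possibilities:

• Shift folding: the parts are added as they are.
  x1 = 123, x2 = 203, x3 = 241, x4 = 112, x5 = 20
  address = 123 + 203 + 241 + 112 + 20 = 699
• Folding at the boundaries: the identifier is folded at the partition boundaries, so every other part (x2, x4, ...) is reversed before adding.
  x1 = 123, x2 = 203, x3 = 241, x4 = 112, x5 = 20
  address = 123 + 302 + 241 + 211 + 20 = 897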
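Both folding variants can be sketched in a few lines. The sketch assumes the identifier is a decimal digit string formed by concatenating the parts from the example (x = "12320324111220") and a part length of 3; the function names are hypothetical.

```python
from typing import List


def partition(x: str, part_len: int) -> List[str]:
    # Split the identifier into equal-length parts; only the last may be shorter.
    return [x[i:i + part_len] for i in range(0, len(x), part_len)]


def shift_fold(x: str, part_len: int = 3) -> int:
    # Shift folding: add the parts as they are.
    return sum(int(p) for p in partition(x, part_len))


def boundary_fold(x: str, part_len: int = 3) -> int:
    # Folding at the boundaries: reverse every other part (x2, x4, ...) before adding.
    parts = partition(x, part_len)
    return sum(int(p[::-1]) if i % 2 == 1 else int(p) for i, p in enumerate(parts))


x = "12320324111220"  # x1..x5 = 123, 203, 241, 112, 20
print(shift_fold(x))     # 699
print(boundary_fold(x))  # 897
```

In a real table the folded value would typically still be reduced to a valid index, for example with the modulo operator.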


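Digit Analysis

All the identifiers are known in advance. Each of the m identifiers $X_1, \dots, X_m$ consists of n digits:

$$
\begin{array}{c|cccc}
X_1 & d_{11} & d_{12} & \cdots & d_{1n} \\
X_2 & d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
X_m & d_{m1} & d_{m2} & \cdots & d_{mn}
\end{array}
$$

Select 3 digits from n.

Criterion: delete the digits having the most skewed distributions.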
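A sketch of digit analysis, assuming all identifiers are decimal strings of equal length. The slides do not define how skew is measured; here it is assumed to be the frequency of the most common digit at each position (a position where almost every identifier has the same digit is highly skewed). The function names are hypothetical, and the example keeps 2 digits because the sample identifiers have only 4.

```python
from collections import Counter
from typing import List, Sequence


def select_digit_positions(identifiers: Sequence[str], keep: int = 3) -> List[int]:
    """Return the `keep` digit positions with the least skewed distributions."""
    n = len(identifiers[0])  # all identifiers are assumed to have n digits
    skew = []
    for pos in range(n):
        counts = Counter(ident[pos] for ident in identifiers)
        # Assumed skew measure: frequency of the most common digit at this position.
        skew.append((max(counts.values()), pos))
    skew.sort()  # least skewed first; ties broken by position
    return sorted(pos for _, pos in skew[:keep])


def digit_address(identifier: str, positions: List[int]) -> int:
    # Build the hash address from the selected digits only.
    return int("".join(identifier[p] for p in positions))


ids = ["1234", "1274", "1235", "1290"]
positions = select_digit_positions(ids, keep=2)
print(positions)                                   # [2, 3]
print([digit_address(i, positions) for i in ids])  # [34, 74, 35, 90]
```

The first two positions are identical across all identifiers, so they carry no information for distinguishing keys and are deleted.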


Dynamic hashing

• Dynamically increasing and decreasing file size
• Concepts
  • file: a collection of records
  • record: a key + data, stored in pages (buckets)
  • space utilization:

$$
\text{space utilization} = \frac{\text{NumberOfRecords}}{\text{NumberOfPages} \times \text{PageCapacity}}
$$
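The space-utilization formula as a worked example. The file below is hypothetical: 3 pages (buckets), each able to hold 4 records, currently holding 9 records in total.

```python
from typing import List


def space_utilization(num_records: int, num_pages: int, page_capacity: int) -> float:
    # utilization = NumberOfRecords / (NumberOfPages * PageCapacity)
    return num_records / (num_pages * page_capacity)


# Hypothetical file: 3 pages, each with room for 4 records.
pages: List[List[str]] = [["k1", "k2", "k3"], ["k4", "k5"], ["k6", "k7", "k8", "k9"]]
records = sum(len(p) for p in pages)
print(space_utilization(records, len(pages), page_capacity=4))  # 0.75
```

A dynamic hashing scheme aims to keep this ratio reasonably high as the file grows and shrinks, instead of reserving space for the largest file size up front.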

Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set but not removed, and the more elements that are added to the set, the larger the probability of false positives.

Use: Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied.
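A minimal Bloom filter sketch: a bit array of `num_bits` bits and `num_hashes` hash functions. Adding an element sets the bits at all of its hash positions; a query answers "possibly in set" only if all those bits are set. The way the k hash functions are derived here (SHA-256 of the item prefixed with the function index) is an assumption for illustration; practical filters often use faster non-cryptographic hashes. The class and method names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array of num_bits bits with num_hashes hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: k positions from SHA-256 of (function index + item).
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits for this item; there is no remove operation.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely not in set. True: possibly in set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
print(bf.might_contain("apple"))   # True (added items are never reported absent)
print(bf.might_contain("durian"))  # usually False, but a false positive is possible
```

Removal is omitted because clearing a bit could also clear a bit shared with another element, which would introduce false negatives.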

THANK YOU
