Randomized Algorithms, Exercises Solution- Discrete Mathematics 4, Exercises of Discrete Structures and Graph Theory

Discrete Structures, Randomized Algorithm, Exercises, Exam Paper

Typology: Exercises

2010/2011

Uploaded on 10/12/2011

6.856 Randomized Algorithms
Handout #11, March 25, 2011 Homework 4 Solutions


The key difference is that if $\beta_i n$ bins have height $h$, then the probability that a ball chooses all height-$h$ bins drops to $\beta_i^d$. Thus, the "expected number" of height $h+1$ bins is like $\beta_{i+1} n$, where $\beta_{i+1} = \beta_i^d$. This gives $\beta_i = (1/4)^{d^i}$, which becomes $O(1/n)$ at $i = O(\log_d \log n)$. The rest of the proof is unchanged; in order to deal with the conditioning we work with parameters $\beta_{i+1} = (2\beta_i)^d$.
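The doubly exponential decay of $\beta_i$ can be checked numerically. The following is a minimal sketch, assuming the simplified recurrence $\beta_{i+1} = \beta_i^d$ with $\beta_0 = 1/4$ (not the conditioned version with the factor of 2); the function name `levels_until_sparse` is hypothetical and introduced only for illustration.

```python
import math


def levels_until_sparse(n: int, d: int, beta0: float = 0.25) -> int:
    """Count iterations of beta <- beta**d needed until beta <= 1/n.

    Assumes the simplified recurrence beta_{i+1} = beta_i**d from the text,
    starting at beta_0 = 1/4 (an illustrative default).
    """
    if d < 2:
        raise ValueError("the doubly exponential decay needs d >= 2")
    beta, levels = beta0, 0
    while beta > 1.0 / n:
        beta = beta ** d
        levels += 1
    return levels


if __name__ == "__main__":
    for d in (2, 3, 4):
        n = 10 ** 6
        # Closed form: (1/4)**(d**i) <= 1/n  iff  i >= log_d(log_4 n).
        closed_form = math.ceil(math.log(math.log(n, 4), d))
        print(d, levels_until_sparse(n, d), closed_form)
```

The loop count should agree with $\lceil \log_d \log_4 n \rceil$, which is the $O(\log_d \log n)$ bound stated above.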

Let $h(\cdot)$ be an arbitrary hash function mapping elements in $M = \{x_1, \ldots, x_m\}$ to elements in $N = \{y_1, \ldots, y_n\}$. Let $h_i$ be the number of elements in $M$ mapped to $y_i$. Thus, the number of subsets of size $n$ in $M$ perfectly hashed by $h(\cdot)$ is $H = \prod_i h_i$. Subject to the constraint that $\sum_i h_i = m$, $H$ achieves its maximum of $(m/n)^n$ when $h(\cdot)$ evenly divides the elements, that is, $h_i = m/n$. Since there are $\binom{m}{n}$ unique subsets of size $n$ in $M$ and each hash function can perfectly hash at most $(m/n)^n$ subsets, we need a hash family of size at least $\binom{m}{n} / (m/n)^n$ in order to find a perfect hash function for each possible subset. Thus, if $2n \le m \le 2^{o(n)}$, the size of a perfect hash family required is bounded below by:

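$$
\begin{aligned}
\frac{\binom{m}{n}}{(m/n)^n}
&= \frac{m!}{n!\,(m-n)!\,(m/n)^n} \\
&\approx \frac{\sqrt{2\pi m}\,(m/e)^m}{\sqrt{2\pi n}\,(n/e)^n \,\sqrt{2\pi(m-n)}\,\bigl((m-n)/e\bigr)^{m-n}\,(m/n)^n} \\
&= \frac{1}{\sqrt{2\pi n}} \left(\frac{m}{m-n}\right)^{m-n+\frac{1}{2}} \\
&> \frac{1}{\sqrt{2\pi n}} \left(1 + \frac{n}{m-n}\right)^{m-n} \\
&= \frac{1}{\sqrt{2\pi n}} \left[\left(1 + \frac{n}{m-n}\right)^{\frac{m-n}{n}}\right]^n \\
&\ge \frac{1}{\sqrt{2\pi n}} \left[e \left(1 + \frac{n}{m-n}\right)^{-\frac{1}{2}}\right]^n \\
&\ge \frac{1}{\sqrt{2\pi n}} \left[\frac{e}{\sqrt{2}}\right]^n \\
&= 2^{\Theta(n)}
\end{aligned}
$$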

We have demonstrated that the required size of a perfect hash family is at least exponential in $n$. If $m \le 2^{o(n)}$, any polynomial in $m$ is also $\le 2^{o(n)}$, since $m^c \le 2^{c \cdot o(n)} = 2^{o(n)}$ for any constant $c$. Thus, a hash family of size only polynomial in $m$ is not large enough to be exponential in $n$. Therefore, there is no perfect hash family mapping from $m$ to $n$ of size polynomial in $m$.
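As a sanity check on the lower bound, one can compare the exact quantity $\binom{m}{n}/(m/n)^n$ against $(e/\sqrt{2})^n/\sqrt{2\pi n}$ for a few values with $m \ge 2n$. The sketch below works in base-2 logarithms to avoid overflow; the function names and the sample $(m, n)$ pairs are illustrative choices, not part of the original solution.

```python
import math


def log2_required_family_size(m: int, n: int) -> float:
    """log2 of C(m, n) / (m/n)**n, computed via lgamma to avoid overflow."""
    ln_binom = math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
    return (ln_binom - n * math.log(m / n)) / math.log(2)


def log2_lower_bound(n: int) -> float:
    """log2 of (e / sqrt(2))**n / sqrt(2 * pi * n)."""
    ln_bound = n * (1.0 - 0.5 * math.log(2)) - 0.5 * math.log(2 * math.pi * n)
    return ln_bound / math.log(2)


if __name__ == "__main__":
    # Sample pairs with m >= 2n, chosen only for illustration.
    for m, n in [(20, 10), (100, 5), (2000, 1000)]:
        exact = log2_required_family_size(m, n)
        bound = log2_lower_bound(n)
        print(f"m={m} n={n} log2(exact)={exact:.2f} log2(bound)={bound:.2f}")
```

For each pair the exact value should sit above the bound, and both grow linearly in $n$ on the log scale, which is what "exponential in $n$" means here.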

As hinted, we will use a main table and an overflow table. After $k$ probes of the main table, if we have not found an empty cell, we place the item in the overflow table. Given $(1+\varepsilon)n$ space, we build a main table of size $n$ and a cuckoo hash table of size $\varepsilon n$. By the argument in class, the cuckoo hash table can hold $\varepsilon n/2$ items with constant worst-case lookup time. We guarantee that the main table holds only $(1-\varepsilon/2)n$ items by a simple rule: once the main table gets that full, we immediately place all further incoming items in the cuckoo hash table.

Assuming the main table respects this limit, $k$ probes to it all fail to find an empty bucket with probability at most $(1-\varepsilon/2)^k$. If we arrange for (say) $(1-\varepsilon/2)^k \le \varepsilon/4$, then the probability that an item fails to find an empty space is at most $\varepsilon/4$, so the expected number of items that fail to find a space, and get kicked into overflow, is at most $\varepsilon n/4$. Thus a Chernoff bound tells us that this number is at most $\varepsilon n/2$ with (exponentially) high probability, so the cuckoo table will never get too full to operate in constant time (w.h.p.). Solving, we find that $k = \log(\varepsilon/4)/\log(1-\varepsilon/2) = O\left(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\right)$.
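The probe count $k$ can be computed directly from the condition $(1-\varepsilon/2)^k \le \varepsilon/4$. The following is a minimal sketch; the function name `probes_needed` and the restriction to $0 < \varepsilon \le 1$ are assumptions made for illustration.

```python
import math


def probes_needed(eps: float) -> int:
    """Smallest integer k with (1 - eps/2)**k <= eps/4.

    Assumes 0 < eps <= 1, so that both logarithms are defined and negative.
    """
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps is assumed to lie in (0, 1]")
    return math.ceil(math.log(eps / 4) / math.log(1 - eps / 2))


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.1):
        k = probes_needed(eps)
        # Compare against the (1/eps) * log(1/eps) growth rate from the text.
        print(eps, k, (1 / eps) * math.log(1 / eps))
```

As $\varepsilon$ shrinks, $k$ grows roughly like $\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}$ up to a constant factor, since $\log(1-\varepsilon/2) \approx -\varepsilon/2$ for small $\varepsilon$.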

To achieve evaluation time of $O(1)$ in expectation and $O(\log\log m)$ with high probability, we modify the consistent hashing algorithm as follows. First, break the ring into $m$ equal-sized intervals. Next, associate with each interval the buckets that overlap it. Note that the number of buckets associated with each interval is at most 1 more than the number of bucket boundaries that fall within the interval. To find the bucket associated with a particular item, use the hash function to map the item to a number in $[0, 1]$ and find the bucket responsible for the item among the buckets associated with the interval that the item falls in.

Thus, the performance of this lookup depends on the number of buckets associated with the interval. Specifically, if we pre-sort the buckets within each interval, we can find the bucket responsible for a given item using binary search in $O(\log b)$ time, where $b$ is the number of buckets in the interval. Since the bucket boundary positions are randomly selected, the problem of finding the expected and maximum number of bucket boundary positions within each interval reduces to the $m$ balls in $m$ bins problem. Thus, we can conclude that the expected number of boundary positions in each interval is $O(1)$ and the maximum number of boundary positions in any interval is $O\left(\frac{\log m}{\log\log m}\right)$ with high probability, so binary search within any interval takes $O(\log\log m)$ time with high probability. With at most $1 + 1 = 2$ buckets in each interval in expectation, it takes 1 comparison, or $O(1)$ time, to determine which of the 2 buckets an item maps to. We maintain pointers in each empty interval to the next non-empty interval to allow fast ($O(1)$) search. These pointers can be maintained when