

Discrete Structures, Randomized Algorithm, Exercises, Exam Paper
Typology: Exercises


The key difference is that if $\beta_i n$ bins have height $h$, then the probability that a ball chooses all height-$h$ bins drops to $\beta_i^d$. Thus, the "expected number" of height $h+1$ bins is like $\beta_{i+1} n$ where $\beta_{i+1} = \beta_i^d$. This gives $\beta_i = (1/4)^{d^i}$, which becomes $O(1/n)$ at $i = O(\log_d \log n)$. The rest of the proof is unchanged; in order to deal with the conditioning we work with parameters $\beta_{i+1} = (2\beta_i)^d$.
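As a quick numeric illustration of the recurrence $\beta_{i+1} = \beta_i^d$, the sketch below iterates it from $\beta_0 = 1/4$ (the starting value implied by the closed form) until $\beta_i \le 1/n$, and compares the iteration count with the prediction $\lceil \log_d \log_4 n \rceil$. The function name `levels_until_sparse` and the parameter `beta0` are hypothetical names chosen for this example; it checks only the idealized recurrence, not the conditioned version with $(2\beta_i)^d$.

```python
import math

def levels_until_sparse(n, d, beta0=0.25):
    """Iterate beta_{i+1} = beta_i ** d from beta0 and return the first i
    with beta_i <= 1/n (idealized recurrence, ignoring conditioning)."""
    beta, i = beta0, 0
    while beta > 1.0 / n:
        beta = beta ** d
        i += 1
    return i

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        for d in (2, 3):
            i = levels_until_sparse(n, d)
            # Closed form: (1/4)^(d^i) <= 1/n  iff  d^i >= log_4(n).
            predicted = math.ceil(math.log(math.log(n, 4), d))
            print(f"n={n:>10} d={d} levels={i} predicted={predicted}")
```

The level count grows only doubly logarithmically in $n$, which is the source of the $O(\log_d \log n)$ height bound.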
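Let $h$ be an arbitrary hash function mapping elements of $M = \{x_1, \ldots, x_m\}$ to elements of $N = \{y_1, \ldots, y_n\}$. Let $h_i$ be the number of elements of $M$ mapped to $y_i$. Then the number of subsets of size $n$ in $M$ perfectly hashed by $h$ is $H = \prod_i h_i$. Subject to the constraint $\sum_i h_i = m$, $H$ achieves its maximum of $(m/n)^n$ when $h$ divides the elements evenly, that is, $h_i = m/n$. Since there are $\binom{m}{n}$ distinct subsets of size $n$ in $M$ and each hash function can perfectly hash at most $(m/n)^n$ subsets, we need a hash family of size at least $\binom{m}{n} / (m/n)^n$ in order to find a perfect hash function for each possible subset. Thus, if $2n \le m \le 2^{o(n)}$, the size of a perfect hash family required is bounded, using Stirling's approximation, by:

$$
\begin{aligned}
\frac{\binom{m}{n}}{(m/n)^n}
&= \frac{m!}{n!\,(m-n)!\,(m/n)^n} \\
&\approx \frac{\sqrt{2\pi m}\,\left(\frac{m}{e}\right)^m}{\sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n \sqrt{2\pi(m-n)}\,\left(\frac{m-n}{e}\right)^{m-n} \left(\frac{m}{n}\right)^n} \\
&= \frac{1}{\sqrt{2\pi n}} \left(\frac{m}{m-n}\right)^{m-n+\frac{1}{2}} \\
&\ge \frac{1}{\sqrt{2\pi n}} \left(1 + \frac{n}{m-n}\right)^{m-n} \\
&= \frac{1}{\sqrt{2\pi n}} \left[\left(1 + \frac{n}{m-n}\right)^{\frac{m-n}{n}}\right]^n \\
&\ge \frac{1}{\sqrt{2\pi n}} \left[e \left(1 + \frac{n}{m-n}\right)^{-\frac{1}{2}}\right]^n \\
&\ge \frac{1}{\sqrt{2\pi n}} \left[\frac{e}{\sqrt{2}}\right]^n = 2^{\Theta(n)}
\end{aligned}
$$

The second-to-last step uses $(1 + 1/x)^{x + 1/2} \ge e$ with $x = (m-n)/n$, and the last step uses $m \ge 2n$, which gives $n/(m-n) \le 1$.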
We have demonstrated that the required size of a perfect hash family is at least exponential in $n$. If $m \le 2^{o(n)}$, any polynomial in $m$ is also $\le 2^{o(n)}$, since $m^c \le 2^{c \cdot o(n)} = 2^{o(n)}$ for any constant $c$. Thus, a hash family of size only polynomial in $m$ is not large enough to be exponential in $n$. Therefore, there is no perfect hash family mapping from $m$ to $n$ of size polynomial in $m$.
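To see the counting bound numerically, the following sketch compares $\log_2$ of the exact quantity $\binom{m}{n} / (m/n)^n$ with $\log_2$ of the closed-form bound $\frac{1}{\sqrt{2\pi n}} (e/\sqrt{2})^n$. The function names are hypothetical, and the chosen $(m, n)$ pairs are illustrative spot checks rather than a proof, since the derivation itself goes through an approximation.

```python
import math

def log2_family_lower_bound(m, n):
    """log2 of C(m, n) / (m/n)^n, the counting lower bound on family size."""
    return math.log2(math.comb(m, n)) - n * math.log2(m / n)

def log2_closed_form(n):
    """log2 of (1 / sqrt(2*pi*n)) * (e / sqrt(2))^n."""
    return n * math.log2(math.e / math.sqrt(2)) - 0.5 * math.log2(2 * math.pi * n)

if __name__ == "__main__":
    for n, m in [(10, 20), (10, 1000), (50, 100), (50, 10**6)]:
        exact = log2_family_lower_bound(m, n)
        bound = log2_closed_form(n)
        print(f"n={n:>3} m={m:>8} log2(exact)={exact:8.2f} log2(bound)={bound:8.2f}")
```

Working in logarithms keeps the comparison stable even when $\binom{m}{n}$ is far too large for a float.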
As hinted, we will use a main table and an overflow table. After $k$ probes of the main table, if we have not found an empty cell, we place the item in the overflow table. If you have $(1+\epsilon)n$ space, you can build a main table of size $n$ and a cuckoo hash table of size $\epsilon n$. By the argument in class, the cuckoo hash table can hold $\epsilon n/2$ items with constant worst-case lookup time. We guarantee that the main table holds at most $(1-\epsilon/2)n$ items by a simple rule: once the main table gets that full, we immediately place all further incoming items in the cuckoo hash table. Assuming the main table respects this limit, $k$ probes to it will all fail to find an empty bucket with probability at most $(1-\epsilon/2)^k$. If we arrange for (say) $(1-\epsilon/2)^k \le \epsilon/4$, then the probability that an item fails to find an empty space is at most $\epsilon/4$, so the expected number of items that fail to find a space, and get kicked into overflow, is at most $\epsilon n/4$. A Chernoff bound then tells us that this number is at most $\epsilon n/2$ with (exponentially) high probability, so the cuckoo table will never get too full to operate in constant time (w.h.p.). Solving, we find that $k = \log(\epsilon/4) / \log(1-\epsilon/2) = O\left(\frac{1}{\epsilon} \log \frac{1}{\epsilon}\right)$.
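The probe count can be made concrete with a small calculation. This sketch computes the smallest integer $k$ satisfying $(1-\epsilon/2)^k \le \epsilon/4$ and prints it next to $\frac{1}{\epsilon} \ln \frac{1}{\epsilon}$ for comparison. The function name `probes_needed` is a hypothetical name for this example, and the code covers only the arithmetic, not the tables themselves.

```python
import math

def probes_needed(eps):
    """Smallest integer k with (1 - eps/2)**k <= eps/4, for 0 < eps < 2."""
    return math.ceil(math.log(eps / 4) / math.log(1 - eps / 2))

if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.1, 0.01):
        k = probes_needed(eps)
        scale = (1 / eps) * math.log(1 / eps)
        # k should stay within a constant factor of (1/eps) * ln(1/eps).
        print(f"eps={eps:<5} k={k:>5} (1/eps)ln(1/eps)={scale:8.1f}")
```

For example, `probes_needed(0.5)` returns 8, since $0.75^8 \approx 0.100 \le 0.125$ while $0.75^7 \approx 0.133 > 0.125$.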
To achieve evaluation time of $O(1)$ in expectation and $O(\log \log m)$ with high probability, we will modify the consistent hashing algorithm as follows. First, break the ring into $m$ equal-sized intervals. Next, associate with each interval the buckets that overlap the interval. Note that the number of buckets associated with each interval is at most 1 more than the number of bucket boundaries that fall within the interval. To find the bucket associated with a particular item, use the hash function to map the item to a number in $[0, 1]$ and find the bucket responsible for the item among the buckets associated with the interval that the item falls in. Thus, the performance of this lookup depends on the number of buckets associated with the interval. Specifically, if we pre-order the buckets within each interval, we can find the bucket responsible for a given item using binary search in $O(\log b)$ time, where $b$ is the number of buckets in the interval. Since the bucket boundary positions are randomly selected, the problem of finding the expected and maximum number of bucket boundary positions within each interval reduces to the $m$ balls in $m$ bins problem. Thus, we can conclude that the expected number of boundary positions in each interval is $O(1)$ and the maximum number of boundary positions in any interval is $O\left(\frac{\log m}{\log \log m}\right)$ with high probability. With at most $1 + 1 = 2$ buckets in each interval in expectation, it takes 1 comparison, or $O(1)$ time, to determine which of the 2 buckets an item maps to. We maintain pointers in each empty interval to the next non-empty interval to allow fast ($O(1)$) search. These pointers can be maintained when