





Prof. Salil Vadhan, Computer Science, Pseudorandomness Basic Derandomization Techniques, The Method of Conditional Expectations, Randomized Large Cut Algorithm, Pairwise Independence, Derandomized Large Cut Algorithm, Pairwise Independent Hash Functions, Hash Tables, Randomness-Efficient Error Reduction and Sampling, Chebyshev's Inequality, Pairwise-Independent Tail Inequality, k-Wise Independent Hash Functions, Harvard, Lecture Notes






CS225: Pseudorandomness Prof. Salil Vadhan
February 20, 2007
Based on scribe notes by Chun-Yun Hsiao and Vinod Vaikuntanathan. Lecture given by Dan Gutfreund.
In the previous lecture, we saw several derandomization techniques (enumeration, nonuniformity, nondeterminism) that are general in the sense that they apply to all of BPP, but are infeasible in the sense that they cannot be implemented by efficient deterministic algorithms. Today, we will see two derandomization techniques that can be implemented efficiently, but do not apply to all randomized algorithms.
The general approach. Consider a randomized algorithm that uses m random bits. We can view all its sequences of coin tosses as corresponding to a binary tree of depth m. We know that most paths (from the root to a leaf) are “good,” i.e., give the correct answer. A natural idea is to try to find such a path by walking down from the root and making “good” choices at each step. Equivalently, we try to find a good sequence of coin tosses “bit-by-bit”.
For 0 ≤ i ≤ m and r_1, r_2, ..., r_i ∈ {0,1}, define P(r_1, r_2, ..., r_i) to be the fraction of continuations that are good sequences of coin tosses. More precisely, for 1 ≤ j ≤ m, let R_j be an unbiased random bit, with R_1, ..., R_m independent. Then (the second equality holding for i < m)

$$P(r_1, \ldots, r_i) \stackrel{\text{def}}{=} \Pr_{R_1, \ldots, R_m}\left[A(x; R_1, \ldots, R_m) \text{ is correct} \mid R_1 = r_1, \ldots, R_i = r_i\right] = \mathop{\mathbb{E}}_{R_{i+1}}\left[P(r_1, \ldots, r_i, R_{i+1})\right].$$
By averaging, there exists an r_{i+1} ∈ {0,1} such that P(r_1, ..., r_i, r_{i+1}) ≥ P(r_1, ..., r_i). So at r_1, ..., r_i, we simply pick the r_{i+1} that maximizes P(r_1, ..., r_i, r_{i+1}). At the end we have r_1, ..., r_m, and

$$P(r_1, \ldots, r_m) \ge P(r_1, \ldots, r_{m-1}) \ge \cdots \ge P(r_1) \ge P(\Lambda) \ge 2/3,$$

where P(Λ) denotes the fraction of good paths from the root. Since all m coins are now fixed, P(r_1, ..., r_m) is either 0 or 1, so it must equal 1; that is, r_1, ..., r_m is a good sequence of coin tosses.
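The walk down the tree can be sketched on a toy example, assuming a hypothetical predicate `good` that says whether a full coin sequence makes the algorithm correct. Here P is computed by brute-force enumeration of all continuations, which takes exponential time; it only illustrates the greedy choice of r_{i+1}, not an efficient implementation.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def P(good: Callable[[Bits], bool], m: int, prefix: Bits) -> Fraction:
    """Fraction of length-m continuations of `prefix` on which `good` holds.

    Brute force: enumerates all 2^(m - len(prefix)) continuations, which is
    exactly the cost a real application of the method must avoid.
    """
    rest = m - len(prefix)
    hits = sum(1 for tail in product((0, 1), repeat=rest) if good(prefix + tail))
    return Fraction(hits, 2 ** rest)


def conditional_expectations(good: Callable[[Bits], bool], m: int) -> Bits:
    """Walk down the coin-toss tree, always moving to a child with the larger P."""
    prefix: Bits = ()
    for _ in range(m):
        # By averaging, one of the two children has P at least P(prefix).
        prefix = max((prefix + (0,), prefix + (1,)), key=lambda p: P(good, m, p))
    return prefix


# Hypothetical toy algorithm: the coin sequence (0, 0, 1) is its only bad one.
def toy_good(r: Bits) -> bool:
    return r != (0, 0, 1)


path = conditional_expectations(toy_good, 3)
print(path, P(toy_good, 3, ()))  # prints (1, 0, 0) 7/8
```

Ties are broken toward 0 because `max` returns the first maximal element; any tie-breaking rule preserves the invariant P(prefix) ≥ P(Λ).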
Note that to implement this method, we need to compute P(r_1, ..., r_i) deterministically, and this may be infeasible. However, there are nontrivial algorithms where this method does work, often for search problems rather than decision problems. Below we see one such example, where it turns out to yield a natural “greedy algorithm”.

[Figure 1: An example of P(r_1, r_2), where “o” at a leaf denotes a good path.]
Example. Recall the Large Cut problem: given a graph G = (V, E), find a partition S, T (i.e., S ∩ T = ∅, S ∪ T = V) such that |cut(S, T)| ≥ |E|/2.¹
We saw a simple randomized algorithm that finds a cut of (expected) size at least |E|/2, which we now phrase in a way suitable for derandomization.
Randomized Large Cut Algorithm: Flip |V| coins r_1, r_2, ..., r_{|V|}, and put vertex i in S if r_i = 1 and in T if r_i = 0.
To derandomize this algorithm using the Method of Conditional Expectations, define
$$e(r_1, \ldots, r_i) \stackrel{\text{def}}{=} \mathop{\mathbb{E}}_{R_1, \ldots, R_{|V|}}\left[\,|\mathrm{cut}(S, T)| \;\middle|\; R_1 = r_1, \ldots, R_i = r_i\right]$$

to be the expected cut size when the first i random bits are fixed to r_1, r_2, ..., r_i.
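We know that when no random bits are fixed, e(Λ) = |E|/2 (because each edge is cut with probability 1/2), and all we need to calculate is e(r_1, ..., r_i) for 1 ≤ i ≤ n, where n = |V|. For this particular algorithm it turns out that this quantity is not hard to compute. Let S_i := {j : j ≤ i, r_j = 1} (resp. T_i := {j : j ≤ i, r_j = 0}) be the set of vertices in S (resp. T) after we determine r_i, and let U_i := {i+1, i+2, ..., n} be the “undecided” vertices that have not yet been put into S or T. Then

$$e(r_1, \ldots, r_i) = |\mathrm{cut}(S_i, T_i)| + \tfrac{1}{2}\left(|\mathrm{cut}(S_i, U_i)| + |\mathrm{cut}(T_i, U_i)| + |\mathrm{cut}(U_i, U_i)|\right) \qquad (1)$$

since an edge with both endpoints decided is cut or not with certainty, while every edge with an undecided endpoint is cut with probability exactly 1/2. In particular, comparing e(r_1, ..., r_i, 1) with e(r_1, ..., r_i, 0) shows that the best choice puts vertex i+1 in S if it has more neighbors in T_i than in S_i, and in T otherwise (either choice works on a tie).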
¹ Recall cut(S, T) := {(s, t) : (s, t) ∈ E, s ∈ S, t ∈ T}.
Note that this is the natural ‘greedy’ algorithm for this problem. In other cases, the Method of Conditional Expectations yields algorithms that, while still arguably ‘greedy’, would have been much harder to find directly. Thus, designing a randomized algorithm and then trying to derandomize it can be a useful paradigm for the design of deterministic algorithms even if the randomization does not provide gains in efficiency.
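The greedy algorithm can be sketched directly from Equation (1). In this sketch, vertices are numbered 0, ..., n−1 (rather than 1, ..., n), the graph is an edge list, and the helper names `expected_cut` and `greedy_large_cut` are hypothetical; exact `Fraction` arithmetic keeps the 1/2 terms exact.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def expected_cut(edges: Sequence[Edge], bits: Sequence[int]) -> Fraction:
    """Equation (1): expected cut size once vertices 0..len(bits)-1 are fixed.

    Vertex j goes to S if bits[j] == 1 and to T if bits[j] == 0; vertices with
    j >= len(bits) are still undecided (the set U) and would be placed by fair coins.
    """
    i = len(bits)
    total = Fraction(0)
    for u, v in edges:
        if u < i and v < i:
            # Both endpoints decided: the edge is cut iff they are on opposite sides.
            total += 1 if bits[u] != bits[v] else 0
        else:
            # At least one endpoint undecided: cut with probability exactly 1/2.
            total += Fraction(1, 2)
    return total


def greedy_large_cut(n: int, edges: Sequence[Edge]) -> List[int]:
    """Fix r_1, ..., r_n one at a time, each time keeping e(r_1, ..., r_i) maximal."""
    bits: List[int] = []
    for _ in range(n):
        bits.append(max((0, 1), key=lambda b: expected_cut(edges, bits + [b])))
    return bits


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
side = greedy_large_cut(4, cycle)
print(side, expected_cut(cycle, side))  # prints [0, 1, 0, 1] 4
```

Recomputing e from scratch costs O(|E|) per step; only the edges at vertex i+1 actually change, which is why the choice reduces to the neighbor-count rule stated after Equation (1).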
As our first motivating example, we give another way of derandomizing the Large Cut algorithm discussed above. Recall the analysis of the randomized algorithm:
$$\mathbb{E}[|\mathrm{cut}(S, T)|] = \sum_{(i,j) \in E} \Pr[R_i \ne R_j] = |E|/2,$$
where R_1, ..., R_n are the random bits of the algorithm. The key observation is that this analysis applies to any distribution on (R_1, ..., R_n) satisfying Pr[R_i ≠ R_j] = 1/2 for each i ≠ j. Thus, the R_i's do not need to be completely independent random variables; it suffices for them to be pairwise independent. That is, each R_i is an unbiased random bit, and for each i ≠ j, R_i is independent of R_j.
This leads to the question: Can we generate N pairwise independent bits using fewer than N truly random bits?
Proposition 1 Let B_1, ..., B_k be k independent unbiased random bits. For each nonempty S ⊆ [k], let R_S be the random variable ⊕_{i∈S} B_i. Then the R_S's are 2^k − 1 pairwise independent unbiased random bits.
Proof: It is evident that each R_S is unbiased. For pairwise independence, consider any two nonempty sets S ≠ T ⊆ [k]. Then (with the convention R_∅ = 0)

$$R_S = R_{S \cap T} \oplus R_{S \setminus T}, \qquad R_T = R_{S \cap T} \oplus R_{T \setminus S}.$$

Note that R_{S∩T}, R_{S\T} and R_{T\S} are independent as they depend on disjoint subsets of the B_i's, and at least two of these subsets are nonempty (because S ≠ T). This implies that (R_S, R_T) takes each value in {0,1}^2 with probability 1/4.
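Proposition 1 is small enough to check exhaustively. The sketch below (helper names are hypothetical; seed indices are 0-based, so {0, ..., k−1} stands in for [k]) builds all R_S from a seed and verifies that every pair (R_S, R_T) is uniform on {0,1}^2 over the 2^k seeds.

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Tuple


def nonempty_subsets(k: int) -> List[FrozenSet[int]]:
    """All nonempty S contained in {0, ..., k-1} (0-based stand-in for [k])."""
    return [frozenset(c) for size in range(1, k + 1) for c in combinations(range(k), size)]


def xor_bits(seed: Tuple[int, ...]) -> Dict[FrozenSet[int], int]:
    """Proposition 1: from seed bits B_1..B_k, output R_S = XOR of B_i over i in S."""
    out: Dict[FrozenSet[int], int] = {}
    for S in nonempty_subsets(len(seed)):
        r = 0
        for i in S:
            r ^= seed[i]
        out[S] = r
    return out


def is_pairwise_independent(k: int) -> bool:
    """Check over all 2^k seeds that each pair (R_S, R_T), S != T, is uniform on {0,1}^2."""
    samples = [xor_bits(seed) for seed in product((0, 1), repeat=k)]
    for S, T in combinations(nonempty_subsets(k), 2):
        counts = Counter((s[S], s[T]) for s in samples)
        # Each of the 4 possible values must be hit by exactly 2^k / 4 seeds.
        if len(counts) != 4 or any(c != 2 ** k // 4 for c in counts.values()):
            return False
    return True
```

The same check would fail for triples: R_{{1,2}} = R_{{1}} ⊕ R_{{2}} always, so the bits are pairwise but not 3-wise independent.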
Note that this gives us a way to generate N pairwise independent bits from ⌈log(N+1)⌉ independent random bits. Thus, we can reduce the randomness required by the Large Cut algorithm to logarithmic, and then we can obtain a deterministic algorithm by enumeration.
Derandomized Large Cut Algorithm: For all sequences of bits b_1, b_2, ..., b_{⌈log(n+1)⌉}, run the randomized Large Cut algorithm using coin tosses {r_S = ⊕_{i∈S} b_i}_{S≠∅} and choose the largest cut thus obtained.
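Since the coin tosses are pairwise independent, the cuts obtained have average size |E|/2 over all seeds, so the largest one has size at least |E|/2. Since there are at most 2^{⌈log(n+1)⌉} ≤ 2(n+1) sequences of b_i's, the derandomized algorithm still runs in poly(n) time. It is slower than the greedy algorithm obtained by the Method of Conditional Expectations, but it has the advantage of using only O(log n) workspace and being parallelizable.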
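A sketch of the enumeration, under two illustrative assumptions: vertices are numbered 0, ..., n−1, and vertex v is assigned the nonempty subset of seed positions encoded by the bitmask v + 1 (any assignment of distinct nonempty subsets would do). Then r_S is the parity of the seed bits selected by that mask.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def cut_size(edges: Sequence[Edge], side: Sequence[int]) -> int:
    """Number of edges whose endpoints are on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def derandomized_large_cut(n: int, edges: Sequence[Edge]) -> List[int]:
    """Try every seed of ceil(log2(n + 1)) bits and keep the largest cut found."""
    ell = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1, so 2^ell - 1 >= n
    best: List[int] = []
    best_size = -1
    for seed in range(2 ** ell):
        # Vertex v gets r_S for the subset S encoded by bitmask v + 1:
        # r_S = parity of the seed bits that S selects.
        side = [bin(seed & (v + 1)).count("1") % 2 for v in range(n)]
        size = cut_size(edges, side)
        if size > best_size:
            best, best_size = side, size
    return best
```

For example, on the complete graph K_5 (10 edges) this examines only 8 seeds and returns a cut with at least 5 edges.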
Some applications require pairwise independent random variables that take values from a larger range, e.g. we want N = 2^n pairwise independent random variables, each of which is uniformly distributed in {0,1}^m = [M]. The naïve approach is to repeat the above algorithm for the individual bits m times. This uses (log M)(log N) bits to start with, which is no longer logarithmic in N if M is nonconstant. It turns out we can do much better.
A sequence of N random variables each taking a value in [M] can be viewed as a distribution on sequences in [M]^N. Another interpretation of such a sequence is as a mapping f : [N] → [M]. The latter interpretation turns out to be more useful when discussing the computational complexity of the constructions.
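Definition 2 (Pairwise Independent Hash Functions) A family of functions H = {h : [N] → [M]} is pairwise independent if the following two conditions hold when H is chosen uniformly at random from the family:
1. ∀x_1 ≠ x_2 ∈ [N], the random variables H(x_1) and H(x_2) are independent.
2. ∀x ∈ [N], the random variable H(x) is uniformly distributed in [M].
Equivalently, we can combine the two conditions to say that

$$\forall x_1 \ne x_2 \in [N],\ \forall y_1, y_2 \in [M]:\quad \Pr_{H \xleftarrow{R} \mathcal{H}}\left[H(x_1) = y_1 \wedge H(x_2) = y_2\right] = \frac{1}{M^2}.$$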
Note that the probability above is over the random choice of a function from the family H. This is why we talk about a family of functions rather than a single function. The description in terms of functions makes it natural to impose a strong efficiency requirement — we ask that given the description of h and x ∈ [N ], the value h(x) can be computed in time poly(log N, log M ). We call such a family of hash functions explicit.
Pairwise independent functions are a strengthening of universal hash functions, which require only that Pr[H(x_1) = H(x_2)] ≤ 1/M for all x_1 ≠ x_2.
Below we present another construction of a pairwise independent family.
Proposition 3 Let F be a finite field. Define the family of functions H = {h_{a,b} : F → F} where each h_{a,b}(x) = ax + b for a, b ∈ F. Then H is pairwise independent.
Proof Sketch: Notice that the graph of the function h_{a,b}(x) is the line with slope a and y-intercept b. Given x_1 ≠ x_2 and y_1, y_2, there is exactly one line containing the points (x_1, y_1) and (x_2, y_2), namely the one with a = (y_1 − y_2)/(x_1 − x_2) and b = y_1 − a·x_1. Thus, the probability over a, b that h_{a,b}(x_1) = y_1 and h_{a,b}(x_2) = y_2 equals the reciprocal of the number of lines, namely 1/|F|^2.
This construction uses 2 log |F| random bits, since we have to choose a and b at random from F to get a function h_{a,b} ←R H. Compare this to the |F| log |F| bits required to choose |F| fully independent values from F, and the (log |F|)^2 bits for repeating the construction of Proposition 1 for each output bit.
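Proposition 3 can be checked exhaustively for small fields. This sketch assumes F is the prime field F_p (integers mod a prime p) and uses hypothetical helper names; it confirms that for every x_1 ≠ x_2, each target pair (y_1, y_2) is hit by exactly one (a, b).

```python
from collections import Counter
from itertools import product


def h(a: int, b: int, x: int, p: int) -> int:
    """h_{a,b}(x) = a*x + b over the prime field F_p (integers mod p)."""
    return (a * x + b) % p


def check_pairwise_independent(p: int) -> bool:
    """For all x1 != x2, every (y1, y2) must be produced by exactly one (a, b)."""
    for x1, x2 in product(range(p), repeat=2):
        if x1 == x2:
            continue
        counts = Counter((h(a, b, x1, p), h(a, b, x2, p)) for a in range(p) for b in range(p))
        if len(counts) != p * p or any(c != 1 for c in counts.values()):
            return False
    return True
```

The field assumption matters: with modulus 4 (not a field), x_1 = 0 and x_2 = 2 give (b, 2a + b), and a = 0 and a = 2 collide, so the check fails.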
[…] large, and thus it is infeasible to even write down a truly random hash function. It would therefore be preferable to show that some explicit family of hash functions works for the application with similar performance. In many cases, it can be shown that pairwise independence (or k-wise independence, as discussed below) suffices.
Suppose we have a BPP algorithm for a language L that has a constant error probability, and we want to reduce the error to 2^{−k}. We have already seen that O(k) independent repetitions suffice for this (by a Chernoff Bound). If the algorithm originally used m random bits, then we need O(km) random bits after error reduction. Here we will see how to reduce the number of random bits required for error reduction by using only pairwise independent repetitions.
To analyze this, we will need an analogue of the Chernoff Bound that applies to sums of pairwise independent random variables. This follows from Chebyshev’s Inequality. For a random variable X with expectation μ, recall that its variance is defined to be Var[X] = E[(X − μ)^2] = E[X^2] − μ^2.
Lemma 5 (Chebyshev’s Inequality) Let X be a random variable with expectation μ. Then, for every ε > 0,

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$
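Proof: Let Y = (X − μ)^2, a nonnegative random variable with E[Y] = Var[X]. Then, by Markov’s Inequality applied to Y,

$$\Pr[|X - \mu| \ge \varepsilon] = \Pr[(X - \mu)^2 \ge \varepsilon^2] \le \frac{\mathbb{E}[(X - \mu)^2]}{\varepsilon^2} = \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$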
We now use this to show that sums of pairwise independent random variables are concentrated around their expectation.
Proposition 6 (Pairwise-Independent Tail Inequality) Let X_1, ..., X_t be pairwise independent random variables taking values in the interval [0, 1], let X = (Σ_i X_i)/t, and μ = E[X]. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{1}{t\varepsilon^2}.$$
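Proof: Let μ_i = E[X_i]. Then

$$\begin{aligned}
\mathrm{Var}[X] &= \mathbb{E}[(X - \mu)^2] \\
&= \frac{1}{t^2}\,\mathbb{E}\Big[\Big(\sum_i (X_i - \mu_i)\Big)^2\Big] \\
&= \frac{1}{t^2}\sum_{i,j}\mathbb{E}[(X_i - \mu_i)(X_j - \mu_j)] \\
&= \frac{1}{t^2}\sum_i \mathbb{E}[(X_i - \mu_i)^2] \qquad \text{(by pairwise independence)} \\
&= \frac{1}{t^2}\sum_i \mathrm{Var}[X_i] \\
&\le \frac{1}{t}.
\end{aligned}$$

The fourth line holds because for i ≠ j, pairwise independence gives E[(X_i − μ_i)(X_j − μ_j)] = E[X_i − μ_i] · E[X_j − μ_j] = 0, and the last line holds because each X_i takes values in [0, 1], so Var[X_i] ≤ E[X_i^2] ≤ 1. Now apply Chebyshev’s Inequality.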
While this requires less independence than the Chernoff Bound, notice that the error probability bound decreases only inverse-linearly in t, rather than exponentially.
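To see Proposition 6 on a concrete pairwise independent family, the sketch below takes the t = 2^k − 1 bits of Proposition 1 as X_1, ..., X_t (so μ = 1/2), computes the exact tail probability by enumerating all 2^k seeds, and compares it with the bound 1/(tε^2). Subsets are encoded as bitmasks, and the function names are hypothetical.

```python
from fractions import Fraction


def parity(x: int) -> int:
    """Parity of the number of 1 bits in x."""
    return bin(x).count("1") % 2


def tail_probability(k: int, eps: Fraction) -> Fraction:
    """Exact Pr[|X - 1/2| >= eps] over a uniform k-bit seed, where X is the mean of
    the t = 2^k - 1 pairwise independent bits R_S = parity(seed & S) of Proposition 1."""
    t = 2 ** k - 1
    bad = 0
    for seed in range(2 ** k):
        # Nonempty subsets S of the k seed positions are encoded as bitmasks 1..t.
        X = Fraction(sum(parity(seed & S) for S in range(1, t + 1)), t)
        if abs(X - Fraction(1, 2)) >= eps:
            bad += 1
    return Fraction(bad, 2 ** k)


def pairwise_tail_bound(k: int, eps: Fraction) -> Fraction:
    """Proposition 6's bound 1 / (t * eps^2) with t = 2^k - 1."""
    return 1 / ((2 ** k - 1) * eps ** 2)


print(tail_probability(6, Fraction(1, 4)), pairwise_tail_bound(6, Fraction(1, 4)))  # prints 1/64 16/63
```

For this particular family the bound is loose: every nonzero seed yields exactly 2^{k−1} ones among the R_S, so only the all-zero seed lands far from 1/2.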
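Error Reduction. Proposition 6 tells us that if we use t = O(2^k) pairwise independent repetitions, we can reduce the error probability of a BPP algorithm from 1/3 to 2^{−k}. (The majority answer is wrong only if the fraction of incorrect runs, whose expectation is at most 1/3, exceeds its expectation by at least 1/6; by Proposition 6 this happens with probability at most 36/t, so t = 36 · 2^k suffices.) If the original BPP algorithm uses m random bits, then we can do this by choosing h : {0,1}^k → {0,1}^m at random from a pairwise independent family, running the algorithm using coin tosses h(x) for all x ∈ {0,1}^k, and outputting the majority answer. This requires O(k + m) random bits.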
                                    Number of Repetitions    Number of Random Bits
Independent Repetitions             O(k)                     O(km)
Pairwise Independent Repetitions    O(2^k)                   O(k + m)
Note that we have saved substantially on the number of random bits, but paid a lot in the number of repetitions needed. To maintain a polynomial-time algorithm, we can only afford k = O(log n). This setting implies that if we have a BPP algorithm with a constant error that uses m random bits, we have another BPP algorithm that uses O(m + log n) = O(m) random bits and has an error of 1/poly(n). That is, we can go from constant to inverse-polynomial error only paying a constant factor in randomness.
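A toy sketch of pairwise independent error reduction, under illustrative assumptions: the pairwise independent family is h_{a,b}(x) = a·x + b over the field GF(2^4) (Proposition 3 with 4-bit field elements, multiplication modulo x^4 + x + 1), so both k and m are at most 4, and `A` is a hypothetical algorithm whose correct answer is True and which errs on 4 of its 16 coin sequences. The failure probability over the 8-bit seed (a, b) is computed exactly.

```python
from fractions import Fraction
from typing import Callable


def gf16_mul(a: int, b: int) -> int:
    """Multiply a, b in GF(2^4) = GF(2)[x]/(x^4 + x + 1); elements are 4-bit ints."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # reduce modulo x^4 + x + 1
            a ^= 0b10011
    return result


def h(a: int, b: int, x: int) -> int:
    """h_{a,b}(x) = a*x + b over GF(2^4); field addition is XOR."""
    return gf16_mul(a, x) ^ b


def majority_run(A: Callable[[int], bool], a: int, b: int, k: int) -> bool:
    """Run A on coins h_{a,b}(x) for every k-bit x (k <= 4) and return the majority vote."""
    votes = sum(1 for x in range(2 ** k) if A(h(a, b, x)))
    return 2 * votes > 2 ** k


def failure_probability(A: Callable[[int], bool], k: int) -> Fraction:
    """Exact Pr over uniform (a, b) that the majority vote is wrong (correct answer: True)."""
    fails = sum(1 for a in range(16) for b in range(16) if not majority_run(A, a, b, k))
    return Fraction(fails, 256)


# Hypothetical algorithm with m = 4 coin bits that errs on 4 of the 16 coin sequences.
BAD_COINS = {0, 5, 10, 15}


def A(coins: int) -> bool:
    return coins not in BAD_COINS
```

With k = 4, any a ≠ 0 makes h_{a,b} a bijection, so the 16 runs see every coin sequence once and the vote is correct; it fails only when a = 0 and b is a bad coin sequence, giving failure probability 4/256 = 1/64. For k > m one could, for instance, work in GF(2^k) and truncate outputs to m bits, since truncation preserves pairwise independence.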
Sampling. Recall the Sampling problem: Given an oracle to a function f : {0,1}^m → [0, 1], we want to approximate μ(f) := E_x[f(x)] (for x uniform in {0,1}^m) to within an additive error of ε.
We saw that we can solve this problem with probability 1 − δ by outputting the average of f on a random sample of t = O(log(1/δ)/ε^2) points in {0,1}^m, where correctness follows from the Chernoff Bound. To reduce the number of truly random bits used, we can use a pairwise independent sample instead. Specifically, taking t = 1/(ε^2 δ) pairwise independent points, we get an error probability of at most δ by Proposition 6. To generate t pairwise independent samples of m bits each, we need O(m + log(1/ε) + log(1/δ)) truly random bits.
                                    Number of Samples          Number of Random Bits
Truly Random Sample                 O((1/ε^2) log(1/δ))        O((m/ε^2) log(1/δ))
Pairwise Independent Sample         O(1/(ε^2 δ))               O(m + log(1/ε) + log(1/δ))
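A small sketch of the pairwise independent sampler, assuming m = 4 and the family h_{a,b}(x) = a·x + b over GF(2^4) (so at most 16 distinct sample indices are available; a realistic t = 1/(ε^2 δ) would need a larger field). The function names and the test function f(x) = x/15, whose mean μ(f) is exactly 1/2, are hypothetical.

```python
from fractions import Fraction
from typing import Callable


def gf16_mul(a: int, b: int) -> int:
    """Multiply a, b in GF(2^4) = GF(2)[x]/(x^4 + x + 1); elements are 4-bit ints."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # reduce modulo x^4 + x + 1
            a ^= 0b10011
    return result


def pairwise_sample_mean(f: Callable[[int], Fraction], t: int, a: int, b: int) -> Fraction:
    """Average f over the t points h_{a,b}(0), ..., h_{a,b}(t-1), h_{a,b}(x) = a*x + b.

    For distinct x these points are pairwise independent and uniform in {0,1}^4.
    """
    if not 1 <= t <= 16:
        raise ValueError("GF(2^4) supplies at most 16 distinct sample indices")
    return sum((Fraction(f(gf16_mul(a, x) ^ b)) for x in range(t)), Fraction(0)) / t


def estimator_tail(f: Callable[[int], Fraction], t: int, eps: Fraction, mu: Fraction) -> Fraction:
    """Exact Pr over a uniform seed (a, b) that the estimate misses mu by at least eps."""
    misses = sum(
        1
        for a in range(16)
        for b in range(16)
        if abs(pairwise_sample_mean(f, t, a, b) - mu) >= eps
    )
    return Fraction(misses, 256)


# Hypothetical test function on {0,1}^4 with mean mu(f) = 1/2.
def f(x: int) -> Fraction:
    return Fraction(x, 15)
```

The seed is only 8 bits (a and b), independent of t. Because each individual point h_{a,b}(x) is uniform, the estimate is unbiased for every t; with t = 16 it misses μ(f) = 1/2 by at least 1/4 only when a = 0 and b ∈ {0, 1, 2, 3, 12, 13, 14, 15}, i.e. with probability 1/32.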
k-wise Independence. Our definition and construction of pairwise independent functions generalize naturally to k-wise independence for any k.