Pseudorandomness Basic Derandomization Techniques 2, Lecture Notes - Computer Science, Study notes of Number Theory

Prof. Salil Vadhan, Computer Science, Pseudorandomness Basic Derandomization Techniques, The Method of Conditional Expectations, Randomized Large Cut Algorithm, Pairwise Independence, Derandomized Large Cut Algorithm, Pairwise Independent Hash Functions, Hash Tables, Randomness-Efficient Error Reduction and Sampling, Chebyshev's Inequality, Pairwise-Independent Tail Inequality, k-Wise Independent Hash Functions, Harvard, Lecture Notes

Typology: Study notes

2010/2011

Uploaded on 10/26/2011

thecoral
CS225: Pseudorandomness Prof. Salil Vadhan
Lecture 6: Basic Derandomization Techniques II
February 20, 2007
Based on scribe notes by Chun-Yun Hsiao and Vinod Vaikuntanathan. Lecture given by Dan
Gutfreund.


1 Recap

In the previous lecture, we saw several derandomization techniques (enumeration, nonuniformity, nondeterminism) that are general in the sense that they apply to all of BPP, but are infeasible in the sense that they cannot be implemented by efficient deterministic algorithms. Today, we will see two derandomization techniques that can be implemented efficiently, but do not apply to all randomized algorithms.

2 The Method of Conditional Expectations

The general approach. Consider a randomized algorithm that uses $m$ random bits. We can view all its sequences of coin tosses as corresponding to a binary tree of depth $m$. We know that most paths (from the root to a leaf) are “good,” i.e., give the correct answer. A natural idea is to try to find such a path by walking down from the root and making “good” choices at each step. Equivalently, we try to find a good sequence of coin tosses “bit-by-bit.”

For $1 \le i \le m$ and $r_1, r_2, \ldots, r_i \in \{0,1\}$, define $P(r_1, r_2, \ldots, r_i)$ to be the fraction of continuations that are good sequences of coin tosses. More precisely, for $1 \le j \le m$, let $R_j$ be a random variable over $\{0,1\}$ with equal probability, and set

$$P(r_1, \ldots, r_i) \overset{\text{def}}{=} \Pr_{R_1, \ldots, R_m}\bigl[A(x; R_1, \ldots, R_m) \text{ is correct} \mid R_1 = r_1, \ldots, R_i = r_i\bigr] = \mathop{\mathrm{E}}_{R_{i+1}}\bigl[P(r_1, \ldots, r_i, R_{i+1})\bigr].$$

By averaging, there exists an $r_{i+1} \in \{0,1\}$ such that $P(r_1, \ldots, r_i, r_{i+1}) \ge P(r_1, \ldots, r_i)$. So at $r_1, \ldots, r_i$, we simply pick the $r_{i+1}$ that maximizes $P(r_1, \ldots, r_i, r_{i+1})$. At the end we have $r_1, r_2, \ldots, r_m$, and

$$P(r_1, \ldots, r_m) \ge P(r_1, \ldots, r_{m-1}) \ge \cdots \ge P(r_1) \ge P(\Lambda) \ge 2/3,$$

where $P(\Lambda)$ denotes the fraction of good paths from the root. Then $P(r_1, \ldots, r_m) = 1$, since it is either 1 or 0.
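The walk down the tree can be illustrated with a small Python sketch. It is only a toy: `is_good`, `good_fraction`, and `find_good_path` are hypothetical names introduced here, and $P$ is computed by brute-force enumeration of all continuations, which takes time exponential in $m$ and is exactly the step that must be replaced by something efficient in a real derandomization.

```python
from itertools import product

def good_fraction(is_good, prefix, m):
    """P(prefix): fraction of completions of `prefix` to m bits that are good.

    Brute force over all 2^(m - len(prefix)) continuations (illustration only).
    """
    rest = m - len(prefix)
    good = sum(is_good(prefix + tail) for tail in product((0, 1), repeat=rest))
    return good / 2 ** rest

def find_good_path(is_good, m):
    """Fix the coin tosses one bit at a time, never letting P decrease."""
    prefix = ()
    for _ in range(m):
        # By averaging, one of the two children has P at least as large as the parent.
        prefix = max((prefix + (b,) for b in (0, 1)),
                     key=lambda p: good_fraction(is_good, p, m))
    return prefix

# Toy "algorithm" on 3 coin tosses: every sequence is good except (0, 1, 0).
is_good = lambda r: r != (0, 1, 0)
path = find_good_path(is_good, 3)
```

Since $P$ never decreases along the walk and starts above zero, the returned `path` is guaranteed to be a good sequence in this toy setting.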

[Figure 1: An example of $P(r_1, r_2)$, where “o” at the leaf denotes a good path.]

Note that to implement this method, we need to compute $P(r_1, r_2, \ldots, r_i)$ deterministically, and this may be infeasible. However, there are nontrivial algorithms where this method does work, often for search problems rather than decision problems. Below we see one such example, where it turns out to yield a natural “greedy algorithm”.

Example. Recall the Large Cut problem: given a graph $G = (V, E)$, find a partition $S, T$ (i.e., $S \cap T = \emptyset$, $S \cup T = V$) such that $|\mathrm{cut}(S, T)| \ge |E|/2$.^1

We saw a simple randomized algorithm that finds a cut of (expected) size at least |E|/2, which we now phrase in a way suitable for derandomization.

Randomized Large Cut Algorithm: Flip $|V|$ coins $r_1, r_2, \ldots, r_{|V|}$; put vertex $i$ in $S$ if $r_i = 1$ and in $T$ if $r_i = 0$.

To derandomize this algorithm using the Method of Conditional Expectations, define

$$e(r_1, \ldots, r_i) \overset{\text{def}}{=} \mathop{\mathrm{E}}_{R_1, \ldots, R_{|V|}}\bigl[\,|\mathrm{cut}(S, T)| \;\big|\; R_1 = r_1, R_2 = r_2, \ldots, R_i = r_i\bigr]$$

to be the expected cut size when the first $i$ random bits are fixed to $r_1, r_2, \ldots, r_i$.

We know that when no random bits are fixed, $e(\Lambda) \ge |E|/2$ (because each edge is cut with probability $1/2$), and all we need to calculate is $e(r_1, r_2, \ldots, r_i)$ for $1 \le i \le n$, where $n = |V|$. For this particular algorithm it turns out that this quantity is not hard to compute. Let $S_i \overset{\text{def}}{=} \{j : j \le i, r_j = 1\}$ (resp. $T_i \overset{\text{def}}{=} \{j : j \le i, r_j = 0\}$) be the set of vertices in $S$ (resp. $T$) after we determine $r_i$, and let $U_i \overset{\text{def}}{=} \{i+1, i+2, \ldots, n\}$ be the “undecided” vertices that have not been put into $S$ or $T$. Then

$$e(r_1, \ldots, r_i) = |\mathrm{cut}(S_i, T_i)| + \tfrac{1}{2}\bigl(|\mathrm{cut}(S_i, U_i)| + |\mathrm{cut}(T_i, U_i)| + |\mathrm{cut}(U_i, U_i)|\bigr). \tag{1}$$

^1 Recall $\mathrm{cut}(S, T) \overset{\text{def}}{=} \{(s, t) : (s, t) \in E,\ s \in S,\ t \in T\}$.
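As a rough illustration of how equation (1) can drive a deterministic algorithm, the following Python sketch fixes the bits one vertex at a time, always choosing the side that keeps the conditional expectation as large as possible. The function names, the 0-based vertex numbering, and the edge-list representation are assumptions made for this sketch, and the graph is assumed to have no self-loops.

```python
def expected_cut(edges, bits):
    """e(r_1, ..., r_i) as in equation (1); `bits` fixes vertices 0..i-1."""
    i = len(bits)
    total = 0.0
    for u, v in edges:
        if u < i and v < i:
            # Both endpoints decided: the edge is either cut or not.
            total += 1.0 if bits[u] != bits[v] else 0.0
        else:
            # At least one endpoint undecided: cut with probability 1/2.
            total += 0.5
    return total

def greedy_large_cut(n, edges):
    """Choose each r_{i+1} to maximize the conditional expectation."""
    bits = []
    for _ in range(n):
        best = max((0, 1), key=lambda b: expected_cut(edges, bits + [b]))
        bits.append(best)
    return bits

def cut_size(edges, bits):
    return sum(bits[u] != bits[v] for u, v in edges)

# Example: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
bits = greedy_large_cut(5, edges)
```

Because only the edges between the new vertex and already-placed vertices change their contribution, maximizing `expected_cut` amounts to putting each vertex on the side opposite the majority of its already-placed neighbors.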

Note that this is the natural ‘greedy’ algorithm for this problem. In other cases, the Method of Conditional Expectations yields algorithms that, while still arguably ‘greedy’, would have been much less easy to find directly. Thus, designing a randomized algorithm and then trying to derandomize it can be a useful paradigm for the design of deterministic algorithms even if the randomization does not provide gains in efficiency.

3 Pairwise Independence

As our first motivating example, we give another way of derandomizing the Large Cut algorithm discussed above. Recall the analysis of the randomized algorithm:

$$\mathrm{E}[|\mathrm{cut}(S)|] = \sum_{(i,j) \in E} \Pr[R_i \ne R_j] = |E|/2,$$

where $R_1, \ldots, R_n$ are the random bits of the algorithm. The key observation is that this analysis applies for any distribution on $(R_1, \ldots, R_n)$ satisfying $\Pr[R_i \ne R_j] = 1/2$ for each $i \ne j$. Thus, they do not need to be completely independent random variables; it suffices for them to be pairwise independent. That is, each $R_i$ is an unbiased random bit, and for each $i \ne j$, $R_i$ is independent from $R_j$.

This leads to the question: Can we generate $N$ pairwise independent bits using fewer than $N$ truly random bits?

Proposition 1 Let $B_1, \ldots, B_k$ be $k$ independent unbiased random bits. For each nonempty $S \subseteq [k]$, let $R_S$ be the random variable $\bigoplus_{i \in S} B_i$. Then the $R_S$’s are $2^k - 1$ pairwise independent unbiased random bits.

Proof: It is evident that each $R_S$ is unbiased. For pairwise independence, consider any two nonempty sets $S \ne T \subseteq [k]$. Then:

$$R_S = R_{S \cap T} \oplus R_{S \setminus T}, \qquad R_T = R_{S \cap T} \oplus R_{T \setminus S}.$$

Note that $R_{S \cap T}$, $R_{S \setminus T}$ and $R_{T \setminus S}$ are independent as they depend on disjoint subsets of the $B_i$’s, and at least two of these subsets are nonempty (because $S \ne T$). This implies that $(R_S, R_T)$ takes each value in $\{0,1\}^2$ with probability $1/4$.

Note that this gives us a way to generate $N$ pairwise independent bits from $\lceil \log(N+1) \rceil$ independent random bits. Thus, we can reduce the randomness required by the Large Cut algorithm to logarithmic, and then we can obtain a deterministic algorithm by enumeration.
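The construction of Proposition 1 is small enough to check exhaustively. The Python sketch below (function names are hypothetical; subsets $S$ are encoded as nonzero bitmasks, which is a choice made here) builds all $2^k - 1$ bits $R_S$ from a seed and verifies that every pair of them is uniform on $\{0,1\}^2$ over the choice of seed.

```python
from itertools import product

def pairwise_bits(seed):
    """Return the 2^k - 1 bits R_S = XOR of seed[i] for i in S, one per nonempty S.

    Subset S is encoded as a bitmask in 1 .. 2^k - 1 (bit i set means i is in S).
    """
    k = len(seed)
    out = []
    for mask in range(1, 2 ** k):
        r = 0
        for i in range(k):
            if (mask >> i) & 1:
                r ^= seed[i]
        out.append(r)
    return out

def is_pairwise_independent(k):
    """Check that every pair (R_S, R_T), S != T, hits each of 00, 01, 10, 11 equally often."""
    seeds = list(product((0, 1), repeat=k))
    table = [pairwise_bits(s) for s in seeds]
    n = 2 ** k - 1
    for s in range(n):
        for t in range(s + 1, n):
            counts = {}
            for row in table:
                pair = (row[s], row[t])
                counts[pair] = counts.get(pair, 0) + 1
            if len(counts) != 4 or any(4 * c != len(seeds) for c in counts.values()):
                return False
    return True
```

For example, `pairwise_bits((1, 0, 1))` expands 3 seed bits into 7 output bits, and `is_pairwise_independent(3)` runs the check over all 8 seeds.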

Derandomized Large Cut Algorithm: For all sequences of bits $b_1, b_2, \ldots, b_{\lceil \log(n+1) \rceil}$, run the randomized Large Cut algorithm using coin tosses $\{r_S = \bigoplus_{i \in S} b_i\}_{S \ne \emptyset}$ and choose the largest cut thus obtained.

Since there are at most 2(n + 1) sequences of bi’s, the derandomized algorithm still runs in poly(n) time. It is slower than the greedy algorithm obtained by the Method of Conditional Expectations, but it has the advantage of using only O(log n) workspace and being parallelizable.
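A minimal Python sketch of this enumeration follows. It assumes vertices are numbered $0, \ldots, n-1$, that the graph has no self-loops, and that vertex $v$ is assigned the subset encoded by the binary representation of $v+1$; that assignment is one convenient choice of distinct nonempty subsets, not something fixed by the notes.

```python
def derandomized_large_cut(n, edges):
    """Try every seed of ceil(log2(n+1)) bits and keep the largest cut found."""
    k = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1
    best_bits, best_size = None, -1
    for seed in range(2 ** k):
        # r_v = XOR of the seed bits selected by the mask v + 1 (parity of the AND).
        bits = [bin((v + 1) & seed).count("1") % 2 for v in range(n)]
        size = sum(bits[u] != bits[v] for u, v in edges)
        if size > best_size:
            best_bits, best_size = bits, size
    return best_bits, best_size

# Example: the complete graph on 4 vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
bits, size = derandomized_large_cut(4, k4)
```

Each seed can be evaluated independently of the others, which is what makes this approach easy to parallelize.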

4 Pairwise Independent Hash Functions

Some applications require pairwise independent random variables that take values from a larger range, e.g., we want $N = 2^n$ pairwise independent random variables, each of which is uniformly distributed in $\{0,1\}^m = [M]$. The naïve approach is to repeat the above algorithm for the individual bits $m$ times. This uses $(\log M)(\log N)$ bits to start with, which is no longer logarithmic in $N$ if $M$ is nonconstant. It turns out we can do much better.

A sequence of $N$ random variables, each taking a value in $[M]$, can be viewed as a distribution on sequences in $[M]^N$. Another interpretation of such a sequence is as a mapping $f : [N] \to [M]$. The latter interpretation turns out to be more useful when discussing the computational complexity of the constructions.

Definition 2 (Pairwise Independent Hash Functions) A family of functions $\mathcal{H} = \{h : [N] \to [M]\}$ is pairwise independent if the following two conditions hold:

  1. $\forall x \in [N]$, the random variable $H(x)$ is uniformly distributed in $[M]$ when $H \xleftarrow{R} \mathcal{H}$.
  2. $\forall x_1 \ne x_2 \in [N]$, the random variables $H(x_1)$ and $H(x_2)$ are independent when $H \xleftarrow{R} \mathcal{H}$.

Equivalently, we can combine the two conditions to say that

$$\forall x_1 \ne x_2 \in [N],\ \forall y_1, y_2 \in [M]: \quad \Pr_{H \xleftarrow{R} \mathcal{H}}\bigl[H(x_1) = y_1 \wedge H(x_2) = y_2\bigr] = \frac{1}{M^2}.$$

Note that the probability above is over the random choice of a function from the family H. This is why we talk about a family of functions rather than a single function. The description in terms of functions makes it natural to impose a strong efficiency requirement — we ask that given the description of h and x ∈ [N ], the value h(x) can be computed in time poly(log N, log M ). We call such a family of hash functions explicit.

Pairwise independent hash functions are a strengthening of universal hash functions, which require only that $\Pr[H(x_1) = H(x_2)] \le 1/M$ for all $x_1 \ne x_2$.

Below we present another construction of a pairwise independent family.

Proposition 3 Let $\mathbb{F}$ be a finite field. Define the family of functions $\mathcal{H} = \{h_{a,b} : \mathbb{F} \to \mathbb{F}\}$ where each $h_{a,b}(x) = ax + b$ for $a, b \in \mathbb{F}$. Then $\mathcal{H}$ is pairwise independent.

Proof Sketch: Notice that the graph of the function $h_{a,b}(x)$ is the line with slope $a$ and $y$-intercept $b$. Given $x_1 \ne x_2$ and $y_1, y_2$, there is exactly one line containing the points $(x_1, y_1)$ and $(x_2, y_2)$. Thus, the probability over $a, b$ that $h_{a,b}(x_1) = y_1$ and $h_{a,b}(x_2) = y_2$ equals the reciprocal of the number of lines, namely $1/|\mathbb{F}|^2$.
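The "exactly one line through two points" argument can be checked by brute force for a small field. The sketch below takes $\mathbb{F} = \mathbb{Z}_p$ (the integers modulo a prime $p$) as one concrete instance of a finite field; that choice and the function names are assumptions of this example, not part of the proposition.

```python
from itertools import product

def h(a, b, x, p):
    """h_{a,b}(x) = a*x + b over Z_p."""
    return (a * x + b) % p

def joint_counts(p, x1, x2):
    """For each (y1, y2), count the pairs (a, b) with h(x1) = y1 and h(x2) = y2."""
    counts = {}
    for a, b in product(range(p), repeat=2):
        key = (h(a, b, x1, p), h(a, b, x2, p))
        counts[key] = counts.get(key, 0) + 1
    return counts

def is_pairwise_independent(p):
    """True iff every (y1, y2) is hit by exactly one (a, b), for all x1 != x2."""
    for x1 in range(p):
        for x2 in range(p):
            if x1 != x2:
                c = joint_counts(p, x1, x2)
                if len(c) != p * p or set(c.values()) != {1}:
                    return False
    return True
```

Running the same check with a composite modulus such as 4 fails, because $\mathbb{Z}_4$ is not a field: $x_1 - x_2$ may have no inverse, so two points no longer determine a unique line.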

This construction uses $2 \log |\mathbb{F}|$ random bits, since we have to choose $a$ and $b$ at random from $\mathbb{F}$ to get a function $h_{a,b} \xleftarrow{R} \mathcal{H}$. Compare this to the $|\mathbb{F}| \log |\mathbb{F}|$ bits required to choose $|\mathbb{F}|$ fully independent values from $\mathbb{F}$, and the $(\log |\mathbb{F}|)^2$ bits for repeating the construction of Proposition 1 for each output bit.

large, and thus it is infeasible to even write down a truly random hash function. Thus, it would be preferable to show that some explicit family of hash functions works for the application with similar performance. In many cases, it can be shown that pairwise independence (or $k$-wise independence, as discussed below) suffices.

6 Randomness-Efficient Error Reduction and Sampling

Suppose we have a BPP algorithm for a language $L$ that has a constant error probability. We want to reduce the error to $2^{-k}$. We have already seen that using $O(k)$ independent repetitions, we can reduce the error of a BPP algorithm to $2^{-k}$ (using a Chernoff Bound). If the algorithm originally used $m$ random bits, then we need $O(km)$ random bits after error reduction. Here we will see how to reduce the number of random bits required for error reduction by doing only pairwise independent repetitions.

To analyze this, we will need an analogue of the Chernoff Bound that applies to sums of pairwise independent random variables. This follows from Chebyshev’s Inequality. For a random variable $X$ with expectation $\mu$, recall that its variance is defined to be $\mathrm{Var}[X] = \mathrm{E}[(X - \mu)^2] = \mathrm{E}[X^2] - \mu^2$.

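Lemma 5 (Chebyshev’s Inequality) Let $X$ be a random variable with expectation $\mu$. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$

Proof: Let $Y = (X - \mu)^2$. Then, applying Markov’s Inequality to the nonnegative random variable $Y$,

$$\Pr[|X - \mu| \ge \varepsilon] = \Pr[(X - \mu)^2 \ge \varepsilon^2] \le \frac{\mathrm{E}[(X - \mu)^2]}{\varepsilon^2} = \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$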

We now use this to show that sums of pairwise independent random variables are concentrated around their expectation.

Proposition 6 (Pairwise-Independent Tail Inequality) Let $X_1, \ldots, X_t$ be pairwise independent random variables taking values in the interval $[0, 1]$, let $X = (\sum_i X_i)/t$, and $\mu = \mathrm{E}[X]$. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{1}{t\varepsilon^2}.$$

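Proof: Let $\mu_i = \mathrm{E}[X_i]$. Then

$$\begin{aligned}
\mathrm{Var}[X] &= \mathrm{E}[(X - \mu)^2] \\
&= \frac{1}{t^2}\,\mathrm{E}\Bigl[\Bigl(\sum_i (X_i - \mu_i)\Bigr)^2\Bigr] \\
&= \frac{1}{t^2} \sum_{i,j} \mathrm{E}[(X_i - \mu_i)(X_j - \mu_j)] \\
&= \frac{1}{t^2} \sum_i \mathrm{E}[(X_i - \mu_i)^2] \qquad \text{(by pairwise independence)} \\
&= \frac{1}{t^2} \sum_i \mathrm{Var}[X_i] \\
&\le \frac{1}{t}.
\end{aligned}$$

Now apply Chebyshev’s Inequality.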

While this requires less independence than the Chernoff Bound, notice that the error probability decreases only linearly with $t$.

Error Reduction. Proposition 6 tells us that if we use $t = O(2^k)$ pairwise independent repetitions, we can reduce the error probability of a BPP algorithm from $1/3$ to $2^{-k}$. If the original BPP algorithm uses $m$ random bits, then we can do this by choosing $h : \{0,1\}^k \to \{0,1\}^m$ at random from a pairwise independent family, and running the algorithm using coin tosses $h(x)$ for all $x \in \{0,1\}^k$. This requires $O(k + m)$ random bits.
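The bound on the number of repetitions can be traced through a short worked calculation. This is a sketch: the indicator variables $X_i$, the use of a majority vote over the repetitions, and the explicit constant 36 are introduced here for illustration and are not stated in the notes. Let $X_i \in \{0,1\}$ indicate that the $i$-th repetition gives the correct answer, so that $\mu = \mathrm{E}[X] \ge 2/3$. The majority vote errs only if $X \le 1/2$, which forces $|X - \mu| \ge 1/6$. Applying Proposition 6 with $\varepsilon = 1/6$ gives

$$\Pr[\text{majority errs}] \le \Pr\bigl[|X - \mu| \ge \tfrac{1}{6}\bigr] \le \frac{1}{t\,(1/6)^2} = \frac{36}{t} \le 2^{-k} \quad \text{whenever } t \ge 36 \cdot 2^k.$$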

| | Number of Repetitions | Number of Random Bits |
|---|---|---|
| Independent Repetitions | $O(k)$ | $O(km)$ |
| Pairwise Independent Repetitions | $O(2^k)$ | $O(k + m)$ |

Note that we have saved substantially on the number of random bits, but paid a lot in the number of repetitions needed. To maintain a polynomial-time algorithm, we can only afford k = O(log n). This setting implies that if we have a BPP algorithm with a constant error that uses m random bits, we have another BPP algorithm that uses O(m + log n) = O(m) random bits and has an error of 1/poly(n). That is, we can go from constant to inverse-polynomial error only paying a constant factor in randomness.

Sampling. Recall the Sampling problem: Given an oracle to a function $f : \{0,1\}^m \to [0, 1]$, we want to approximate $\mu(f)$ to within an additive error of $\varepsilon$.

We saw that we can solve this problem with probability $1 - \delta$ by outputting the average of $f$ on a random sample of $t = O(\log(1/\delta)/\varepsilon^2)$ points in $\{0,1\}^m$, where the correctness follows from the Chernoff Bound. To reduce the number of truly random bits used, we can use a pairwise independent sample instead. Specifically, taking $t = 1/(\varepsilon^2 \delta)$ pairwise independent points, we get an error probability of at most $\delta$. To generate $t$ pairwise independent samples of $m$ bits each, we need $O(m + \log(1/\varepsilon) + \log(1/\delta))$ truly random bits.

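| | Number of Samples | Number of Random Bits |
|---|---|---|
| Truly Random Sample | $O\bigl(\frac{1}{\varepsilon^2} \log \frac{1}{\delta}\bigr)$ | $O\bigl(\frac{m}{\varepsilon^2} \log \frac{1}{\delta}\bigr)$ |
| Pairwise Independent Repetitions | $O\bigl(\frac{1}{\varepsilon^2 \delta}\bigr)$ | $O\bigl(m + \log \frac{1}{\varepsilon} + \log \frac{1}{\delta}\bigr)$ |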
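To make the pairwise independent sampler concrete, here is a small Python sketch. It departs from the setting above in one labeled way: the sample space is $\mathbb{Z}_p$ for a prime $p$ rather than $\{0,1\}^m$, and the sample points are $h_{a,b}(0), \ldots, h_{a,b}(t-1)$ with $h_{a,b}(x) = ax + b \bmod p$, so $t \le p$ is required. The function names and the test function `f` are hypothetical. Because the seed space is tiny, the failure probability can be computed exactly by enumerating all seeds and compared with the bound $1/(t\varepsilon^2)$ from Proposition 6.

```python
def pairwise_sample(a, b, t, p):
    """The t points h_{a,b}(0), ..., h_{a,b}(t-1) in Z_p (requires t <= p)."""
    return [(a * i + b) % p for i in range(t)]

def estimate_mean(f, a, b, t, p):
    """Average of f over the pairwise independent sample defined by seed (a, b)."""
    return sum(f(x) for x in pairwise_sample(a, b, t, p)) / t

def failure_probability(f, p, t, eps):
    """Exact fraction of seeds (a, b) whose estimate is off by at least eps."""
    mu = sum(f(x) for x in range(p)) / p
    bad = 0
    for a in range(p):
        for b in range(p):
            if abs(estimate_mean(f, a, b, t, p) - mu) >= eps:
                bad += 1
    return bad / p ** 2

# Hypothetical test function on Z_101 with values in [0, 1).
P = 101
f = lambda x: (x * x % P) / P
fail = failure_probability(f, P, t=50, eps=0.2)
```

The seed here is just the pair `(a, b)`, about $2 \log p$ bits, regardless of how many sample points `t` are drawn from it.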

$k$-wise Independence. Our definition and construction of pairwise independent functions generalize naturally to $k$-wise independence for any $k$.