CS225: Pseudorandomness Prof. Salil Vadhan

Lecture 6: Basic Derandomization Techniques II

February 20, 2007

Based on scribe notes by Chun-Yun Hsiao and Vinod Vaikuntanathan. Lecture given by Dan Gutfreund.

1 Recap

In the previous lecture, we saw several derandomization techniques (enumeration, nonuniformity, nondeterminism) that are general in the sense that they apply to all of BPP, but are infeasible in the sense that they cannot be implemented by efficient deterministic algorithms. Today, we will see two derandomization techniques that can be implemented efficiently, but do not apply to all randomized algorithms.

2 The Method of Conditional Expectations

The general approach. Consider a randomized algorithm that uses $m$ random bits. We can view all its sequences of coin tosses as corresponding to a binary tree of depth $m$. We know that most paths (from the root to a leaf) are “good,” i.e., give the correct answer. A natural idea is to try to find such a path by walking down from the root and making “good” choices at each step. Equivalently, we try to find a good sequence of coin tosses “bit-by-bit.”

For $1 \le i \le m$ and $r_1, r_2, \ldots, r_i \in \{0,1\}$, define $P(r_1, r_2, \ldots, r_i)$ to be the fraction of continuations that are good sequences of coin tosses. More precisely, for $1 \le j \le m$, let $R_j$ be a random variable over $\{0,1\}$ with equal probability, and define

$$P(r_1, r_2, \ldots, r_i) \overset{\text{def}}{=} \Pr_{R_1, R_2, \ldots, R_m}\big[A(x; R_1, R_2, \ldots, R_m) \text{ is correct} \mid R_1 = r_1, R_2 = r_2, \ldots, R_i = r_i\big] = \mathop{\mathrm{E}}_{R_{i+1}}\big[P(r_1, r_2, \ldots, r_i, R_{i+1})\big].$$

By averaging, there exists an $r_{i+1} \in \{0,1\}$ such that $P(r_1, r_2, \ldots, r_i, r_{i+1}) \ge P(r_1, r_2, \ldots, r_i)$. So at $r_1, r_2, \ldots, r_i$, we simply pick the $r_{i+1}$ that maximizes $P(r_1, r_2, \ldots, r_i, r_{i+1})$. At the end we have $r_1, r_2, \ldots, r_m$, and

$$P(r_1, r_2, \ldots, r_m) \ge P(r_1, r_2, \ldots, r_{m-1}) \ge \cdots \ge P(r_1) \ge P(\Lambda) \ge 2/3,$$

where $P(\Lambda)$ denotes the fraction of good paths from the root. Then $P(r_1, r_2, \ldots, r_m) = 1$, since it is either 1 or 0 and we have just shown it is at least $2/3$.
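The bit-by-bit walk can be sketched as follows. This is only an illustration under a strong assumption: the function `fraction_good` computes $P$ by brute-force enumeration, which is exactly the step that may be infeasible in general. The names `fraction_good`, `find_good_sequence`, and the toy predicate `toy_is_good` are hypothetical and not part of the lecture.

```python
from itertools import product

def fraction_good(is_good, prefix, m):
    """P(prefix): fraction of completions of `prefix` to m bits that are good.

    Brute force over all 2^(m - len(prefix)) completions; feasible only for toy m.
    """
    rest = m - len(prefix)
    completions = [tuple(prefix) + tail for tail in product((0, 1), repeat=rest)]
    return sum(1 for r in completions if is_good(r)) / len(completions)

def find_good_sequence(is_good, m):
    """Walk down the tree, always moving to a child whose P value is largest."""
    prefix = []
    for _ in range(m):
        p0 = fraction_good(is_good, prefix + [0], m)
        p1 = fraction_good(is_good, prefix + [1], m)
        prefix.append(1 if p1 > p0 else 0)
    return tuple(prefix)

def toy_is_good(r):
    """Hypothetical toy predicate: every sequence is good except (0, 1, 0)."""
    return r != (0, 1, 0)

good = find_good_sequence(toy_is_good, 3)
```

Since $P$ never decreases along the chosen path, the returned sequence is good whenever at least one good sequence exists.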

Note that to implement this method, we need to compute $P(r_1, r_2, \ldots, r_i)$ deterministically, and this may be infeasible. However, there are nontrivial algorithms where this method does work, often for search problems rather than decision problems. Below we see one such example, where it turns out to yield a natural “greedy algorithm”.

[Figure 1: An example of $P(r_1, r_2)$, where “o” at the leaf denotes a good path.]

Example. Recall the Large Cut problem: given a graph $G = (V, E)$, find a partition $S, T$ (i.e., $S \cap T = \emptyset$, $S \cup T = V$) such that $|\mathrm{cut}(S, T)| \ge |E|/2$.¹

We saw a simple randomized algorithm that finds a cut of (expected) size at least |E|/2, which we now phrase in a way suitable for derandomization.

Randomized Large Cut Algorithm: Flip $|V|$ coins $r_1, r_2, \ldots, r_{|V|}$, put vertex $i$ in $S$ if $r_i = 1$ and in $T$ if $r_i = 0$.
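A minimal sketch of this randomized algorithm, together with an exact check that the expected cut size is $|E|/2$ on a small example. The representation (vertices numbered from 0, an edge list of pairs) and the names `cut_size`, `random_large_cut`, and `expected_cut_size` are assumptions made for illustration.

```python
import random
from itertools import product

def cut_size(edges, bits):
    """Number of edges whose endpoints received different bits."""
    return sum(1 for u, v in edges if bits[u] != bits[v])

def random_large_cut(n, edges, rng=random):
    """Flip one coin per vertex; vertex v goes to S if its bit is 1, else to T."""
    bits = [rng.randrange(2) for _ in range(n)]
    S = [v for v in range(n) if bits[v] == 1]
    T = [v for v in range(n) if bits[v] == 0]
    return S, T, cut_size(edges, bits)

def expected_cut_size(n, edges):
    """Exact expectation, by averaging over all 2^n coin-toss sequences."""
    total = sum(cut_size(edges, bits) for bits in product((0, 1), repeat=n))
    return total / 2 ** n

# Example graph: a 4-cycle plus one chord (5 edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
```

On this graph `expected_cut_size(4, edges)` evaluates to `2.5`, matching $|E|/2$.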

To derandomize this algorithm using the Method of Conditional Expectations, define

$$e(r_1, r_2, \ldots, r_i) \overset{\text{def}}{=} \mathop{\mathrm{E}}_{R_1, R_2, \ldots, R_{|V|}}\Big[\,|\mathrm{cut}(S, T)| \;\Big|\; R_1 = r_1, R_2 = r_2, \ldots, R_i = r_i\Big]$$

to be the expected cut size when the first $i$ random bits are fixed to $r_1, r_2, \ldots, r_i$.

We know that when no random bits are fixed, $e(\Lambda) \ge |E|/2$ (because each edge is cut with probability $1/2$), and all we need to calculate is $e(r_1, r_2, \ldots, r_i)$ for $1 \le i \le n$, where $n = |V|$. For this particular algorithm it turns out that this quantity is not hard to compute. Let $S_i \overset{\text{def}}{=} \{j : j \le i, r_j = 1\}$ (resp. $T_i \overset{\text{def}}{=} \{j : j \le i, r_j = 0\}$) be the set of vertices in $S$ (resp. $T$) after we determine $r_i$, and $U_i \overset{\text{def}}{=} \{i+1, i+2, \ldots, n\}$ be the “undecided” vertices that have not been put into $S$ or $T$. Then

$$e(r_1, r_2, \ldots, r_i) = |\mathrm{cut}(S_i, T_i)| + \frac{1}{2}\big(|\mathrm{cut}(S_i, U_i)| + |\mathrm{cut}(T_i, U_i)| + |\mathrm{cut}(U_i, U_i)|\big). \tag{1}$$

¹ Recall $\mathrm{cut}(S, T) \overset{\text{def}}{=} \{(s, t) : (s, t) \in E,\ s \in S,\ t \in T\}$.

Note that this is the natural ‘greedy’ algorithm for this problem. In other cases, the Method of Conditional Expectations yields algorithms that, while still arguably ‘greedy’, would have been much harder to find directly. Thus, designing a randomized algorithm and then trying to derandomize it can be a useful paradigm for the design of deterministic algorithms even if the randomization does not provide gains in efficiency.
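The greedy algorithm referred to here can be sketched as follows. When vertex $i+1$ is placed, the terms of equation (1) that involve undecided vertices are the same for both choices, so maximizing $e$ amounts to putting the vertex on the side opposite to the majority of its already-placed neighbors. The function name `greedy_large_cut` and the 0-based edge-list representation are assumptions for illustration.

```python
def greedy_large_cut(n, edges):
    """Derandomized Large Cut via conditional expectations.

    Vertices are 0..n-1; `edges` is a list of pairs (u, v) with u != v.
    Returns (S, T, cut_size).
    """
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    side = [None] * n  # 1 if the vertex is in S, 0 if in T, None if undecided
    for v in range(n):
        in_s = sum(1 for w in neighbors[v] if side[w] == 1)
        in_t = sum(1 for w in neighbors[v] if side[w] == 0)
        # Putting v in S cuts its edges to T, and vice versa; edges to
        # undecided vertices contribute the same amount either way.
        side[v] = 1 if in_t >= in_s else 0

    S = [v for v in range(n) if side[v] == 1]
    T = [v for v in range(n) if side[v] == 0]
    cut_size = sum(1 for u, v in edges if side[u] != side[v])
    return S, T, cut_size

# Usage on a 4-cycle plus one chord (5 edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
S, T, cut_size = greedy_large_cut(4, edges)
```

On this example the sketch returns $S = \{0, 2\}$, $T = \{1, 3\}$ and a cut of size 4, which is at least $|E|/2 = 2.5$.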

3 Pairwise Independence

As our first motivating example, we give another way of derandomizing the Large Cut algorithm discussed above. Recall the analysis of the randomized algorithm:

$$\mathrm{E}[|\mathrm{cut}(S, T)|] = \sum_{(i,j) \in E} \Pr[R_i \ne R_j] = |E|/2,$$

where $R_1, \ldots, R_n$ are the random bits of the algorithm. The key observation is that this analysis applies for any distribution on $(R_1, \ldots, R_n)$ satisfying $\Pr[R_i \ne R_j] = 1/2$ for each $i \ne j$. Thus, they do not need to be completely independent random variables; it suffices for them to be pairwise independent. That is, each $R_i$ is an unbiased random bit, and for each $i \ne j$, $R_i$ is independent from $R_j$.

This leads to the question: Can we generate $N$ pairwise independent bits using fewer than $N$ truly random bits?

Proposition 1 Let $B_1, \ldots, B_k$ be $k$ independent unbiased random bits. For each nonempty $S \subseteq [k]$, let $R_S$ be the random variable $\bigoplus_{i \in S} B_i$. Then the $R_S$’s are $2^k - 1$ pairwise independent unbiased random bits.

Proof: It is evident that each $R_S$ is unbiased. For pairwise independence, consider any two nonempty sets $S \ne T \subseteq [k]$. Then:

$$R_S = R_{S \cap T} \oplus R_{S \setminus T}, \qquad R_T = R_{S \cap T} \oplus R_{T \setminus S}.$$

Note that $R_{S \cap T}$, $R_{S \setminus T}$ and $R_{T \setminus S}$ are independent as they depend on disjoint subsets of the $B_i$’s, and at least two of these subsets are nonempty (because $S \ne T$). This implies that $(R_S, R_T)$ takes each value in $\{0,1\}^2$ with probability $1/4$.
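For small $k$ the proposition can be checked exhaustively. The sketch below encodes each nonempty subset $S \subseteq [k]$ as a bitmask (an encoding chosen here for convenience, not taken from the lecture) and verifies that every pair $(R_S, R_T)$ is uniform on $\{0,1\}^2$ over all $2^k$ seeds. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def xor_subset_bits(seed):
    """Map k seed bits to 2^k - 1 bits, one per nonempty subset S of [k].

    Subsets are encoded as bitmasks 1..2^k - 1; the output bit for mask S is
    the XOR of the seed bits whose positions are set in S.
    """
    k = len(seed)
    out = []
    for mask in range(1, 2 ** k):
        bit = 0
        for i in range(k):
            if (mask >> i) & 1:
                bit ^= seed[i]
        out.append(bit)
    return out

def is_pairwise_independent(k):
    """Check that every pair (R_S, R_T), S != T, is uniform on {0,1}^2."""
    outputs = [xor_subset_bits(seed) for seed in product((0, 1), repeat=k)]
    n = 2 ** k - 1
    for s in range(n):
        for t in range(s + 1, n):
            counts = Counter((row[s], row[t]) for row in outputs)
            if any(counts[pair] * 4 != len(outputs)
                   for pair in product((0, 1), repeat=2)):
                return False
    return True
```

For instance, `is_pairwise_independent(3)` confirms that 3 seed bits yield 7 pairwise independent bits, even though the 7 bits are clearly not fully independent.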

Note that this gives us a way to generate $N$ pairwise independent bits from $\lceil \log(N+1) \rceil$ independent random bits. Thus, we can reduce the randomness required by the Large Cut algorithm to logarithmic, and then we can obtain a deterministic algorithm by enumeration.

Derandomized Large Cut Algorithm: For all sequences of bits $b_1, b_2, \ldots, b_{\lceil \log(n+1) \rceil}$, run the randomized Large Cut algorithm using coin tosses $\{r_S = \bigoplus_{i \in S} b_i\}_{S \ne \emptyset}$ and choose the largest cut thus obtained.

Since there are at most $2(n+1)$ sequences of $b_i$’s, the derandomized algorithm still runs in $\mathrm{poly}(n)$ time. It is slower than the greedy algorithm obtained by the Method of Conditional Expectations, but it has the advantage of using only $O(\log n)$ workspace and being parallelizable.
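A sketch of this enumeration, assuming vertices are numbered from 0 and that vertex $v$ is assigned the subset with bitmask $v + 1$ (any assignment of distinct nonempty subsets would do). The function name `pairwise_large_cut` is hypothetical.

```python
from itertools import product

def pairwise_large_cut(n, edges):
    """Try every seed of ceil(log2(n + 1)) bits and keep the largest cut found.

    Vertex v (0-based) uses the bit r_S for the subset S with bitmask v + 1,
    so distinct vertices get distinct nonempty subsets.
    """
    k = max(1, n.bit_length())  # n.bit_length() equals ceil(log2(n + 1))
    best_side, best_size = None, -1
    for seed in product((0, 1), repeat=k):
        seed_mask = sum(b << i for i, b in enumerate(seed))
        # r_S is the parity of the seed bits selected by S.
        side = [bin(seed_mask & (v + 1)).count("1") % 2 for v in range(n)]
        size = sum(1 for u, v in edges if side[u] != side[v])
        if size > best_size:
            best_side, best_size = side, size
    return best_side, best_size

# Usage on a 4-cycle plus one chord (5 edges): only 2^3 = 8 seeds are tried.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
side, size = pairwise_large_cut(4, edges)
```

Because the average cut size over all seeds is $|E|/2$, the best seed gives a cut of at least that size. Each seed can be evaluated independently, which is what makes the method parallelizable.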

4 Pairwise Independent Hash Functions

Some applications require pairwise independent random variables that take values from a larger range, e.g. we want $N = 2^n$ pairwise independent random variables, each of which is uniformly distributed in $\{0,1\}^m = [M]$. The naïve approach is to repeat the above algorithm for the individual bits $m$ times. This uses $(\log M)(\log N)$ bits to start with, which is no longer logarithmic in $N$ if $M$ is nonconstant. It turns out we can do much better.

A sequence of $N$ random variables each taking a value in $[M]$ can be viewed as a distribution on sequences in $[M]^N$. Another interpretation of such a sequence is as a mapping $f : [N] \to [M]$. The latter interpretation turns out to be more useful when discussing the computational complexity of the constructions.

Definition 2 (Pairwise Independent Hash Functions) A family of functions $\mathcal{H} = \{h : [N] \to [M]\}$ is pairwise independent if the following two conditions hold:

1. $\forall x \in [N]$, the random variable $H(x)$ is uniformly distributed in $[M]$ when $H \stackrel{R}{\leftarrow} \mathcal{H}$.
2. $\forall x_1 \ne x_2 \in [N]$, the random variables $H(x_1)$ and $H(x_2)$ are independent when $H \stackrel{R}{\leftarrow} \mathcal{H}$.

Equivalently, we can combine the two conditions to say that

$$\forall x_1 \ne x_2 \in [N],\ \forall y_1, y_2 \in [M], \qquad \Pr_{H \stackrel{R}{\leftarrow} \mathcal{H}}\big[H(x_1) = y_1 \wedge H(x_2) = y_2\big] = \frac{1}{M^2}.$$

Note that the probability above is over the random choice of a function from the family H. This is why we talk about a family of functions rather than a single function. The description in terms of functions makes it natural to impose a strong efficiency requirement — we ask that given the description of h and x ∈ [N ], the value h(x) can be computed in time poly(log N, log M ). We call such a family of hash functions explicit.

Pairwise independent functions are a strengthening of universal hash functions, which require only that $\Pr[H(x_1) = H(x_2)] \le 1/M$ for all $x_1 \ne x_2$.

Below we present another construction of a pairwise independent family.

Proposition 3 Let $\mathbb{F}$ be a finite field. Define the family of functions $\mathcal{H} = \{h_{a,b} : \mathbb{F} \to \mathbb{F}\}$ where each $h_{a,b}(x) = ax + b$ for $a, b \in \mathbb{F}$. Then $\mathcal{H}$ is pairwise independent.

Proof Sketch: Notice that the graph of the function $h_{a,b}(x)$ is the line with slope $a$ and $y$-intercept $b$. Given $x_1 \ne x_2$ and $y_1, y_2$, there is exactly one line containing the points $(x_1, y_1)$ and $(x_2, y_2)$. Thus, the probability over $a, b$ that $h_{a,b}(x_1) = y_1$ and $h_{a,b}(x_2) = y_2$ equals the reciprocal of the number of lines, namely $1/|\mathbb{F}|^2$.
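As a concrete instance, one can take $\mathbb{F} = \mathbb{Z}_p$ for a prime $p$ (an assumption made here for simplicity; the proposition holds for any finite field). The sketch below enumerates the whole family and checks that each pair of values is hit by exactly one $(a, b)$. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def affine_hash(a, b, p):
    """Return h_{a,b}(x) = a*x + b over Z_p (a field only when p is prime)."""
    return lambda x: (a * x + b) % p

def check_pairwise_independence(p):
    """For every x1 != x2, verify (h(x1), h(x2)) is uniform over Z_p x Z_p."""
    family = [affine_hash(a, b, p) for a, b in product(range(p), repeat=2)]
    for x1 in range(p):
        for x2 in range(p):
            if x1 == x2:
                continue
            counts = Counter((h(x1), h(x2)) for h in family)
            # Each of the p^2 value pairs must be hit by exactly one (a, b).
            if len(counts) != p * p or set(counts.values()) != {1}:
                return False
    return True
```

The check succeeds for `p = 5` and fails for `p = 4`, illustrating that the field structure is needed: in $\mathbb{Z}_4$ two distinct "lines" can pass through the same two points.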

This construction uses $2 \log |\mathbb{F}|$ random bits, since we have to choose $a$ and $b$ at random from $\mathbb{F}$ to get a function $h_{a,b} \stackrel{R}{\leftarrow} \mathcal{H}$. Compare this to $|\mathbb{F}| \log |\mathbb{F}|$ bits required to choose $|\mathbb{F}|$ fully independent values from $\mathbb{F}$, and $(\log |\mathbb{F}|)^2$ bits for repeating the construction of Proposition 1 for each output bit.

[...] large, and thus it is infeasible to even write down a truly random hash function. Thus, it would be preferable to show that some explicit family of hash functions works for the application with similar performance. In many cases, it can be shown that pairwise independence (or $k$-wise independence, as discussed below) suffices.

6 Randomness-Efficient Error Reduction and Sampling

Suppose we have a BPP algorithm for a language $L$ that has a constant error probability. We want to reduce the error to $2^{-k}$. We have already seen that using $O(k)$ independent repetitions, we can reduce the error of a BPP algorithm to $2^{-k}$ (using a Chernoff Bound). If the algorithm originally used $m$ random bits, then we need $O(km)$ random bits after error reduction. Here we will see how to reduce the number of random bits required for error reduction by doing only pairwise independent repetitions.

To analyze this, we will need an analogue of the Chernoff Bound that applies to sums of pairwise independent random variables. This follows from Chebyshev’s Inequality. For a random variable $X$ with expectation $\mu$, recall that its variance is defined to be $\mathrm{Var}[X] = \mathrm{E}[(X - \mu)^2] = \mathrm{E}[X^2] - \mu^2$.

Lemma 5 (Chebyshev’s Inequality) Let $X$ be a random variable with expectation $\mu$. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$

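Proof: Let $Y = (X - \mu)^2$. Then, applying Markov’s Inequality to the nonnegative random variable $Y$,

$$\Pr[|X - \mu| \ge \varepsilon] = \Pr[(X - \mu)^2 \ge \varepsilon^2] \le \frac{\mathrm{E}[(X - \mu)^2]}{\varepsilon^2} = \frac{\mathrm{Var}[X]}{\varepsilon^2}.$$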

We now use this to show that sums of pairwise independent random variables are concentrated around their expectation.

Proposition 6 (Pairwise-Independent Tail Inequality) Let $X_1, \ldots, X_t$ be pairwise independent random variables taking values in the interval $[0, 1]$, let $X = (\sum_i X_i)/t$, and $\mu = \mathrm{E}[X]$. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le \frac{1}{t\varepsilon^2}.$$

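Proof: Let $\mu_i = \mathrm{E}[X_i]$. Then

$$\begin{aligned}
\mathrm{Var}[X] &= \mathrm{E}[(X - \mu)^2] \\
&= \frac{1}{t^2}\,\mathrm{E}\Big[\Big(\sum_i (X_i - \mu_i)\Big)^2\Big] \\
&= \frac{1}{t^2}\sum_{i,j} \mathrm{E}[(X_i - \mu_i)(X_j - \mu_j)] \\
&= \frac{1}{t^2}\sum_i \mathrm{E}[(X_i - \mu_i)^2] \quad \text{(by pairwise independence)} \\
&= \frac{1}{t^2}\sum_i \mathrm{Var}[X_i] \\
&\le \frac{1}{t}.
\end{aligned}$$

Now apply Chebyshev’s Inequality.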

While this requires less independence than the Chernoff Bound, notice that the error probability decreases only linearly with $t$.

Error Reduction. Proposition 6 tells us that if we use $t = O(2^k)$ pairwise independent repetitions, we can reduce the error probability of a BPP algorithm from $1/3$ to $2^{-k}$. If the original BPP algorithm uses $m$ random bits, then we can do this by choosing $h : \{0,1\}^k \to \{0,1\}^m$ at random from a pairwise independent family, and running the algorithm using coin tosses $h(x)$ for all $x \in \{0,1\}^k$. This requires $O(k + m)$ random bits.

| | Number of Repetitions | Number of Random Bits |
|---|---|---|
| Independent Repetitions | $O(k)$ | $O(km)$ |
| Pairwise Independent Repetitions | $O(2^k)$ | $O(k + m)$ |

Note that we have saved substantially on the number of random bits, but paid a lot in the number of repetitions needed. To maintain a polynomial-time algorithm, we can only afford k = O(log n). This setting implies that if we have a BPP algorithm with a constant error that uses m random bits, we have another BPP algorithm that uses O(m + log n) = O(m) random bits and has an error of 1/poly(n). That is, we can go from constant to inverse-polynomial error only paying a constant factor in randomness.

Sampling. Recall the Sampling problem: Given an oracle to a function $f : \{0,1\}^m \to [0, 1]$, we want to approximate $\mu(f)$ to within an additive error of $\varepsilon$.

We saw that we can solve this problem with probability $1 - \delta$ by outputting the average of $f$ on a random sample of $t = O(\log(1/\delta)/\varepsilon^2)$ points in $\{0,1\}^m$, where the correctness follows from the Chernoff Bound. To reduce the number of truly random bits used, we can use a pairwise independent sample instead. Specifically, taking $t = 1/(\varepsilon^2 \delta)$ pairwise independent points, we get an error probability of at most $\delta$. To generate $t$ pairwise independent samples of $m$ bits each, we need $O(m + \log(1/\varepsilon) + \log(1/\delta))$ truly random bits.

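| | Number of Samples | Number of Random Bits |
|---|---|---|
| Truly Random Sample | $O\big(\frac{1}{\varepsilon^2} \log \frac{1}{\delta}\big)$ | $O\big(\frac{m}{\varepsilon^2} \log \frac{1}{\delta}\big)$ |
| Pairwise Independent Repetitions | $O\big(\frac{1}{\varepsilon^2 \delta}\big)$ | $O\big(m + \log \frac{1}{\varepsilon} + \log \frac{1}{\delta}\big)$ |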
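To illustrate pairwise independent sampling on a small scale, the sketch below replaces the domain $\{0,1\}^m$ by $\mathbb{Z}_p$ for a prime $p$ (an assumption made so that the family $h_{a,b}(x) = ax + b$ applies directly) and uses the sample points $h_{a,b}(0), \ldots, h_{a,b}(t-1)$. It computes the exact failure probability over all seeds $(a, b)$ so that it can be compared with the bound $1/(t\varepsilon^2)$ from Proposition 6. The test function `f` and all names are hypothetical.

```python
def pairwise_sample_failure_rate(f, p, t, eps):
    """Exact fraction of seeds (a, b) whose t-point estimate misses mu(f) by >= eps.

    Sample points are h_{a,b}(0), ..., h_{a,b}(t-1) with h_{a,b}(x) = a*x + b mod p,
    which are pairwise independent and uniform on Z_p when p is prime and t <= p.
    """
    mu = sum(f(x) for x in range(p)) / p
    bad = 0
    for a in range(p):
        for b in range(p):
            estimate = sum(f((a * x + b) % p) for x in range(t)) / t
            if abs(estimate - mu) >= eps:
                bad += 1
    return bad / (p * p)

def f(x):
    """Hypothetical test function on Z_101 with values in [0, 1]."""
    return 1.0 if x % 3 == 0 else 0.0

rate = pairwise_sample_failure_rate(f, p=101, t=25, eps=0.4)
bound = 1 / (25 * 0.4 ** 2)  # Proposition 6 bound 1 / (t * eps^2), about 0.25
```

Only two elements of $\mathbb{Z}_p$ are chosen at random here, regardless of $t$, and the exact failure rate stays below the bound, as Proposition 6 guarantees.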

$k$-wise Independence. Our definition and construction of pairwise independent functions generalizes naturally to $k$-wise independence for any $k$.
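The notes do not spell out the generalized construction at this point. One standard way to carry it out, offered here as an assumption rather than as the lecture's own statement, is to replace the random line $ax + b$ by a random polynomial of degree less than $k$ over a finite field. The sketch below uses $\mathbb{Z}_p$ for a prime $p$ and checks $k$-wise independence exhaustively for small parameters. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def poly_hash(coeffs, p):
    """h(x) = coeffs[0] + coeffs[1]*x + ... + coeffs[k-1]*x^(k-1) over Z_p (p prime)."""
    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % p
        return acc
    return h

def check_k_wise_independence(p, k):
    """Verify that any k distinct inputs receive uniform, independent values."""
    family = [poly_hash(c, p) for c in product(range(p), repeat=k)]
    for xs in combinations(range(p), k):
        counts = Counter(tuple(h(x) for x in xs) for h in family)
        # Each of the p^k value tuples must be hit by exactly one polynomial.
        if len(counts) != p ** k or set(counts.values()) != {1}:
            return False
    return True
```

With `k = 2` this reduces to the family of Proposition 3. The check passes because a polynomial of degree less than $k$ is determined uniquely by its values at $k$ distinct points.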
