CS225: Pseudorandomness Prof. Salil Vadhan

Lecture 8: Random Walks on Expanders

March 1, 2007

Based on scribe notes by Mihai Pătrașcu.

1 Rapid Mixing of Random Walks

From the previous lecture, we know that one way of characterizing an expander graph $G$ is by a bound $\lambda$ on its second eigenvalue, and in fact there exist constant-degree expanders where $\lambda$ is a constant less than 1. From Lecture 4, we know that this implies that the random walk on $G$ converges quickly to the uniform distribution. Specifically, a walk of length $t$ started at any vertex ends at $\ell_2$ distance at most $\lambda^t$ from the uniform distribution. Thus after $t = O(\log N)$ steps, the distribution is very close to uniform (e.g. the probability of every vertex is $(1 \pm .01)/N$). Note that, if $G$ has constant degree, the number of random bits invested here is $O(t) = O(\log N)$, which is within a constant factor of optimal; clearly $\log N - O(1)$ random bits are also necessary to sample an almost uniform vertex. Thus, expander walks give a very good tradeoff between the number of random bits invested and the 'randomness' of the final vertex in the walk. Remarkably, expander walks give good randomness properties not only for the final vertex in the walk, but also for the sequence of vertices in the walk. Indeed, in several ways to be formalized below, this sequence of vertices 'behaves' like uniform independent samples of the vertex set.
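The mixing bound can be checked numerically on a small example. The sketch below is illustrative only: the graph construction (a union of random permutations and their inverses), the sizes, and the helper names `random_walk_matrix`, `second_eigenvalue` and `l2_distances` are assumptions made here, not part of the lecture. It starts a walk at a single vertex and compares the $\ell_2$ distance to uniform after $t$ steps with $\lambda^t$.

```python
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph on n vertices,
    built from k random permutations and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0  # edge i -> perm[i]
        A[perm, np.arange(n)] += 1.0  # reverse edge, so A is symmetric
    return A / (2 * k)

def second_eigenvalue(M):
    """Largest absolute value among the eigenvalues other than the top one (= 1)."""
    ev = np.linalg.eigvalsh(M)  # ascending order; ev[-1] is 1
    return max(abs(ev[0]), abs(ev[-2]))

def l2_distances(M, start, steps):
    """l2 distance to uniform after 1, 2, ..., steps steps from a fixed start vertex."""
    n = M.shape[0]
    u = np.full(n, 1.0 / n)
    pi = np.zeros(n)
    pi[start] = 1.0
    out = []
    for _ in range(steps):
        pi = pi @ M  # one step of the walk
        out.append(np.linalg.norm(pi - u))
    return out

M = random_walk_matrix(64, 4)
lam = second_eigenvalue(M)
dists = l2_distances(M, start=0, steps=20)
for t, d in enumerate(dists, start=1):
    print(t, d, lam ** t)
```

In this toy setting the printed distance should never exceed the printed $\lambda^t$ in the same row.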

A canonical application of expander walks is for randomness-efficient error reduction of randomized algorithms: Suppose we have an algorithm with constant error rate, which uses $m$ random bits. Our goal is to reduce the error to $2^{-k}$, with a minimal penalty in random bits and time. Independent repetitions of the algorithm suffer just an $O(k)$ penalty in time, but need $O(km)$ random bits. We have already seen that with pairwise independence we can use just $O(m + k)$ random bits, but the time blows up by $O(2^k)$. Expander graphs let us have the best of both worlds, using just $m + O(k \log D)$ random bits, and increasing the time only by $O(k)$. Note that for constant $D$, the number of random bits is $m + O(k)$, even better than what pairwise independence gives.

The general approach is to consider an expander graph with vertex set $\{0,1\}^m$, where each vertex is associated with a setting of the random bits. We will choose a uniform random vertex $v_1$ and then do a random walk of length $t - 1$, visiting vertices $v_1, \ldots, v_t$. (Note that unlike the rapid mixing case, here we start at a uniformly random vertex.) This requires $m$ random bits for the initial choice, and $\log D$ for each of the $t - 1$ steps. For every vertex $v_i$ on the random walk, we will run the algorithm with the setting of the random coins $v_i$.
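To make the random-bit accounting concrete, here is a minimal sketch comparing $t$ independent runs with $t$ runs along an expander walk. The function names and the example values ($m = 100$, $t = 20$, $D = 8$) are hypothetical, and rounding $\log_2 D$ up to an integer for degrees that are not powers of two is an assumption of this sketch.

```python
import math

def bits_independent(m, t):
    """Random bits for t independent runs of an algorithm using m bits each."""
    return m * t

def bits_expander_walk(m, t, degree):
    """Random bits for a uniform start vertex in {0,1}^m plus t - 1 walk steps.
    Assumption: each step costs ceil(log2(degree)) bits."""
    return m + (t - 1) * math.ceil(math.log2(degree))

m, t, degree = 100, 20, 8
print(bits_independent(m, t))            # 2000
print(bits_expander_walk(m, t, degree))  # 157
```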

First, we consider the special case of RP algorithms. Thus, we accept if at least one execution of the algorithm accepts, and reject otherwise. If the input is not in the language, the algorithm never accepts, so we also reject. If the input is in the language, we want our random walk to hit at least one vertex which makes the algorithm accept. Let $B$ denote the set of "bad" vertices giving bad coin tosses (which make the algorithm reject). By definition, the density of $B$ is at most a half. Thus, our aim is to show that the probability that all the vertices in the walk $v_1, \ldots, v_t$ are in $B$ vanishes exponentially fast in $t$.

The case $t = 2$ follows from the Expander Mixing Lemma given last time. If we choose a random edge in a $\lambda$ spectral expander, the probability that both endpoints are in a set $B$ is at most $\mu(B)^2 + \lambda \cdot \mu(B)$. So if $\lambda \ll \mu(B)$, then the probability is roughly $\mu(B)^2$, just like two independent random samples. The case of larger $t$ is given by the following theorem.
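The $t = 2$ bound can be sanity-checked numerically. The sketch below assumes a toy random regular multigraph (not a construction from the lecture) and an arbitrary set $B$ of density $1/4$; the helper names are hypothetical. It computes the exact probability that a uniformly random vertex and a random neighbor both land in $B$ and compares it with $\mu(B)^2 + \lambda \cdot \mu(B)$.

```python
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph from k random permutations
    and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0
        A[perm, np.arange(n)] += 1.0
    return A / (2 * k)

def second_eigenvalue(M):
    ev = np.linalg.eigvalsh(M)  # ascending order
    return max(abs(ev[0]), abs(ev[-2]))

def prob_both_in_set(M, in_B):
    """Pr[V1 in B and V2 in B] for V1 uniform and V2 a random neighbor of V1."""
    chi = in_B.astype(float)  # characteristic vector of B
    return chi @ M @ chi / M.shape[0]

n = 64
M = random_walk_matrix(n, 4)
lam = second_eigenvalue(M)
in_B = np.arange(n) < n // 4  # B = first quarter of the vertices
mu = in_B.mean()
p = prob_both_in_set(M, in_B)
print(p, mu ** 2 + lam * mu)
```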

Theorem 1 (Hitting Property of Expander Walks) If $G$ is a $\lambda$ spectral expander, then for any $B \subset V(G)$ of density $\mu$, the probability that a random walk $(V_1, \ldots, V_t)$ of $t$ steps in $G$ starting in a uniformly random vertex $V_1$ always remains in $B$ is

$$\Pr[V_1, \ldots, V_t \in B] \le (\mu + \lambda \cdot (1 - \mu))^t.$$

Equivalently, a random walk 'hits' the complement of $B$ with high probability. Thus, if $\mu$ and $\lambda$ are constants less than 1, then the probability is $2^{-\Omega(t)}$, completing the analysis of the efficient error-reduction algorithm.
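As a small worked example of how the bound translates into a walk length, the sketch below solves $(\mu + \lambda(1 - \mu))^t \le 2^{-k}$ for $t$. The function names and the sample parameters are hypothetical, and the ceiling is computed in floating point, so exact boundary cases could be off by one.

```python
import math

def hitting_bound(mu, lam, t):
    """Upper bound from the hitting property on Pr[walk stays in B]."""
    return (mu + lam * (1 - mu)) ** t

def walk_length_for_error(mu, lam, k):
    """Smallest t (up to floating-point rounding) with hitting_bound <= 2**-k."""
    base = mu + lam * (1 - mu)
    if not 0 < base < 1:
        raise ValueError("need 0 < mu + lam * (1 - mu) < 1")
    return math.ceil(k / -math.log2(base))

# With mu = lam = 1/2 the per-step factor is 3/4, so error 2^-10 needs t = 25 runs.
print(walk_length_for_error(0.5, 0.5, 10))  # 25
```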

Before proving the theorem, we discuss general approaches to analyzing spectral expanders and random walks on them. Typically, the first step is to express the quantities of interest linear-algebraically, involving applications of the random-walk (or adjacency) matrix $M$ to some vectors $v$. For example, last time when proving the Expander Mixing Lemma, we expressed the fraction of edges between sets $S$ and $T$ as $\chi_S^t M \chi_T$ (up to some normalization factor). Then we can proceed in one of the two following ways:

Vector Decomposition Decompose the input vector $v$ as $v = v^{\parallel} + v^{\perp}$, where $v^{\parallel} = (\langle v, u \rangle / \langle u, u \rangle) u$ is the component of $v$ in the direction of the uniform distribution $u$ and $v^{\perp}$ is the component of $v$ orthogonal to $u$. Then this induces a similar orthogonal decomposition of the output vector $Mv$ into $Mv = Mv^{\parallel} + Mv^{\perp} = (Mv)^{\parallel} + (Mv)^{\perp}$, where $Mv^{\parallel} = v^{\parallel}$ and $\|Mv^{\perp}\| \le \lambda \cdot \|v^{\perp}\|$. Thus, from information about how $v$'s length is divided into the uniform and non-uniform components, we deduce information about how $Mv$ is divided into the uniform and non-uniform components. This is the approach we took in the proof of the Expander Mixing Lemma.
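The vector decomposition is easy to verify numerically. In the sketch below the graph, the random test vector, and the helper names are all assumptions chosen for illustration; because the toy walk matrix is symmetric, multiplying on the left or the right gives the same result.

```python
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph from k random permutations
    and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0
        A[perm, np.arange(n)] += 1.0
    return A / (2 * k)

def second_eigenvalue(M):
    ev = np.linalg.eigvalsh(M)  # ascending order
    return max(abs(ev[0]), abs(ev[-2]))

def decompose(v):
    """Split v into its component along the uniform vector u and the orthogonal rest."""
    u = np.full(v.size, 1.0 / v.size)
    v_par = (v @ u) / (u @ u) * u
    return v_par, v - v_par

n = 64
M = random_walk_matrix(n, 4)
lam = second_eigenvalue(M)
v = np.random.default_rng(1).normal(size=n)  # arbitrary test vector
v_par, v_perp = decompose(v)

print(np.allclose(v_par @ M, v_par))  # the parallel part is fixed by M
print(np.linalg.norm(v_perp @ M), lam * np.linalg.norm(v_perp))  # perpendicular part shrinks
```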

Matrix Decomposition This corresponds to a different decomposition of the output vector $Mv$ that can be expressed in a way that is independent of the decomposition of the input vector $v$. Specifically, we can write

$$Mv = (1 - \lambda) v^{\parallel} + (\lambda v^{\parallel} + M v^{\perp}) = (1 - \lambda) Jv + \lambda Ev = ((1 - \lambda) J + \lambda E) v,$$

where $J$ is the matrix that projects onto direction $u$ and the error matrix $E$ satisfies $\|Ev\| \le \|v\|$. The advantage of this decomposition is that we can apply it even when we have no information about how $v$ decomposes (only its length), and the fact that $M$ is a convex combination of $J$ and $E$ means that we can often treat each of these components separately and then just apply the triangle inequality. However, it is less refined than the vector decomposition approach, and sometimes gives weaker bounds. Indeed, if we use it to prove the Expander Mixing Lemma (without decomposing $\chi_S$ and $\chi_T$), we would get a slightly worse error term of $\lambda \sqrt{\mu(S)\mu(T)} + \lambda \mu(S) \mu(T)$.
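The matrix decomposition can also be checked directly: given $M$ and $\lambda$, define $E = (M - (1 - \lambda) J)/\lambda$ and confirm that its operator norm is at most 1. This is a sketch on an assumed toy graph with hypothetical helper names, and it assumes $\lambda > 0$ so that the division is defined.

```python
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph from k random permutations
    and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0
        A[perm, np.arange(n)] += 1.0
    return A / (2 * k)

def second_eigenvalue(M):
    ev = np.linalg.eigvalsh(M)  # ascending order
    return max(abs(ev[0]), abs(ev[-2]))

n = 64
M = random_walk_matrix(n, 4)
lam = second_eigenvalue(M)      # assumed to be strictly positive here
J = np.full((n, n), 1.0 / n)    # projection onto the uniform direction
E = (M - (1 - lam) * J) / lam   # error matrix defined by M = (1 - lam) J + lam E

print(np.linalg.norm(E, 2))     # spectral norm of E
```

The printed spectral norm should be 1 up to rounding, since $E$ also fixes the uniform direction.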

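Claim 4 $\|PMP\| \le \mu + \lambda \cdot (1 - \mu)$.

Proof of claim:

$$\|PMP\| = \|P((1 - \lambda) J + \lambda E) P\| \le (1 - \lambda) \|PJP\| + \lambda \|PEP\| \le (1 - \lambda) \cdot \|PJP\| + \lambda.$$

Thus, we only need to analyze the case of $J$, the random walk on the complete graph. Given any vector $x$, let $y = xP$. Note that $\|y\| \le \|x\|$ and $y$ has at most $\mu N$ nonzero coordinates. Then

$$xPJP = yJP = \Big(\Big(\sum_i y_i\Big) u\Big) P = \Big(\sum_i y_i\Big) uP,$$

so

$$\|xPJP\| \le \Big|\sum_i y_i\Big| \cdot \|uP\| \le \sqrt{\mu N} \cdot \|y\| \cdot \sqrt{\frac{\mu}{N}} \le \mu \cdot \|x\|.$$

Thus, $\|PMP\| \le (1 - \lambda)\mu + \lambda = \mu + \lambda \cdot (1 - \mu)$. $\square$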

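So the probability of never leaving $B$ in a $t$-step random walk is

$$\begin{aligned}
|uP(MP)^{t-1}|_1 &\le \sqrt{\mu N} \cdot \|uP(MP)^{t-1}\| \\
&\le \sqrt{\mu N} \cdot \|uP\| \cdot \|PMP\|^{t-1} \\
&\le \sqrt{\mu N} \cdot \sqrt{\frac{\mu}{N}} \cdot (\mu + \lambda \cdot (1 - \mu))^{t-1} \\
&\le (\mu + \lambda \cdot (1 - \mu))^t.
\end{aligned}$$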
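The quantities in this argument can be computed exactly on a small example. The sketch below assumes that $P$ is the diagonal 0/1 matrix selecting the vertices of $B$ (the excerpt does not show its definition), uses an assumed toy graph, and hypothetical helper names. It evaluates $|uP(MP)^{t-1}|_1$ and compares it, together with $\|PMP\|$, against the stated bounds.

```python
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph from k random permutations
    and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0
        A[perm, np.arange(n)] += 1.0
    return A / (2 * k)

def second_eigenvalue(M):
    ev = np.linalg.eigvalsh(M)  # ascending order
    return max(abs(ev[0]), abs(ev[-2]))

def stay_probability(M, in_B, t):
    """Exact Pr[V_1, ..., V_t all in B] for a walk from a uniform start vertex."""
    chi = in_B.astype(float)
    x = chi / M.shape[0]       # u P: uniform start restricted to B
    for _ in range(t - 1):
        x = (x @ M) * chi      # one step (M), then restrict to B again (P)
    return x.sum()

n = 64
M = random_walk_matrix(n, 4)
lam = second_eigenvalue(M)
in_B = np.arange(n) < n // 2   # B = half of the vertices, as in the RP setting
mu = in_B.mean()
P = np.diag(in_B.astype(float))  # assumed definition of P

print(np.linalg.norm(P @ M @ P, 2), mu + lam * (1 - mu))  # Claim 4
for t in range(1, 11):
    print(t, stay_probability(M, in_B, t), (mu + lam * (1 - mu)) ** t)
```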

The hitting properties described above suffice for reducing the error of RP algorithms. What about BPP? This is handled by the following.

Theorem 5 (Chernoff Bound for Expander Walks) Let $G$ be a $\lambda$ spectral expander on $N$ vertices, and let $f : [N] \to [0, 1]$ be any function. Consider a random walk $V_1, \ldots, V_t$ in $G$ from a uniform start vertex $V_1$. Then for any $\varepsilon > 0$

$$\Pr\left[\left|\frac{1}{t} \sum_i f(V_i) - \mu(f)\right| > \lambda + \varepsilon\right] \le 2 e^{-\Omega(\varepsilon^2 t)}.$$

Note that this is just like the standard Chernoff Bound, except that our additive approximation error increases by $\lambda$. Thus, unlike the Hitting Property we proved above, this bound is only useful when $\lambda$ is sufficiently small (as opposed to bounded away from 1). This can be achieved by taking an appropriate power of the initial expander. However, there is a better Chernoff Bound for Expander Walks, where $\lambda$ does not appear in the approximation error, but the exponent in the probability of error is $\Omega((1 - \lambda)\varepsilon^2 t)$ instead of $\Omega(\varepsilon^2 t)$. The bound above will suffice for our purposes (where $\varepsilon$ is typically a constant, as in error reduction for BPP).

Proof: Let $X_i$ be the random variable $f(V_i)$, and $X = \sum_i X_i$. Just like in the standard proof of the Chernoff Bound, we show that the expectation of the moment generating function $e^{rX} = \prod_i e^{rX_i}$ is not much larger than $e^{r\,\mathrm{E}[X]}$ and apply Markov's Inequality, for a suitable choice of $r$. However, here the factors $e^{rX_i}$ are not independent, so the expectation does not commute with the product. Instead, we express $\mathrm{E}[e^{rX}]$ linear-algebraically as follows. Define a diagonal matrix $P$ whose $(i, i)$'th entry is $e^{r f(i)}$. Then, similarly to the hitting proof above, we observe that

$$\mathrm{E}[e^{rX}] = |uP(MP)^{t-1}|_1 = |u(MP)^t|_1 \le \sqrt{N} \cdot \|u\| \cdot \|MP\|^t \le \|MP\|^t.$$

To see this, we simply note that each cross-term in the matrix product $uP(MP)^{t-1}$ corresponds to exactly one expander walk $v_1, \ldots, v_t$, with a coefficient equal to the probability of this walk times $\prod_i e^{r f(v_i)}$. Again, we bound
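The cross-term identity can be confirmed by brute force on a tiny graph. The sketch below uses assumed parameters (6 vertices, walk length 4, $r = 0.3$, and an arbitrary $f$) and hypothetical helper names; it enumerates every vertex sequence, weights it by its walk probability and by $\prod_i e^{r f(v_i)}$, and compares the total with the matrix expression and with $\|MP\|^t$.

```python
import itertools
import numpy as np

def random_walk_matrix(n, k, seed=0):
    """Random-walk matrix of a 2k-regular multigraph from k random permutations
    and their inverses (an assumed toy model)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n))
    for _ in range(k):
        perm = rng.permutation(n)
        A[np.arange(n), perm] += 1.0
        A[perm, np.arange(n)] += 1.0
    return A / (2 * k)

def mgf_algebraic(M, f, r, t):
    """|u P (M P)^(t-1)|_1 with P = diag(exp(r * f))."""
    w = np.exp(r * f)                      # diagonal of P
    x = np.full(M.shape[0], 1.0 / M.shape[0]) * w
    for _ in range(t - 1):
        x = (x @ M) * w                    # multiply by M, then by P
    return x.sum()

def mgf_brute_force(M, f, r, t):
    """E[exp(r * sum_i f(V_i))] by enumerating all length-t vertex sequences."""
    n = M.shape[0]
    total = 0.0
    for walk in itertools.product(range(n), repeat=t):
        prob = 1.0 / n                     # uniform start vertex
        for a, b in zip(walk, walk[1:]):
            prob *= M[a, b]                # transition probability a -> b
        total += prob * np.exp(r * sum(f[v] for v in walk))
    return total

n, t, r = 6, 4, 0.3
M = random_walk_matrix(n, 2)
f = np.linspace(0.0, 1.0, n)               # arbitrary function into [0, 1]
P = np.diag(np.exp(r * f))

exact = mgf_brute_force(M, f, r, t)
algebraic = mgf_algebraic(M, f, r, t)
print(exact, algebraic, np.linalg.norm(M @ P, 2) ** t)
```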

$$\|MP\| \le (1 - \lambda) \cdot \|JP\| + \lambda \cdot \|EP\|.$$

Since $J$ simply projects onto the uniform direction, we have

$$\|JP\|^2 \le \frac{\|uP\|^2}{\|u\|^2} = \frac{\sum_v (e^{r f(v)}/N)^2}{\sum_v (1/N)^2} = \frac{1}{N} \sum_v e^{2 r f(v)} = \frac{1}{N} \sum_v \left(1 + 2 r f(v) + O(r^2)\right) = 1 + 2 r \mu + O(r^2)$$

for $r \le 1$, and thus

$$\|JP\| \le \sqrt{1 + 2 r \mu + O(r^2)} \le 1 + r\mu + O(r^2).$$

For the error term, we have

$$\|EP\| \le \|P\| \le e^r = 1 + r + O(r^2).$$

Thus,

$$\|MP\| \le (1 - \lambda)(1 + r\mu + O(r^2)) + \lambda \cdot (1 + r + O(r^2)) \le 1 + (\mu + \lambda) r + O(r^2),$$

and we have

$$\mathrm{E}[e^{rX}] \le (1 + (\mu + \lambda) r + O(r^2))^t \le e^{(\mu + \lambda) r t + O(r^2 t)}.$$

By Markov's Inequality,

$$\Pr[X \ge (\mu + \lambda + \varepsilon) t] \le e^{-\varepsilon r t + O(r^2 t)} = e^{-\Omega(\varepsilon^2 t)},$$

if we set $r = \varepsilon / c$ for a large enough constant $c$.
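The choice of $r$ can be spelled out as a short calculation. In the sketch below, $C$ is a name introduced here for the constant hidden in the $O(r^2 t)$ term (an assumption of this note, not notation from the lecture), and $c \ge 2C$ is one sufficient way of making $c$ "large enough".

```latex
% Assumption: the O(r^2 t) term is at most C r^2 t for some absolute constant C > 0.
\begin{align*}
-\varepsilon r t + C r^2 t
  &= -\frac{\varepsilon^2 t}{c} + \frac{C \varepsilon^2 t}{c^2}
     && \text{(substituting } r = \varepsilon / c\text{)} \\
  &= -\frac{\varepsilon^2 t}{c}\left(1 - \frac{C}{c}\right) \\
  &\le -\frac{\varepsilon^2 t}{2c}
     && \text{(whenever } c \ge 2C\text{)}.
\end{align*}
% Hence the right-hand side of Markov's Inequality is at most
% e^{-\varepsilon^2 t / (2c)} = e^{-\Omega(\varepsilon^2 t)}.
```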