CS225: Pseudorandomness Prof. Salil Vadhan

Lecture 11: Extractors vs. Hashing and Expanders.

March 15, 2007

Based on scribe notes by John Provine.

In the previous lecture, we mentioned that extractors have played a unifying role in the theory of pseudorandomness through their close connections with a variety of other pseudorandom objects. In this lecture, we will see two of these connections. Specifically, we will see how, suitably reinterpreted, extractors can be viewed as providing families of hash functions and as being a certain type of highly expanding graph.

1 Extractors as Hash Functions

One of the results we saw last time says that for any subset $S \subseteq [N]$ of size $K$, if we choose a completely random hash function $h : [N] \to [M]$ for $M \ll K$, then $h$ will map the elements of $S$ almost uniformly to $[M]$. Equivalently, if we let $H$ be distributed uniformly over all functions $h : [N] \to [M]$ and $X$ be uniform on the set $S$, then $(H, H(X))$ is statistically close to $(H, U_{[M]})$. Can we use a smaller family of hash functions than the set of all functions $h : [N] \to [M]$? This question gives rise to the following variant of extractors.

Definition 1 (strong extractors) An extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a strong $(k, \varepsilon)$-extractor if for every $k$-source $X$ on $\{0,1\}^n$, $(U_d, \mathrm{Ext}(X, U_d))$ is $\varepsilon$-close to $(U_d, U_m)$. Equivalently, $\mathrm{Ext}'(x, y) = (y, \mathrm{Ext}(x, y))$ is a standard $(k, \varepsilon)$-extractor.
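To make the difference between the strong and the standard condition concrete, here is a minimal Python sketch that computes both errors exactly for a flat source on a toy domain. The function names (`strong_error`, `standard_error`) and the two toy functions are illustrative assumptions, not part of the lecture; the strong error uses the fact that the distance between $(U_d, \mathrm{Ext}(X, U_d))$ and $(U_d, U_m)$ is the average over seeds $y$ of the distance between $\mathrm{Ext}(X, y)$ and $U_m$.

```python
from fractions import Fraction


def strong_error(ext, support, d, m):
    """Exact distance between (U_d, Ext(X, U_d)) and (U_d, U_m),
    where X is uniform on `support` (a flat source)."""
    D, M = 2 ** d, 2 ** m
    total = Fraction(0)
    for y in range(D):
        counts = [0] * M
        for x in support:
            counts[ext(x, y)] += 1
        # Distance of Ext(X, y) from uniform on [M] for this fixed seed y.
        total += sum(abs(Fraction(c, len(support)) - Fraction(1, M))
                     for c in counts) / 2
    return total / D  # average over the uniformly random seed


def standard_error(ext, support, d, m):
    """Exact distance between Ext(X, U_d) and U_m (seed not revealed)."""
    D, M = 2 ** d, 2 ** m
    counts = [0] * M
    for y in range(D):
        for x in support:
            counts[ext(x, y)] += 1
    total = len(support) * D
    return sum(abs(Fraction(c, total) - Fraction(1, M)) for c in counts) / 2


def inner_product(x, y):
    # Toy example: inner product of x and y modulo 2 (n = d, m = 1).
    return bin(x & y).count("1") & 1


def seed_bit(x, y):
    # Toy example that ignores the source and outputs one bit of the seed.
    return y & 1


full_source = range(8)  # X uniform on {0,1}^3
print(strong_error(inner_product, full_source, 3, 1))
print(standard_error(seed_bit, full_source, 3, 1))
print(strong_error(seed_bit, full_source, 3, 1))
```

The `seed_bit` function looks perfect under the standard definition because its output is a uniform bit, but it fails badly under the strong definition because that bit is fully determined by the seed.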

The nonconstructive existence proof from last time can be extended to establish the existence of very good strong extractors.

Theorem 2 For every $n, k \in \mathbb{N}$ and $\varepsilon > 0$ there exists a strong $(k, \varepsilon)$-extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with $m = k - 2\log(1/\varepsilon) - O(1)$ and $d = \log(n - k) + 2\log(1/\varepsilon) + O(1)$.

Note that the output length is $m \approx k$ rather than $m \approx k + d$; intuitively, a strong extractor needs to extract randomness that is independent of the seed, and thus can only get the $k$ bits from the source.

We see that strong extractors can be viewed as very small families of hash functions having the almost-uniform mapping property mentioned above. Indeed, our first explicit construction of extractors is obtained by using pairwise independent hash functions.

The Leftover Hash Lemma shows us how to explicitly construct an extractor from a family of pairwise independent functions $\mathcal{H}$. The extractor uses a random hash function $h \stackrel{R}{\leftarrow} \mathcal{H}$ as its seed and keeps this seed in the output of the extractor. Thus, the extractor is strong.^1

^1 Recall that a $(k, \varepsilon)$-extractor $\mathrm{Ext}$ is strong if $\mathrm{Ext}(x, y) \stackrel{\mathrm{def}}{=} y \circ \mathrm{Ext}'(x, y)$ for some function $\mathrm{Ext}'$.

Theorem 3 (Leftover Hash Lemma) If $\mathcal{H} = \{h : \{0,1\}^n \to \{0,1\}^m\}$ is a pairwise independent family where $m = k - 2\log(1/\varepsilon)$, then $\mathrm{Ext}(x, h) \stackrel{\mathrm{def}}{=} h(x)$ is a strong $(k, \varepsilon)$-extractor.

Note that the seed length is $d = O(n)$, i.e., the number of random bits required to choose $h \stackrel{R}{\leftarrow} \mathcal{H}$. This is far from optimal; for the purposes of simulating randomized algorithms we would like $d = O(\log n)$. However, the output length of the extractor is $m = k - 2\log(1/\varepsilon)$, which is optimal up to an additive constant.
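The lemma can be checked exhaustively on a toy instance. The sketch below assumes one particular pairwise independent family that is not named in these notes: the affine maps $h_{A,b}(x) = Ax + b$ over GF(2), with $A$ an $m \times n$ bit matrix and $b \in \{0,1\}^m$ (seed length $mn + m$). All function names are illustrative. It first verifies pairwise independence by enumeration, then computes the exact statistical distance between $(H, H(X))$ and uniform for a flat source.

```python
from fractions import Fraction
from itertools import product


def parity(v):
    return bin(v).count("1") & 1


def affine_family(n, m):
    """All maps h(x) = Ax + b over GF(2); each row of A is an n-bit integer."""
    return [(rows, b)
            for rows in product(range(2 ** n), repeat=m)
            for b in range(2 ** m)]


def apply_hash(h, x):
    rows, b = h
    out = 0
    for i, row in enumerate(rows):
        # Output bit i is <row_i, x> + b_i over GF(2).
        out |= (parity(row & x) ^ ((b >> i) & 1)) << i
    return out


def is_pairwise_independent(n, m):
    """Check that (h(x1), h(x2)) is uniform on [M] x [M] for all x1 != x2."""
    family, M = affine_family(n, m), 2 ** m
    for x1 in range(2 ** n):
        for x2 in range(x1 + 1, 2 ** n):
            counts = {}
            for h in family:
                key = (apply_hash(h, x1), apply_hash(h, x2))
                counts[key] = counts.get(key, 0) + 1
            if len(counts) != M * M:
                return False
            if any(c * M * M != len(family) for c in counts.values()):
                return False
    return True


def leftover_hash_error(support, n, m):
    """Exact distance between (H, H(X)) and uniform, X uniform on `support`."""
    family, M = affine_family(n, m), 2 ** m
    total = Fraction(0)
    for h in family:
        counts = [0] * M
        for x in support:
            counts[apply_hash(h, x)] += 1
        total += sum(abs(Fraction(c, len(support)) - Fraction(1, M))
                     for c in counts) / 2
    return total / len(family)


n, k, m = 4, 3, 1              # so eps = 2^((m - k) / 2) = 1/2
eps = 2 ** ((m - k) / 2)
support = range(2 ** k)        # one flat k-source; others can be substituted
print(is_pairwise_independent(n, m), leftover_hash_error(support, n, m), eps)
```

The enumeration is only feasible for very small $n$ and $m$, since the family has $2^{mn+m}$ members; this is the $d = O(n)$ seed length noted above, here in its unoptimized $mn + m$ form.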

Proof: Let $X$ be an arbitrary $k$-source on $\{0,1\}^n$, $\mathcal{H}$ as above, and $H \stackrel{R}{\leftarrow} \mathcal{H}$. Let $d$ be the seed length. We show that $(H, H(X))$ is $\varepsilon$-close to $U_d \times U_m$ in the following three steps:

  1. We show that the collision probability of $(H, H(X))$ is close to that of $U_d \times U_m$.
  2. We note that this is equivalent to saying that the $\ell_2$ distance between $(H, H(X))$ and $U_d \times U_m$ is small.
  3. Then we deduce that the statistical difference is small, by recalling that the statistical difference equals half of the $\ell_1$ distance, which can be (loosely) bounded by the $\ell_2$ distance.

Proof of 1: By definition, $\mathrm{CP}(H, H(X)) = \Pr[(H, H(X)) = (H', H'(X'))]$, where $(H', X')$ is independent of and identically distributed to $(H, X)$. Note that $(H, H(X)) = (H', H'(X'))$ if and only if $H = H'$ and either $X = X'$ or $X \ne X'$ but $H(X) = H(X')$. Thus

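$$\mathrm{CP}(H, H(X)) \le \mathrm{CP}(H) \cdot \left( \mathrm{CP}(X) + \Pr\left[ H(X) = H(X') \mid X \ne X' \right] \right) \le \frac{1}{D} \left( \frac{1}{K} + \frac{1}{M} \right) = \frac{1 + \varepsilon^2}{DM}.$$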

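To see the last inequality, note that $\mathrm{CP}(H) = 1/D$ because there are $D$ hash functions, $\mathrm{CP}(X) \le 1/K$ because $H_\infty(X) \ge k$, and $\Pr[H(X) = H(X') \mid X \ne X'] = 1/M$ by pairwise independence. The final equality uses $M = \varepsilon^2 K$, which follows from $m = k - 2\log(1/\varepsilon)$.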

Proof of 2:

$$\|(H, H(X)) - U_d \times U_m\|^2 = \mathrm{CP}(H, H(X)) - \mathrm{CP}(U_d \times U_m) \le \frac{1 + \varepsilon^2}{DM} - \frac{1}{DM} = \frac{\varepsilon^2}{DM}.$$
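The first equality in the proof of step 2 is a general fact about the $\ell_2$ distance to uniform. As a short worked derivation (the symbol $P$ for the probability mass function of $(H, H(X))$ on a domain of size $DM$ is notation introduced here for illustration):

```latex
\begin{align*}
\|P - U\|^2
  &= \sum_{z} \left( P(z) - \frac{1}{DM} \right)^2 \\
  &= \sum_{z} P(z)^2 \;-\; \frac{2}{DM} \sum_{z} P(z) \;+\; DM \cdot \frac{1}{(DM)^2} \\
  &= \mathrm{CP}(P) - \frac{2}{DM} + \frac{1}{DM} \\
  &= \mathrm{CP}(P) - \frac{1}{DM},
\end{align*}
```

and $\mathrm{CP}(U_d \times U_m) = 1/(DM)$, since the uniform distribution puts mass $1/(DM)$ on each of $DM$ points.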

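Proof of 3: Recalling that the statistical difference between two random variables $X$ and $Y$ is equal to $\frac{1}{2} |X - Y|_1$, we have:

$$\Delta((H, H(X)), U_d \times U_m) = \frac{1}{2} \left| (H, H(X)) - U_d \times U_m \right|_1 \le \frac{\sqrt{DM}}{2} \left\| (H, H(X)) - U_d \times U_m \right\| \le \frac{\sqrt{DM}}{2} \cdot \sqrt{\frac{\varepsilon^2}{DM}} = \frac{\varepsilon}{2}.$$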
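The three steps of the proof can be traced numerically on a small instance. The following sketch again assumes the affine family $h_{A,b}(x) = Ax + b$ over GF(2) as the pairwise independent family (an illustrative choice, not one fixed by these notes) and uses exact rational arithmetic. The $\ell_1$ versus $\ell_2$ comparison in step 3 is the Cauchy-Schwarz bound $|v|_1 \le \sqrt{DM}\,\|v\|$ for a vector $v$ with $DM$ coordinates.

```python
from fractions import Fraction
from itertools import product


def parity(v):
    return bin(v).count("1") & 1


def affine_hash(rows, b, x):
    """h(x) = Ax + b over GF(2); rows of A are given as n-bit integers."""
    out = 0
    for i, row in enumerate(rows):
        out |= (parity(row & x) ^ ((b >> i) & 1)) << i
    return out


def joint_distribution(support, n, m):
    """Exact distribution of (H, H(X)) for X uniform on `support`."""
    family = [(rows, b)
              for rows in product(range(2 ** n), repeat=m)
              for b in range(2 ** m)]
    D, M = len(family), 2 ** m
    dist = {}
    for h in family:
        for z in range(M):
            dist[(h, z)] = Fraction(0)   # include zero-mass points
        for x in support:
            dist[(h, affine_hash(h[0], h[1], x))] += Fraction(1, D * len(support))
    return dist, D, M


n, k, m = 4, 3, 1
support = [0, 3, 5, 6, 9, 10, 12, 15]    # an arbitrary set of size K = 2^k
dist, D, M = joint_distribution(support, n, m)
K = len(support)
eps_sq = Fraction(M, K)                  # eps^2 = M / K since m = k - 2 log(1/eps)
uniform = Fraction(1, D * M)

cp = sum(p * p for p in dist.values())                   # step 1
l2_sq = sum((p - uniform) ** 2 for p in dist.values())   # step 2
l1 = sum(abs(p - uniform) for p in dist.values())        # step 3

print("CP bound holds:      ", cp <= (1 + eps_sq) / (D * M))
print("l2^2 = CP - 1/(DM):  ", l2_sq == cp - uniform)
print("l1 <= sqrt(DM) * l2: ", l1 ** 2 <= D * M * l2_sq)
print("distance <= eps / 2: ", (l1 / 2) ** 2 <= eps_sq / 4)
```

Squared quantities are compared throughout so that no square roots (and hence no floating-point rounding) are needed.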

highly imbalanced. Still, for an optimal extractor, we have M = Θ(ε^2 KD) (because m = k + d − 2 log(1/ε) − Θ(1)), which corresponds to expansion factor A = Θ(ε^2 D). (An optimal disperser actually gives A = Θ(D/ log(1/ε)).) Note this is smaller than the expansion factor of D/2 in Ramanujan graphs and D − O(1) in random graphs; the reason is that those expansion factors are for ‘small’ sets, whereas here we are asking for sets to expand to almost the entire right-hand side.

Now let’s look for a graph-theoretic property that is equivalent to the extraction property. Ext is a (k, ε)-extractor iff for every set S ⊆ [N ] of size K,

$$\Delta(\mathrm{Ext}(U_S, U_{[D]}), U_{[M]}) = \max_{T \subseteq [M]} \left| \Pr\left[ \mathrm{Ext}(U_S, U_{[D]}) \in T \right] - \Pr\left[ U_{[M]} \in T \right] \right| \le \varepsilon,$$

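where $U_S$ denotes the uniform distribution on $S$. This inequality may be expressed in graph-theoretic terms as follows. Write $e(S, T)$ for the number of edges between $S$ and $T$, and let $\mu(S) = |S|/N$ and $\mu(T) = |T|/M$ denote the densities of $S$ and $T$. For every set $T \subseteq [M]$,

$$\left| \Pr\left[ \mathrm{Ext}(U_S, U_{[D]}) \in T \right] - \Pr\left[ U_{[M]} \in T \right] \right| \le \varepsilon$$

$$\Longleftrightarrow \quad \left| \frac{e(S, T)}{|S| D} - \frac{|T|}{M} \right| \le \varepsilon$$

$$\Longleftrightarrow \quad \left| \frac{e(S, T)}{ND} - \mu(S)\mu(T) \right| \le \varepsilon \mu(S).$$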

Thus, we have:

Proposition 6 $\mathrm{Ext}$ is a $(k, \varepsilon)$-extractor iff the corresponding bipartite graph $G = ([N], [M], E)$ with left-degree $D$ has the property that $\left| \frac{e(S, T)}{ND} - \mu(S)\mu(T) \right| \le \varepsilon \mu(S)$ for every $S \subseteq [N]$ of size $K$ and every $T \subseteq [M]$.
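The equivalence behind this proposition can be checked by brute force on a tiny graph. In the sketch below, the graph has an edge from $x$ to $\mathrm{Ext}(x, y)$ for every seed $y$; `toy_ext` is a hypothetical function chosen only to have something to enumerate (it is not claimed to be a good extractor), and the other names are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations


def edge_count(ext, S, T, D):
    """e(S, T): number of pairs (x, y), x in S, y in [D], with Ext(x, y) in T."""
    T = set(T)
    return sum(1 for x in S for y in range(D) if ext(x, y) in T)


def statistical_distance(ext, S, D, M):
    """Distance between Ext(U_S, U_[D]) and U_[M], as half the l1 distance."""
    counts = [0] * M
    for x in S:
        for y in range(D):
            counts[ext(x, y)] += 1
    total = len(S) * D
    return sum(abs(Fraction(c, total) - Fraction(1, M)) for c in counts) / 2


def max_edge_deviation(ext, S, D, M):
    """max over T of | e(S,T) / (|S| D) - |T| / M |, by enumerating all T."""
    best = Fraction(0)
    for size in range(M + 1):
        for T in combinations(range(M), size):
            dev = abs(Fraction(edge_count(ext, S, T, D), len(S) * D)
                      - Fraction(size, M))
            best = max(best, dev)
    return best


def toy_ext(x, y):
    # Hypothetical toy map [8] x [4] -> [4], for illustration only.
    return (x * (2 * y + 1) + y) % 4


N, D, M = 8, 4, 4
S = [0, 1, 2, 5]
print(statistical_distance(toy_ext, S, D, M))
print(max_edge_deviation(toy_ext, S, D, M))
```

The two printed values agree for any choice of function and set, which is exactly the statement that the statistical distance equals the maximum deviation over test sets $T$.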

Note that this is very similar to the Expander Mixing Lemma, which states that if a graph $G$ has spectral expansion $\lambda$, then for all sets $S, T \subseteq [N]$ we have

$$\left| \frac{e(S, T)}{ND} - \mu(S)\mu(T) \right| \le \lambda \sqrt{\mu(S)\mu(T)}.$$

It follows that if $\lambda \sqrt{\mu(S)\mu(T)} \le \varepsilon \mu(S)$ for all $S \subseteq [N]$ of size $K$ and all $T \subseteq [N]$, then $G$ gives rise to a $(k, \varepsilon)$-extractor (by turning $G$ into a $D$-regular bipartite graph with $N$ vertices on each side in the natural way). It suffices for $\lambda \le \varepsilon \cdot \sqrt{K/N}$ for this to work.

We can use this connection to turn our explicit construction of spectral expanders into an explicit construction of extractors. To achieve $\lambda \le \varepsilon \cdot \sqrt{K/N}$, we can take an appropriate power of a constant-degree expander. Specifically, if $G_0$ is a $D_0$-regular expander on $N$ vertices with bounded second eigenvalue, we can consider the $t$th power of $G_0$, $G = G_0^t$, where $t = O(\log((1/\varepsilon)\sqrt{N/K})) = O(n - k + \log(1/\varepsilon))$. The degree of $G$ is $D = D_0^t = \mathrm{poly}(1/\lambda) = \mathrm{poly}(1/\varepsilon, N/K)$. This yields the following result:

Theorem 7 For every $n, k \in \mathbb{N}$ and $\varepsilon > 0$, there is an explicit $(k, \varepsilon)$-extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^n$ with $d = O(n - k + \log(1/\varepsilon))$.

Note that the seed length is significantly better than in the construction from pairwise-independent hashing when $k$ is close to $n$, say $k \ge n - O(\log n)$ (i.e., $K \ge N/\mathrm{poly}(\log N)$). The output length is just $n$, which is much larger than the typical output length for extractors (usually $m \ll n$). Using a Ramanujan graph (rather than an arbitrary constant-degree expander), the seed length can be improved to $d = n - k + 2\log(1/\varepsilon) + O(1)$, which yields an optimal output length $n = k + d - 2\log(1/\varepsilon) - O(1)$.
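The parameter arithmetic in this construction is easy to tabulate. The sketch below uses the fact that the $t$th power of a graph with spectral expansion $\lambda_0$ has spectral expansion $\lambda_0^t$, finds the smallest $t$ with $\lambda_0^t \le \varepsilon \sqrt{K/N}$, and reports the resulting seed length $d = t \log_2 D_0$. The function name and the sample values of $\lambda_0$ and $D_0$ are hypothetical, chosen only for illustration.

```python
import math


def expander_power_parameters(n, k, eps, lam0, deg0):
    """Smallest t with lam0**t <= eps * sqrt(K/N), and the seed length in bits.

    lam0 and deg0 describe an assumed base expander G_0; the seed selects one
    of the deg0**t walks of length t, so it needs t * log2(deg0) bits.
    """
    target = eps * math.sqrt(2.0 ** (k - n))   # eps * sqrt(K / N)
    t, lam = 0, 1.0
    while lam > target:
        lam *= lam0    # spectral expansion of the next power of G_0
        t += 1
    seed_bits = t * math.log2(deg0)
    return t, lam, seed_bits


# Hypothetical base graph: degree 16 with second eigenvalue 1/2.
t, lam, seed_bits = expander_power_parameters(n=20, k=16, eps=0.25,
                                              lam0=0.5, deg0=16)
print(t, lam, seed_bits)
```

With these sample values the target is $\varepsilon\sqrt{K/N} = 2^{-4}$, so four steps suffice; the seed length grows linearly in $n - k$ and in $\log(1/\varepsilon)$, matching the $O(n - k + \log(1/\varepsilon))$ bound of Theorem 7.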

Another way of proving Theorem 7 is to use the fact that a random step on an expander decreases the $\ell_2$ distance to uniform, as in the proof of the Leftover Hash Lemma. This analysis shows that we actually get a Rényi-entropy extractor, which explains the large seed length $d \approx n - k$.

The following table summarizes the main differences between “classic” expanders and extractors.

| Expanders | Extractors |
|---|---|
| Measured by vertex or spectral expansion | Measured by min-entropy/statistical difference |
| Typically constant degree | Typically logarithmic or poly-logarithmic degree |
| All sets of size at most $K$ expand | All sets of size exactly (or at least) $K$ expand |
| Typically balanced | Typically unbalanced, bipartite graphs |

Figure 1: Differences between “classic” expanders and extractors