




These notes cover deterministic extractors: algorithms that convert weak random sources into independent, unbiased bits. They discuss the motivation for extractors, the definitions of deterministic extractors and statistical difference, and properties of extractors. They also introduce seeded extractors and their importance in simulating randomized algorithms with weak sources.





CS225: Pseudorandomness Prof. Salil Vadhan
March 13, 2007
Based on scribe notes by Vitaly Feldman, Andrei Jorza, and Pavlo Pylyavskyy.
Having spent several lectures on expander graphs, the first major pseudorandom object in this course, we now move on to the second: randomness extractors. We begin by discussing the original motivation for extractors, which was to simulate randomized algorithms with sources of biased and correlated bits. This motivation is still compelling, but extractors have taken on a much wider significance in the years since they were introduced. They have found numerous applications in theoretical computer science beyond this initial motivating one, in areas ranging from cryptography to distributed algorithms to metric embeddings. More importantly from the perspective of this course, they have played a major unifying role in the theory of pseudorandomness. Indeed, the links between the various pseudorandom objects we will study in this course (expander graphs, randomness extractors, list-decodable codes, pseudorandom generators, samplers) were all discovered through extractors and are still best understood through extractors.
Typically, when we design randomized algorithms or protocols, we assume that all algorithms/parties have access to sources of perfect randomness, i.e. bits that are unbiased and completely independent. However, when we implement these algorithms, the physical sources of randomness to which we have access may contain biases and correlations. For example, we may use low-order bits of the system clock, the user’s mouse movements, or a noisy diode based on quantum effects. While these sources may have some randomness in them, the assumption that the source is perfect is a strong one, and thus it is of interest to try to relax it.
Ideally, what we would like is a compiler that takes any algorithm $A$ that works correctly when fed perfectly random bits $U_m$, and produces a new algorithm $A'$ that will work even if it is fed random bits $X \in \{0,1\}^n$ that come from a ‘weak’ random source. For example, if $A$ is a BPP algorithm, then we would like $A'$ to also run in probabilistic polynomial time. One way to design such compilers is to design a randomness extractor $\mathrm{Ext} : \{0,1\}^n \to \{0,1\}^m$ such that $\mathrm{Ext}(X) \equiv U_m$, so that $A'$ can simply run $A$ on $\mathrm{Ext}(X)$.
Von Neumann Sources. A simple version of this question was already considered by von Neumann. He looked at sources that consist of identically distributed boolean random variables $X_1, X_2, \ldots, X_n \in \{0,1\}$ which are independent but biased. That is, for every $i$, $\Pr[X_i = 1] = \delta$ for some unknown $\delta$. How can such a source be converted into a source of independent, unbiased bits?
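The classic answer, usually attributed to von Neumann, reads the bits in non-overlapping pairs, outputs the first bit of every pair whose two bits differ, and discards pairs whose bits agree. Since $\Pr[01] = (1-\delta)\delta = \Pr[10]$, each output bit is unbiased whatever $\delta$ is. The following is a minimal sketch of that idea; the function name `von_neumann_extract` is our own and does not come from the notes.

```python
def von_neumann_extract(bits):
    """Turn i.i.d. biased bits into unbiased bits via the pairing trick.

    The pairs (0, 1) and (1, 0) are equally likely for any bias delta, so the
    first bit of a mismatched pair is a fair coin; equal pairs are discarded.
    """
    out = []
    # Walk over the non-overlapping pairs (x1, x2), (x3, x4), ...
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out


print(von_neumann_extract([0, 1, 1, 0, 1, 1, 0, 0]))  # prints [0, 1]
```

The number of output bits is itself random: each pair survives with probability $2\delta(1-\delta)$, so $n$ input bits yield $n\delta(1-\delta)$ output bits in expectation.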
Sources of Independent Bits. Let's now look at a slightly more interesting source, in which all the variables are still independent but the bias is no longer the same. Specifically, for every $i$, $\Pr[X_i = 1] = \delta_i$ with $0 < \delta \le \delta_i \le 1 - \delta$. How can we deal with such a source?
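One standard approach, stated here as a known fact rather than as the notes' own answer, is to output the parity of a block of $t$ bits. For independent bits the parity satisfies $|\Pr[X_1 \oplus \cdots \oplus X_t = 1] - 1/2| = \frac{1}{2}\prod_i |1 - 2\delta_i| \le \frac{1}{2}(1 - 2\delta)^t$, so the bias shrinks exponentially in $t$. The sketch below checks this identity by brute force; the function names and the sample biases are illustrative assumptions.

```python
from itertools import product


def parity_bias_bruteforce(deltas):
    """|Pr[X_1 xor ... xor X_t = 1] - 1/2|, summing over all 2^t outcomes."""
    pr_one = 0.0
    for bits in product([0, 1], repeat=len(deltas)):
        pr = 1.0
        for bit, delta in zip(bits, deltas):
            pr *= delta if bit == 1 else 1.0 - delta
        if sum(bits) % 2 == 1:
            pr_one += pr
    return abs(pr_one - 0.5)


def parity_bias_formula(deltas):
    """Closed form: the bias of the parity is (1/2) * prod_i |1 - 2*delta_i|."""
    prod = 1.0
    for delta in deltas:
        prod *= abs(1.0 - 2.0 * delta)
    return prod / 2.0


# Arbitrary example biases, all inside [delta, 1 - delta] for delta = 0.1.
deltas = [0.1, 0.3, 0.45, 0.8, 0.25]
print(parity_bias_bruteforce(deltas), parity_bias_formula(deltas))
```

To produce $m$ output bits this way, one can split the $n$ input bits into $m$ disjoint blocks and output the parity of each block; independence of the inputs makes the output bits independent as well.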
Let’s be more precise about the problems we are studying. A source on $\{0,1\}^n$ is simply a random variable $X$ taking values in $\{0,1\}^n$. In each of the above examples, there is an implicit class of sources being studied. For example, $\mathrm{IndBits}_{n,\delta}$ is the class of sources $X$ on $\{0,1\}^n$ where the bits $X_i$ are independent and satisfy $\delta \le \Pr[X_i = 1] \le 1 - \delta$. We could define $\mathrm{VN}_{n,\delta}$ to be the same with the further restriction that all of the $X_i$’s are identically distributed, i.e. $\Pr[X_i = 1] = \Pr[X_j = 1]$ for all $i, j$.
Definition 1 (deterministic extractors).^1 Let $\mathcal{C}$ be a class of sources on $\{0,1\}^n$. An $\varepsilon$-extractor for $\mathcal{C}$ is a function $\mathrm{Ext} : \{0,1\}^n \to \{0,1\}^m$ such that for every $X \in \mathcal{C}$, $\mathrm{Ext}(X)$ is ‘$\varepsilon$-close’ to $U_m$.
Note that we want a single function Ext that works for all sources in the class. This captures the idea that we do not want to assume we know the exact distribution of the physical source we are using, but only that it comes from some class. For example, for $\mathrm{IndBits}_{n,\delta}$, we know that the bits are independent and none are too biased, but not the specific bias of each bit. Note also that we only allow the extractor one sample from the source $X$. If we want to allow multiple independent samples, then this should be modelled explicitly in our class of sources; ideally we would like to minimize the independence assumptions used.
We still need to define what we mean for the output to be $\varepsilon$-close to $U_m$.
Definition 2. For random variables $X$ and $Y$ taking values in $\mathcal{U}$, their statistical difference (also known as variation distance) is $\Delta(X, Y) = \max_{T \subseteq \mathcal{U}} |\Pr[X \in T] - \Pr[Y \in T]|$. We say that $X$ and $Y$ are $\varepsilon$-close if $\Delta(X, Y) \le \varepsilon$.
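To make the definition concrete, the sketch below computes the statistical difference of two small distributions directly from the definition, by maximizing over every subset $T$, and compares it with the well-known equivalent form $\frac{1}{2}\sum_u |\Pr[X = u] - \Pr[Y = u]|$. The function names and the example distributions are illustrative assumptions; distributions are represented as dictionaries from outcomes to exact probabilities.

```python
from fractions import Fraction
from itertools import combinations


def stat_diff_by_definition(p, q):
    """max over all subsets T of |Pr[X in T] - Pr[Y in T]| (exponential time)."""
    universe = sorted(set(p) | set(q))
    best = Fraction(0)
    for size in range(len(universe) + 1):
        for T in combinations(universe, size):
            gap = abs(sum(p.get(u, 0) for u in T) - sum(q.get(u, 0) for u in T))
            best = max(best, gap)
    return best


def stat_diff_half_l1(p, q):
    """Equivalent form: half the l1 distance between the probability vectors."""
    universe = set(p) | set(q)
    return sum(abs(p.get(u, 0) - q.get(u, 0)) for u in universe) / 2


p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
print(stat_diff_by_definition(p, q), stat_diff_half_l1(p, q))  # prints 1/2 1/2
```

In this example the maximum is attained by $T = \{a, b\}$, the set of outcomes that are more likely under $X$ than under $Y$.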
The intuition is that any event has the same probability under $X$ as under $Y$, up to $\pm\varepsilon$. This is really the most natural measure of distance for probability distributions (much more so than the $\ell_2$ distance we used in the study of random walks). In particular, it satisfies the following natural properties.
We now observe that extractors according to this definition give us the ‘compilers’ we want.
^1 Such extractors are called deterministic or seedless, to contrast with the probabilistic or seeded randomness extractors we will see later.
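Shannon entropy: $H_{Sh}(X) = \mathrm{E}_{x \leftarrow_R X}\left[\log \frac{1}{\Pr[X = x]}\right]$
Rényi entropy: $H_2(X) = \log \frac{1}{\mathrm{E}_{x \leftarrow_R X}[\Pr[X = x]]} = \log \frac{1}{\sum_x \Pr[X = x]^2}$
Min-entropy: $H_\infty(X) = \min_x \log \frac{1}{\Pr[X = x]}$
(All logs are base 2.)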
All three measures satisfy the following properties:
To illustrate the differences between the three notions, consider a source $X$ such that $X = 0^n$ with probability 0.99 and $X = U_n$ with probability 0.01. Then $H_{Sh}(X) \ge 0.01n$ (contribution from the uniform distribution), $H_2(X) \le \log \frac{1}{0.99^2} < 1$ and $H_\infty(X) \le \log \frac{1}{0.99} < 1$ (contributions from the constant distribution with probability 0.99). Note that even though $X$ has relatively high Shannon entropy, we cannot expect to extract bits that are close to uniform or carry out any useful randomized computations with one sample from $X$, because it gives us nothing useful 99% of the time. Thus, we should use the stronger measures of entropy given by $H_2$ or $H_\infty$.
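The gap between the three measures can be checked numerically. The sketch below builds the source from this example for a small $n$ and evaluates all three entropies straight from their definitions; the function names and the choice $n = 10$ are illustrative assumptions.

```python
import math


def shannon_entropy(p):
    """H_Sh(X) = E_x[log 1/Pr[X = x]], skipping zero-probability outcomes."""
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)


def renyi_entropy(p):
    """H_2(X) = log 1/(sum_x Pr[X = x]^2), i.e. log of 1/(collision probability)."""
    return math.log2(1.0 / sum(px * px for px in p))


def min_entropy(p):
    """H_inf(X) = min_x log 1/Pr[X = x] = log 1/(max_x Pr[X = x])."""
    return math.log2(1.0 / max(p))


def mostly_constant_source(n):
    """X = 0^n with probability 0.99 and X = U_n with probability 0.01."""
    size = 2 ** n
    p = [0.01 / size] * size
    p[0] += 0.99  # index 0 stands for the all-zeros string
    return p


n = 10
p = mostly_constant_source(n)
print(shannon_entropy(p), renyi_entropy(p), min_entropy(p))
```

For this source the Shannon entropy grows linearly with $n$, while the other two stay below 1 for every $n$, because a single outcome carries probability at least 0.99.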
Then why is Shannon entropy so widely used in information theory results? The reason is that such results typically study what happens when you have many independent samples from the source. In such a case, it turns out that the source is “close” to one where the min-entropy is roughly equal to the Shannon entropy. Thus the distinction between these entropy measures becomes less significant. (Recall that we only allow one sample from the source.) Moreover, Shannon entropy satisfies many nice identities that make it quite easy to work with. Min-entropy and Rényi entropy are much more delicate.
Definition 6. $X$ is a $k$-source if $H_\infty(X) \ge k$, i.e., if $\Pr[X = x] \le 2^{-k}$ for all $x$.
A typical setting of parameters is $k = \delta n$ for some fixed $\delta$, e.g., 0.01. We call $\delta$ the min-entropy rate. Some different ranges that are commonly studied (and are useful for different applications): $k = \mathrm{polylog}(n)$, $k = n^\gamma$ for a constant $\gamma \in (0, 1)$, $k = \delta n$ for a constant $\delta \in (0, 1)$, and $k = n - O(1)$. The middle two ($k = n^\gamma$ and $k = \delta n$) are the most natural for simulating randomized algorithms with weak random sources.
Examples of k-sources:
It turns out that flat $k$-sources, i.e. those that are uniform on a set of size $2^k$, are really representative of general $k$-sources.
Proposition 7. Every $k$-source is a convex combination of flat $k$-sources (provided that $2^k \in \mathbb{N}$), i.e., $X = \sum_i p_i X_i$ with $0 \le p_i \le 1$, $\sum_i p_i = 1$ and all the $X_i$ are flat $k$-sources.
Proof Sketch: Consider each source on $[N]$ (recall that $N = 2^n$) as a vector $X \in \mathbb{R}^N$. Then $X$ is a $k$-source if and only if $X(i) \in [0, 1]$ for all $i$ and $\sum_i X(i) = 1$ (the conditions for a probability distribution), and $X(i) \le 2^{-k}$ for all $i$ (the condition for a $k$-source).
The set of all $k$-sources is a polytope, since all these conditions are linear. More precisely, the set of $k$-sources is the intersection of the hypercube $[0, 2^{-k}]^N$ and the hyperplane $\sum_i X(i) = 1$. This is a convex polytope, and so any $k$-source is a convex combination of the vertices of the polytope. The vertices of the polytope are the points that make a maximal subset of the inequalities tight. Since $\sum_i X(i) = 1$, these are precisely the sources where $X(i) = 2^{-k}$ for $2^k$ values of $i$ and $X(i) = 0$ for the remaining values of $i$. Therefore the vertices are exactly the flat $k$-sources.
Thus, we can think of any $k$-source as being obtained by first selecting a flat $k$-source $X_i$ according to some distribution (given by the $p_i$’s) and then selecting a random sample from $X_i$. This means that if we can compile probabilistic algorithms to work with flat $k$-sources, then we can compile them to work with any $k$-source.
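The proof sketch is non-constructive, but a decomposition can also be computed explicitly. The sketch below uses a simple greedy procedure of our own choosing (it is not taken from the notes): repeatedly peel off a flat source supported on the $K = 2^k$ currently heaviest points, with the largest weight that keeps every remaining entry within the cap. The function name, the exact-arithmetic representation, and the example source are all assumptions made for illustration.

```python
from fractions import Fraction


def decompose_into_flat_sources(probs, K):
    """Write a k-source (probabilities each <= 1/K) as a convex combination
    of flat sources, each uniform on exactly K points.

    Returns a list of (weight, support) pairs whose weights sum to 1.
    """
    # Rescale so that 0 <= y[i] <= 1 and sum(y) == K.
    y = [Fraction(p) * K for p in probs]
    assert sum(y) == K and all(0 <= v <= 1 for v in y)
    t = Fraction(1)  # mixture weight still to be assigned
    parts = []
    while t > 0:
        # Invariant: 0 <= y[i] <= t for all i, and sum(y) == K * t.
        order = sorted(range(len(y)), key=lambda i: y[i], reverse=True)
        support, rest = order[:K], order[K:]
        a = min(y[i] for i in support)                      # cannot go below 0
        b = max((y[i] for i in rest), default=Fraction(0))  # must stay <= new t
        lam = min(a, t - b)
        for i in support:
            y[i] -= lam
        t -= lam
        parts.append((lam, sorted(support)))
    return parts


# Example: a 1-source on 4 points (every probability is at most 1/2).
probs = [Fraction(3, 8), Fraction(1, 4), Fraction(1, 4), Fraction(1, 8)]
parts = decompose_into_flat_sources(probs, K=2)
rebuilt = [sum(w * Fraction(1, 2) for w, s in parts if i in s) for i in range(4)]
print(parts, rebuilt == probs)
```

Each step either zeroes out an entry or makes a new entry hit the cap, so the loop finishes after at most $N + 1$ steps, and exact `Fraction` arithmetic avoids rounding issues in the termination test.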
Theorem 11. For any $n$ and $k$ ($k \le n$) and any $\varepsilon > 0$ there exists a $(k, \varepsilon)$-extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with $m = k + d - 2\log(1/\varepsilon) - O(1)$ and $d = \log(n - k) + 2\log(1/\varepsilon) + O(1)$.
One setting of parameters to keep in mind (for our application of simulating randomized algorithms with a weak source) is $k = \delta n$, with $\delta$ a fixed constant (e.g. $\delta = 0.01$), and $\varepsilon$ a fixed constant (e.g. $\varepsilon = 0.01$).
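To get a feel for these bounds, the sketch below evaluates the leading terms of Theorem 11 for this setting. The $O(1)$ terms are unspecified in the theorem, so they are simply dropped here; the function name and the choice $n = 10^6$ are illustrative assumptions.

```python
import math


def theorem11_leading_terms(n, k, eps):
    """Seed length d and output length m from Theorem 11, O(1) terms dropped."""
    d = math.log2(n - k) + 2 * math.log2(1 / eps)
    m = k + d - 2 * math.log2(1 / eps)
    return d, m


n = 1_000_000
k = n // 100          # min-entropy rate delta = 0.01
d, m = theorem11_leading_terms(n, k, eps=0.01)
print(d, m)           # d is a little over 33 bits; m is k plus about 20 bits
```

With these numbers the seed is only a few dozen bits long, so enumerating all $2^d$ seeds multiplies the running time by a factor polynomial in $n$ (up to the dropped constants), while the output retains essentially all $k$ bits of min-entropy.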
Proof: We use the probabilistic method. It suffices for Ext to work for flat $k$-sources. Choose the function Ext at random. Then the probability that Ext fails is at most the number of flat $k$-sources times the probability that Ext fails for a fixed flat $k$-source. By the above proposition, the probability of failure for a fixed flat $k$-source is at most $2^{-\Omega(KD\varepsilon^2)}$ (where $K = 2^k$ and $D = 2^d$), since $(X, U_d)$ is a flat $(k + d)$-source and $m = k + d - 2\log(1/\varepsilon) - O(1)$. Thus the total failure probability is at most
$$\binom{N}{K} \cdot 2^{-\Omega(KD\varepsilon^2)} \le \left(\frac{Ne}{K}\right)^K \cdot 2^{-\Omega(KD\varepsilon^2)}.$$
The latter expression is less than 1 if $D\varepsilon^2 \ge c \log(Ne/K) = c(n - k) + c'$ for constants $c, c'$. This holds when $d = \log(n - k) + 2\log(1/\varepsilon) + O(1)$.
It turns out that both bounds (on $m$ and $d$) are individually tight up to the $O(1)$ terms.
We now study how to simulate randomized algorithms using only a weak random source. A usual randomized algorithm $A$ takes an input string $w$ and $m$ random bits, and outputs the correct answer with probability at least $1 - \gamma$. Assume now that we do not have a source of perfectly random bits. Instead we have a $k$-source and an extractor, which takes an input from our weak source. We also allow the extractor a small seed of truly random bits, which, as mentioned above, can be viewed as choosing a random extractor from some family. We feed the output of the extractor into $A$ in place of the truly random bits it took before. Since the seed above has logarithmic length, we can actually eliminate it by running through all of its possible values and taking a majority vote.
Proposition 12. Let $A(w; r)$ be a randomized algorithm such that $A(w; U_m)$ has error probability at most $\gamma$, and let $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ be a $(k, \varepsilon)$-extractor. Define
$$A'(w; x) = \mathrm{maj}_{y \in \{0,1\}^d} \{A(w; \mathrm{Ext}(x, y))\}.$$
Then for every $k$-source $X$ on $\{0,1\}^n$, $A'(w; X)$ has error probability at most $2(\gamma + \varepsilon)$.
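Proof: The probability that $A(w; \mathrm{Ext}(X, U_d))$ is incorrect is at most the probability that $A(w; U_m)$ is incorrect plus $\varepsilon$, i.e. at most $\gamma + \varepsilon$, by the defining property of statistical difference. If $\mathrm{maj}_y A(w; \mathrm{Ext}(x, y))$ is incorrect for some $x$, then $A(w; \mathrm{Ext}(x, y))$ is incorrect for at least half of the seeds $y$. By Markov's inequality, this happens with probability at most $2(\gamma + \varepsilon)$ over $x \leftarrow X$.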
Note that $A'$ runs Ext and $A$ once for each of the $2^d$ seeds, so the simulation is slower by a factor of $2^d$, in addition to the time needed to compute Ext. Thus, we want to construct extractors achieving the following three properties: $d = O(\log n)$, so that $2^d$ is polynomial in $n$; Ext computable in polynomial time; and $m = k^{\Omega(1)}$.
The bound on the error probability in Proposition 12 can actually be made exponentially small (say $2^{-t}$) by using an extractor that is designed for min-entropy roughly $k - t$ instead of $k$.
We note that even though seeded extractors suffice for simulating randomized algorithms with only a weak source, they do not suffice for all applications of randomness in theoretical computer science. The trick of eliminating the random seed by enumeration does not work, for example, in cryptographic applications of randomness. Thus the study of deterministic extractors for restricted classes of sources remains a very interesting and active research direction. We, however, will focus on seeded extractors, due to their many applications and their connections to the other pseudorandom objects we are studying.
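The construction of $A'$ in Proposition 12 can be sketched in a few lines: enumerate every seed, run $A$ on the extractor's output, and return the most common answer. In the sketch below, `simulate_with_weak_source` is our own name for $A'$, and `toy_A` and `toy_ext` are hypothetical stand-ins chosen only so the example runs; in particular `toy_ext` carries no extraction guarantee.

```python
from collections import Counter


def simulate_with_weak_source(A, ext, w, x, d):
    """A'(w; x): run A(w, ext(x, y)) for every seed y in {0,1}^d, majority vote."""
    votes = Counter(A(w, ext(x, y)) for y in range(2 ** d))
    # most_common(1) gives the most frequent answer; ties go to the answer seen first.
    return votes.most_common(1)[0][0]


# Toy stand-ins, purely for illustration (hypothetical, NOT a real extractor).
def toy_A(w, r):
    """'Decides' whether w is even, but errs whenever its random string r is 0."""
    answer = (w % 2 == 0)
    return answer if r != 0 else not answer


def toy_ext(x, y):
    """Maps a sample x and a seed y to a 3-bit string, with no guarantee at all."""
    return (3 * x + y) % 8


print(simulate_with_weak_source(toy_A, toy_ext, w=4, x=5, d=2))  # prints True
```

Here one of the four seeds leads `toy_A` to the wrong answer, and the majority vote over the other three corrects it. The loop over `range(2 ** d)` is exactly the $2^d$ slowdown, which is why $d = O(\log n)$ matters.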