



Testing vs Learning, Learning Parities, Goldreich-Levin Theorem, Goldreich-Levin Algorithm, Cryptographic Applications




Analysis of Boolean Functions (CMU 18-859S, Spring 2007)
Feb. 6, 2007 Lecturer: Ryan O’Donnell Scribe: Karl Wimmer
In this lecture we make the jump from testing properties of functions to learning functions. The first algorithm that we will see is an algorithm of Goldreich and Levin, which was originally developed for a cryptographic application and later applied to learning. This algorithm works by finding a function’s large Fourier coefficients.
We first set some goals for learning. First, how do we measure complexity? In testing, we were concerned with how many queries were required. In contrast, learning is an algorithmic problem, so we will be concerned with the running time of the learning algorithm. Second, what is our task? In testing, we had a class of functions C, and given f , our task was to determine whether or not f ∈ C. In learning, we are promised that f ∈ C, and we are required to identify the function in C that f is.
Our first task will be learning parities.
Proposition 2.1 If $f : \{-1,1\}^n \to \{-1,1\}$ is (0-close to) a parity function, then we can identify $f$ in polynomial time using $n$ queries.
Proof: We simply query $f$ on all strings $e_i$ for $1 \le i \le n$, where the string $e_i$ is $1$ everywhere except for $-1$ in entry $i$. If $f(e_i) = -1$, then $x_i$ is relevant in the parity function $f$. $\Box$
Remark 2.2 The above algorithm does not run in linear time: it makes $n$ queries, and it takes linear time to write down each query.
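The query strategy in the proof of Proposition 2.1 can be sketched in a few lines of Python. This is only an illustration: the names `learn_parity` and `make_parity` are hypothetical, coordinates are 0-indexed, and `f` is assumed to be an exact parity given as a callable on lists of $\pm 1$ values.

```python
from typing import Callable, Iterable, List, Set


def make_parity(S: Iterable[int]) -> Callable[[List[int]], int]:
    """Return chi_S as a function on lists with entries in {-1, 1}."""
    S = list(S)

    def chi(x: List[int]) -> int:
        prod = 1
        for i in S:
            prod *= x[i]
        return prod

    return chi


def learn_parity(f: Callable[[List[int]], int], n: int) -> Set[int]:
    """Recover S from f = chi_S using exactly n queries, one per e_i."""
    relevant = set()
    for i in range(n):
        e_i = [1] * n   # all ones ...
        e_i[i] = -1     # ... except -1 in entry i
        if f(e_i) == -1:
            relevant.add(i)
    return relevant
```

Each query string has length $n$, so the loop does $\Theta(n^2)$ work in total, which matches Remark 2.2.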
So we can learn functions that are parities. With a little more work, we can use local decoding to learn functions close to some parity. Before we do this, we recall the Hoeffding bound:
Theorem 2.3 (Hoeffding bound) If $X$ is a random variable with values in $[-1,1]$, then the empirical average of $X$ after $O(\log(1/\delta)/\epsilon^2)$ samples is within $\pm\epsilon$ of $\mathbf{E}[X]$ with probability $1-\delta$.
Proposition 2.4 Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is $\epsilon$-close to $\chi_S$ for some $S$, where $\epsilon < \frac{1}{4} - c$ for some $c > 0$. Then we can identify $\chi_S$ with probability $1-\delta$ using $O(n \log(n/\delta))$ queries (treating $c$ as a constant).
Proof: Use local decoding on all $e_i$. The probability that the correct answer is returned for one $e_i$ is at least $1 - 2\epsilon > \frac{1}{2} + 2c$. For each string, repeat the local decoding step $O(\log(n/\delta)/c^2)$ times, and take the majority answer. By the Hoeffding bound, we get the wrong value of $\chi_S(e_i)$ with probability at most $\delta/n$. By the union bound, the probability that we get the wrong value for any $i$ is at most $\delta$, so we succeed with probability $1-\delta$ as claimed. $\Box$
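The proof above can be turned into a short sketch. It assumes the standard local decoder for parities, which guesses $\chi_S(e_i)$ as $f(y)\,f(y \circ e_i)$ for a uniformly random $y$, where $y \circ e_i$ is $y$ with coordinate $i$ negated. The function names and the explicit `repetitions` parameter are illustrative choices, not part of the lecture.

```python
import random
from typing import Callable, List, Set


def local_decode(f: Callable[[List[int]], int], n: int, i: int,
                 rng: random.Random) -> int:
    """One randomized guess for chi_S(e_i) using two queries to f."""
    y = [rng.choice((-1, 1)) for _ in range(n)]
    y_flipped = y.copy()
    y_flipped[i] = -y_flipped[i]   # y multiplied coordinate-wise by e_i
    return f(y) * f(y_flipped)


def learn_noisy_parity(f: Callable[[List[int]], int], n: int,
                       repetitions: int, rng: random.Random) -> Set[int]:
    """Majority vote over repeated local decodings, for each coordinate i."""
    relevant = set()
    for i in range(n):
        votes = sum(local_decode(f, n, i, rng) for _ in range(repetitions))
        if votes < 0:              # majority of the guesses said -1
            relevant.add(i)
    return relevant
```

Choosing `repetitions` on the order of $\log(n/\delta)/c^2$, and odd so that ties cannot occur, corresponds to the repetition count used in the proof.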
Remark 2.5 Notice that the above algorithm cannot be made deterministic. For any deterministic version of local decoding, an adversary could choose a function $f$ on which the algorithm fails.
To learn functions that are $\epsilon$-far from every parity function for $\epsilon \ge \frac{1}{4}$, we need to introduce the Goldreich-Levin algorithm.
Since we are talking about functions that are not $\frac{1}{4}$-close to any parity function, we will talk about correlation rather than closeness. We will say that $f$ has correlation $\gamma$ with $\chi_S$ if $|\hat{f}(S)| \ge \gamma$. Notice that when $\gamma$ is small, there are potentially many $S$ such that $f$ has correlation $\gamma$ with $\chi_S$. The following easy bound follows from Parseval's identity:
Proposition 3.1 $|\hat{f}(S)| \ge \gamma$ for at most $\frac{1}{\gamma^2}$ sets $S$.
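For completeness, here is one way to write out the counting argument behind Proposition 3.1. It assumes, as in the rest of this lecture, that $f$ takes values in $[-1,1]$, so that $\mathbf{E}_x[f(x)^2] \le 1$.

```latex
\gamma^2 \cdot \#\{S : |\hat{f}(S)| \ge \gamma\}
  \;\le\; \sum_{S \,:\, |\hat{f}(S)| \ge \gamma} \hat{f}(S)^2
  \;\le\; \sum_{S \subseteq [n]} \hat{f}(S)^2
  \;=\; \mathbf{E}_x\big[f(x)^2\big]
  \;\le\; 1 .
```

Dividing both sides by $\gamma^2$ gives the bound of $\frac{1}{\gamma^2}$ on the number of such sets.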
We will commonly think of $\gamma$ as a constant. We can think of the task of outputting all $S$ such that $|\hat{f}(S)| \ge \gamma$ as list decoding. We now state the Goldreich-Levin theorem.
Theorem 3.2 (Goldreich-Levin) Given query access to $f : \{-1,1\}^n \to [-1,1]$ and given $\gamma, \delta > 0$, there is a $\mathrm{poly}(n, \frac{1}{\gamma}) \log(\frac{1}{\delta})$-time algorithm that outputs a list $L = \{S_1, \dots, S_m\}$ such that (1) if $|\hat{f}(S)| \ge \gamma$, then $S \in L$, and (2) if $S \in L$, then $|\hat{f}(S)| \ge \frac{\gamma}{2}$, where (1) and (2) hold with probability $1-\delta$.
The reason that this algorithm is useful is that a function is nearly determined by its large Fourier coefficients. The theorem can also be generalized to the case where $f$ maps into $\mathbb{R}$. The running time of the algorithm then depends on the usual parameters as well as on $\max_x f(x) - \min_x f(x)$. To prove this theorem, we will introduce a powerful tool that we will use very often in the remainder of this course. This tool says that we can "efficiently estimate" any Fourier coefficient we wish.
Lemma 3.3 For any $S \subseteq [n]$, it is possible to estimate $\hat{f}(S)$ to within $\pm\eta$ with probability $1-\delta$ using $O(\log(1/\delta)/\eta^2)$ queries, with an extra running time factor polynomial in $n$.
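A minimal sketch of the estimator behind Lemma 3.3, assuming it is the empirical average of $f(x)\chi_S(x)$ over uniformly random $x$, whose expectation is $\hat{f}(S)$. The sample count uses one explicit form of the Hoeffding bound for values in $[-1,1]$, namely $m \ge 2\ln(2/\delta)/\eta^2$. The constant and the function names are assumptions made for this illustration.

```python
import math
import random
from typing import Callable, Iterable, List


def chi(S: Iterable[int], x: List[int]) -> int:
    """Parity chi_S(x): the product of x[i] over i in S."""
    prod = 1
    for i in S:
        prod *= x[i]
    return prod


def estimate_fourier_coefficient(f: Callable[[List[int]], float], n: int,
                                 S: Iterable[int], eta: float, delta: float,
                                 rng: random.Random) -> float:
    """Estimate hat f(S) = E_x[f(x) chi_S(x)] to within eta w.p. >= 1 - delta."""
    S = list(S)
    # Hoeffding for values in [-1, 1]: Pr[error >= eta] <= 2 exp(-m eta^2 / 2).
    m = math.ceil(2 * math.log(2 / delta) / eta ** 2)
    total = 0.0
    for _ in range(m):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        total += f(x) * chi(S, x)
    return total / m
```

Each sample costs one query plus $O(n)$ time to draw $x$ and evaluate $\chi_S$, which is the extra polynomial factor mentioned in the lemma.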
Definition 4.5 The weight of an indicator string $\mathcal{S}$ is $\sum_{U \in \mathcal{S}} \hat{f}(U)^2$. We will denote this quantity by $W(\mathcal{S})$.
We will make use of the previous corollary to get our second important tool: we can efficiently estimate $W(\mathcal{S})$. We do this by efficiently estimating $\mathbf{E}_x[F_{S \subseteq I}(x)^2]$.
Proposition 4.6 $\mathbf{E}_x[F_{S \subseteq I}(x)^2]$ can be efficiently estimated.
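Proof:
$$\mathbf{E}_x\big[F_{S \subseteq I}(x)^2\big] = \mathbf{E}_x\big[\hat{f}_{x \to \bar{I}}(S)^2\big] = \mathbf{E}_x\Big[\big(\mathbf{E}_w[f_{x \to \bar{I}}(w)\, w_S]\big)^2\Big].$$
Now instead of taking the square of an expectation, we pick two independent copies $w, w'$ of $w$ and write a product of expectations to yield:
$$\mathbf{E}_x\Big[\big(\mathbf{E}_w[f_{x \to \bar{I}}(w)\, w_S]\big)^2\Big] = \mathbf{E}_x\Big[\mathbf{E}_{w,w'}\big[f_{x \to \bar{I}}(w)\, w_S\, f_{x \to \bar{I}}(w')\, w'_S\big]\Big] = \mathbf{E}_{x,w,w'}\big[f(w,x)\, w_S\, f(w',x)\, w'_S\big].$$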
Here the first equality uses the independence of $w$ and $w'$. Since the random variable inside the expectation can be sampled and takes values in $[-1,1]$, we can use the Hoeffding bound to estimate this expectation. $\Box$
Equipped with this tool, we can state the Goldreich-Levin algorithm:
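The sampling procedure from the proof of Proposition 4.6 might look as follows in Python. This sketch makes some assumptions that are not spelled out in the text: an indicator string is represented by its fixed prefix `(a_1, ..., a_k)` of 0/1 values, $I$ is the set of the first $k$ coordinates, $S = \{i : a_i = 1\}$, and `x` ranges over the remaining $n-k$ coordinates. The function names are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Sequence


def chi(S: Iterable[int], x: List[int]) -> int:
    """Parity chi_S(x): the product of x[i] over i in S."""
    prod = 1
    for i in S:
        prod *= x[i]
    return prod


def estimate_weight(f: Callable[[List[int]], float], n: int,
                    prefix: Sequence[int], num_samples: int,
                    rng: random.Random) -> float:
    """Empirical average of f(w,x) w_S f(w',x) w'_S over random x, w, w'."""
    k = len(prefix)
    S = [i for i, a in enumerate(prefix) if a == 1]
    total = 0.0
    for _ in range(num_samples):
        x = [rng.choice((-1, 1)) for _ in range(n - k)]   # shared suffix
        w1 = [rng.choice((-1, 1)) for _ in range(k)]      # first copy of w
        w2 = [rng.choice((-1, 1)) for _ in range(k)]      # independent copy w'
        total += f(w1 + x) * chi(S, w1) * f(w2 + x) * chi(S, w2)
    return total / num_samples
```

Each sample uses two queries to `f` that share the same suffix `x`, which is what makes the product an unbiased sample of the squared inner expectation.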
Algorithm 4.1: GOLDREICHLEVIN($f$)
$L \leftarrow \{(*, *, \dots, *)\}$
for $k = 1$ to $n$
    for each $\mathcal{S} \in L$, $\mathcal{S} = (a_1, \dots, a_{k-1}, *, \dots, *)$
        let $\mathcal{S}_{a_k} = (a_1, \dots, a_{k-1}, a_k, *, \dots, *)$ for $a_k = 0, 1$
        estimate $W(\mathcal{S}_{a_k})$ to within $\pm\gamma^2/4$ with probability at least $1-\delta$
        remove $\mathcal{S}$ from $L$
        add $\mathcal{S}_{a_k}$ to $L$ if the estimate of $W(\mathcal{S}_{a_k})$ is at least $\gamma^2/2$, for $a_k = 0, 1$
return $L$
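As an illustration, here is a small runnable sketch of the algorithm above. It is not the lecture's own implementation. It assumes indicator strings are stored as tuples holding only the fixed prefix, and that weights are estimated by the sampling identity $\mathbf{E}_{x,w,w'}[f(w,x)\, w_S\, f(w',x)\, w'_S]$. The sample count comes from an explicit Hoeffding constant chosen for this example. All function names are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, Iterable, List, Sequence


def chi(S: Iterable[int], x: List[int]) -> int:
    """Parity chi_S(x): the product of x[i] over i in S."""
    prod = 1
    for i in S:
        prod *= x[i]
    return prod


def estimate_weight(f, n: int, prefix: Sequence[int], num_samples: int,
                    rng: random.Random) -> float:
    """Estimate W of the indicator string whose fixed part is `prefix`."""
    k = len(prefix)
    S = [i for i, a in enumerate(prefix) if a == 1]
    total = 0.0
    for _ in range(num_samples):
        x = [rng.choice((-1, 1)) for _ in range(n - k)]
        w1 = [rng.choice((-1, 1)) for _ in range(k)]
        w2 = [rng.choice((-1, 1)) for _ in range(k)]
        total += f(w1 + x) * chi(S, w1) * f(w2 + x) * chi(S, w2)
    return total / num_samples


def goldreich_levin(f: Callable[[List[int]], float], n: int, gamma: float,
                    delta: float, rng: random.Random) -> List[FrozenSet[int]]:
    """Return sets S whose estimated weight survives all n rounds."""
    accuracy = gamma ** 2 / 4                       # target error per estimate
    delta_prime = delta / (8 * n / gamma ** 2)      # union bound over estimates
    # Assumed Hoeffding constant: m >= 2 ln(2 / delta') / accuracy^2.
    m = math.ceil(2 * math.log(2 / delta_prime) / accuracy ** 2)
    L = [()]                                        # the string (*, ..., *)
    for _ in range(n):
        next_L = []
        for prefix in L:
            for a in (0, 1):
                child = prefix + (a,)
                if estimate_weight(f, n, child, m, rng) >= gamma ** 2 / 2:
                    next_L.append(child)
        L = next_L
    return [frozenset(i for i, a in enumerate(p) if a == 1) for p in L]
```

For example, calling `goldreich_levin` on the majority of three bits with `gamma = 0.4` should, with high probability, return the four sets $\{0\}$, $\{1\}$, $\{2\}$, $\{0,1,2\}$, whose Fourier coefficients have absolute value $\frac{1}{2}$.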
We will now analyze the algorithm. As a first assumption, we will assume that all estimations are accurate. We will later see how to remove this assumption.
Invariant 4.7 After every iteration of the algorithm, $W(\mathcal{S}) \ge \frac{\gamma^2}{4}$ for all $\mathcal{S} \in L$.
Proof: All estimates are assumed to be correct, and every $\mathcal{S} \in L$ was placed in $L$ because its estimated weight was at least $\frac{\gamma^2}{2}$, and the estimate is correct to within an additive $\frac{\gamma^2}{4}$. $\Box$
Invariant 4.8 At any time, $|L| \le \frac{4}{\gamma^2}$. This follows from our previous observation combined with Parseval's identity.
Since $|L| \le \frac{4}{\gamma^2}$, the algorithm performs at most 2 estimations per set in $L$ at any iteration, and there are $n$ iterations, the algorithm performs a total of at most $\frac{8n}{\gamma^2}$ estimations.
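Invariant 4.9 For any $S$ such that $\hat{f}(S)^2 \ge \gamma^2$, there exists $\mathcal{S} \in L$ such that $S \in \mathcal{S}$. This follows from the correctness of our estimations: any indicator string containing such an $S$ has weight at least $\gamma^2$, so its estimate is at least $\frac{3\gamma^2}{4} \ge \frac{\gamma^2}{2}$ and it is never discarded.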
From the above invariants, we can conclude that our algorithm is correct. To remove our assumption, given $\delta > 0$, we define $\delta' = \frac{\delta}{8n/\gamma^2}$ and perform each estimation with confidence $1-\delta'$. By the union bound, since the algorithm performs at most $\frac{8n}{\gamma^2}$ estimations, they are all correct with probability at least $1-\delta$, so the algorithm is correct with probability at least $1-\delta$.
The total running time is dominated by the estimations. There are at most $\frac{8n}{\gamma^2}$ estimations, and each takes $O(\log(1/\delta')/\gamma^4)$ samples, since the Hoeffding bound is applied with accuracy $\gamma^2/4$. The overall running time is therefore $\mathrm{poly}(n, \frac{1}{\gamma}) \log(\frac{1}{\delta})$.
Notice that, since we are estimating the weight of each set before it is added to $L$, we get property (2) from the Goldreich-Levin theorem for free from this algorithm.
We will discuss some of the original applications of the Goldreich-Levin theorem to cryptography. We will require a few definitions.
Definition 5.1 $f : \{0,1\}^n \to \{0,1\}^n$ is a $\gamma(n)$-one way permutation if (1) $f$ is a permutation, (2) $f$ is deterministic poly-time computable, and (3) for any probabilistic poly-time algorithm $A$, $\Pr_{x,\, A\text{'s randomness}}[A(f(x)) = x] < \gamma(n)$.
Remark 5.2 In most applications, it is desired to have a family of one way permutations, one defined for each input length.
In the definition, we can replace "permutation" with other terms, such as the more general "function" or the more specific "trapdoor permutation," where a trapdoor permutation is a permutation for which there exists a short piece of information that allows for easy inversion of $f$.
Example 5.3 (RSA cryptosystem) Pick $N$ to be the product of two large randomly chosen primes $p, q$, and let $\mathbb{Z}_N^*$ be the group of integers relatively prime to $N$ under mod-$N$ multiplication. Pick a random $e$ relatively prime to $(p-1)(q-1)$, the order of $\mathbb{Z}_N^*$. Then $x \mapsto x^e$ is a trapdoor permutation of $\mathbb{Z}_N^*$.
Remark 5.4 Although elements of $\mathbb{Z}_N^*$ are not strings, we can massage them to be strings without too much difficulty.
Remark 5.5 The factorization of N is a trapdoor for the above example.
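A toy numerical illustration of this trapdoor, with tiny primes that offer no security. It assumes the exponent $e$ is relatively prime to $(p-1)(q-1)$, which is what makes the map invertible, and it uses Python's built-in `pow` with a negative exponent (Python 3.8 or later) to compute the inverse exponent from the factorization. The name `make_toy_rsa` is hypothetical.

```python
import math
from typing import Callable, Tuple


def make_toy_rsa(p: int, q: int, e: int) -> Tuple[int, Callable[[int], int],
                                                  Callable[[int], int]]:
    """Return (N, forward, inverse) for the map x -> x^e mod N."""
    N = p * q
    phi = (p - 1) * (q - 1)          # order of the group Z_N^*
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p-1)(q-1)")
    d = pow(e, -1, phi)              # trapdoor: needs p and q to compute

    def forward(x: int) -> int:
        return pow(x, e, N)          # easy for anyone who knows (N, e)

    def inverse(y: int) -> int:
        return pow(y, d, N)          # easy only with the trapdoor d

    return N, forward, inverse
```

With `p = 11`, `q = 13`, `e = 7`, the map permutes the 120 elements of $\mathbb{Z}_{143}^*$, and `inverse` undoes `forward` on each of them.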
Definition 5.6 A poly-time computable function $B : \{0,1\}^n \to \{0,1\}$ is a $\gamma(n)$-hardcore predicate for $f$ if for all probabilistic polynomial time algorithms $A$, $\Pr_{x,\, A\text{'s randomness}}[A(f(x)) = B(x)] < \frac{1}{2} + \gamma(n)$.