


Problem Set 4 for the Analysis of Boolean Functions course at Carnegie Mellon University, Spring 2007. It covers orthogonal decomposition, the Logarithmic Sobolev Inequality, ε-biased sets, and the Hypercontractive Theorem. The assignment has six problems, of which students complete four.



Due: Tuesday, April 3, beginning of class
Homework policy: I encourage you to try to solve the problems by yourself. However, you may collaborate as long as you do the writeup yourself and list the people you talked with.
Do 4 out of 6.
1. Orthogonal decomposition. Given any $f : \{-1,1\}^n \to \mathbb{R}$, consider $f^S : \{-1,1\}^n \to \mathbb{R}$ defined by $f^S = \hat{f}(S)\chi_S$. We have $f = \sum_{S \subseteq [n]} f^S$ as functions, and this “orthogonal decomposition” has the following three properties: (i) $f^S(x)$ depends only on the coordinates of $x$ in $S$; (ii) $\mathbf{E}_x[f^S(x) f^T(x)] = 0$ if $S \neq T$; (iii) $\sum_{T \subseteq S} f^T$, denoted $f^{\leq S}$, gives the conditional expectation of $f$ conditioned on the coordinates in $S$.
(a) Prove property (iii); i.e., $f^{\leq S}(x) = \mathbf{E}[f_{x \to S}]$, where the expectation is over the bits in $\bar{S} = [n] \setminus S$. (Here the notation is that $x \in \{-1,1\}^n$, but in the expression $f_{x \to S}$, we only restrict the $S$-coordinates of $f$ using the $S$-bits of $x$; the $\bar{S}$-bits of $x$ are ignored.)
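Property (iii) can be sanity-checked numerically before proving it. The following is a minimal sketch, not part of the assignment: it computes all Fourier coefficients of a small function by brute force, sums the terms $\hat{f}(T)\chi_T$ over $T \subseteq S$, and compares the result with the average of $f$ over the coordinates outside $S$. The names `fourier_coeffs`, `f_leq`, `cond_exp`, `max_gap`, and `example_f` are illustrative assumptions, and coordinates are 0-based here.

```python
from itertools import product
from math import prod


def fourier_coeffs(f, n):
    """Return {S: f_hat(S)} for f on {-1,1}^n; S is a frozenset of 0-based coordinates."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for mask in product([0, 1], repeat=n):
        S = frozenset(i for i in range(n) if mask[i])
        # f_hat(S) = E_x[f(x) * chi_S(x)], with chi_S(x) the product of x_i over i in S
        coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
    return coeffs


def f_leq(coeffs, S, x):
    """Evaluate the sum of f_hat(T) * chi_T(x) over all T contained in S."""
    return sum(c * prod(x[i] for i in T) for T, c in coeffs.items() if T <= S)


def cond_exp(f, n, S, x):
    """Average f over the coordinates outside S, keeping the S-coordinates of x fixed."""
    outside = [i for i in range(n) if i not in S]
    total = 0.0
    for bits in product([-1, 1], repeat=len(outside)):
        y = list(x)
        for i, b in zip(outside, bits):
            y[i] = b
        total += f(tuple(y))
    return total / 2 ** len(outside)


def max_gap(f, n, S):
    """Largest difference between the two sides of property (iii) over all x."""
    coeffs = fourier_coeffs(f, n)
    return max(
        abs(f_leq(coeffs, S, x) - cond_exp(f, n, S, x))
        for x in product([-1, 1], repeat=n)
    )


def example_f(x):
    # An arbitrary real-valued test function on {-1,1}^3 (an assumption for illustration).
    return x[0] * x[1] + 0.5 * x[2] + max(x)


print(max_gap(example_f, 3, frozenset({0, 2})))
```

The printed gap should be zero up to floating-point error for every choice of $S$; a check like this confirms the statement on examples but is no substitute for the proof.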
In the rest of this problem we establish the same kind of decomposition for general real-valued functions on product probability spaces. Specifically, let $X$ be any finite set and let $\pi$ be a probability distribution on $X$. We think of the $n$-fold product set $X^n$ as having the product probability distribution induced by $\pi$. All $\Pr[\cdot]$, $\mathbf{E}[\cdot]$ in what follows refer to this product distribution.
(b) We first make property (iii) hold by fiat: For $S \subseteq [n]$, we define $f^{\leq S} : X^n \to \mathbb{R}$ to be the function depending only on the coordinates in $S$ giving the conditional expectation; i.e., $f^{\leq S}(x) := \mathbf{E}[f_{x \to S}]$, where the expectation is over the product probability distribution on the coordinates outside $S$. Now given this definition, explicitly write how we should define the functions $f^S$ so that the equations $f^{\leq S} = \sum_{T \subseteq S} f^T$ hold. Check also that property (i) holds with your definitions. (Hint: inclusion-exclusion.)
(c) Show that $\mathbf{E}_x[f^{\leq S}(x) f^{\leq T}(x)] = \mathbf{E}_x[f^{\leq (S \cap T)}(x)^2]$, straight from our definition of $f^{\leq S}$.
(d) Now show property (ii), that $\mathbf{E}_x[f^S(x) f^T(x)] = 0$ when $S \neq T$. (Hint: write your definitions of $f^S$, $f^T$ from (b), and then use (c).)
Remark: This “orthogonal decomposition” of functions $f$ is often a good substitute for Fourier analysis when the domain is a product probability space other than $\{-1,1\}^n$.
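As a rough illustration of the general decomposition, the sketch below tries one candidate definition suggested by the inclusion-exclusion hint, namely $f^S = \sum_{T \subseteq S} (-1)^{|S| - |T|} f^{\leq T}$, and checks numerically that the pieces sum to $f$ and are pairwise orthogonal on a small non-uniform product space. The candidate formula, the distribution `pi`, the test function `example_f`, and all helper names are assumptions made for this example; verifying that the formula works in general is the content of parts (b) through (d).

```python
from itertools import combinations, product
from math import prod


def subsets(S):
    """Yield every subset of S as a frozenset."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def weight(pi, values):
    """Product-distribution probability of a tuple of values (1 for the empty tuple)."""
    return prod(pi[v] for v in values)


def f_leq(f, pi, n, S, x):
    """Conditional expectation of f given the S-coordinates of x."""
    outside = [i for i in range(n) if i not in S]
    total = 0.0
    for vals in product(pi, repeat=len(outside)):
        y = list(x)
        for i, v in zip(outside, vals):
            y[i] = v
        total += weight(pi, vals) * f(tuple(y))
    return total


def f_part(f, pi, n, S, x):
    """Candidate f^S(x): alternating sum of f^{<=T}(x) over T contained in S."""
    return sum(
        (-1) ** (len(S) - len(T)) * f_leq(f, pi, n, T, x) for T in subsets(S)
    )


def inner(f, pi, n, S, T):
    """E_x[f^S(x) f^T(x)] under the product distribution."""
    return sum(
        weight(pi, x) * f_part(f, pi, n, S, x) * f_part(f, pi, n, T, x)
        for x in product(pi, repeat=n)
    )


# Hypothetical example: X = {0, 1, 2} with a non-uniform distribution, n = 3.
pi = {0: 0.5, 1: 0.3, 2: 0.2}


def example_f(x):
    return x[0] * x[1] + x[2] ** 2 - x[0] + 1.5 * (x[1] == x[2])


print(inner(example_f, pi, 3, frozenset({0}), frozenset({0, 1})))
```

The printed inner product should vanish up to rounding, and summing `f_part` over all $S \subseteq [n]$ at any point should recover `example_f` there.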
2. Logarithmic Sobolev Inequality. Consider the Hypercontractive Theorem with $q = 2$, $p = 2 - 2\epsilon$, and $\rho = \sqrt{1 - 2\epsilon}$, where $\epsilon \in [0, 1/2]$; if we square it, we get
$$\|T_{\sqrt{1-2\epsilon}} f\|_2^2 \leq \|f\|_{2-2\epsilon}^2$$
for any $f : \{-1,1\}^n \to \mathbb{R}$.
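(a) Show that we have equality at $\epsilon = 0$. Explain why we can now conclude that
$$\frac{\partial}{\partial \epsilon} \|T_{\sqrt{1-2\epsilon}} f\|_2^2 \Big|_{\epsilon = 0} \leq \frac{\partial}{\partial \epsilon} \|f\|_{2-2\epsilon}^2 \Big|_{\epsilon = 0}.$$
(b) Show that
$$\frac{\partial}{\partial \epsilon} \|T_{\sqrt{1-2\epsilon}} f\|_2^2 \Big|_{\epsilon = 0} = -2 I(f).$$
(c) Show that
$$\frac{\partial}{\partial \epsilon} \|f\|_{2-2\epsilon}^2 \Big|_{\epsilon = 0} = -\mathrm{Ent}[f^2],$$
where $\mathrm{Ent}[g]$ is the functional defined for nonnegative $g$ by $\mathrm{Ent}[g] = \mathbf{E}[g \ln g] - \mathbf{E}[g] \ln \mathbf{E}[g]$.^1
We conclude that for all $f : \{-1,1\}^n \to \mathbb{R}$,
$$\mathrm{Ent}[f^2] \leq 2 I(f).$$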
This is called the “Logarithmic Sobolev Inequality”, or the “Entropy-Energy Inequality”. (Recall we called I(f ) the “energy” of f in Lecture 1.)
(d) Show that if $f : \{-1,1\}^n \to \{\mathrm{T}, \mathrm{F}\}$ has $p = \Pr[f = \mathrm{T}] \leq 1/2$, then $2p \ln(1/p) \leq I(f)$.
This significantly improves on the Poincaré Inequality $4p(1-p) \leq I(f)$ for small $p$.
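The inequality $\mathrm{Ent}[f^2] \leq 2 I(f)$ is easy to test numerically on small cubes. The sketch below assumes that $I(f)$ is the total influence $\sum_i \mathbf{E}_x\big[\big((f(x) - f(x^{\oplus i}))/2\big)^2\big]$, which equals $\sum_S |S| \hat{f}(S)^2$; the problem set itself only refers to $I(f)$ as the "energy" from Lecture 1, so this definition is an assumption here. The helper names are illustrative, and a function is represented as a table mapping points of $\{-1,1\}^n$ to reals.

```python
import math
import random
from itertools import product


def entropy_functional(g_vals):
    """Ent[g] = E[g ln g] - E[g] ln E[g] for nonnegative g, with 0 ln 0 = 0."""
    mean = sum(g_vals) / len(g_vals)
    e_g_ln_g = sum(g * math.log(g) for g in g_vals if g > 0) / len(g_vals)
    return e_g_ln_g - (mean * math.log(mean) if mean > 0 else 0.0)


def total_influence(table, n):
    """Sum over i of E_x[((f(x) - f(x with bit i flipped)) / 2)^2]."""
    points = list(product([-1, 1], repeat=n))
    total = 0.0
    for i in range(n):
        for x in points:
            y = x[:i] + (-x[i],) + x[i + 1:]
            total += ((table[x] - table[y]) / 2) ** 2
    return total / len(points)


def log_sobolev_gap(table, n):
    """Return 2 I(f) - Ent[f^2]; the inequality says this is nonnegative."""
    ent = entropy_functional([v * v for v in table.values()])
    return 2 * total_influence(table, n) - ent


# Try a few random real-valued functions on {-1,1}^4.
rng = random.Random(0)
n = 4
for _ in range(5):
    table = {x: rng.gauss(0, 1) for x in product([-1, 1], repeat=n)}
    print(log_sobolev_gap(table, n) >= -1e-9)
```

Each line printed should be `True`; constant functions give a gap of exactly zero, matching the equality case at $\epsilon = 0$.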
3. $\epsilon$-biased sets. For every positive integer $k$, there is a field $\mathbb{F}_{2^k}$ with exactly $2^k$ elements. There is a natural way of encoding the names of the field elements as $k$-bit strings, $\mathrm{enc} : \mathbb{F}_{2^k} \to \mathbb{F}_2^k$, and this encoding has the property that $\mathrm{enc}(x+y) = \mathrm{enc}(x) + \mathrm{enc}(y)$ for all $x, y \in \mathbb{F}_{2^k}$ and also $\mathrm{enc}(0) = (0, \ldots, 0)$. Further, given $\mathrm{enc}(x)$ and $\mathrm{enc}(y)$, one can compute $\mathrm{enc}(xy)$, $\mathrm{enc}(x/y)$, $\mathrm{enc}(x+y)$, $\mathrm{enc}(x-y)$, in deterministic $\mathrm{poly}(k)$ time.^2
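(a) Let $R$ denote a random string in $\mathbb{F}_2^n$, formed as follows: Pick $a, b \in \mathbb{F}_{2^k}$ independently and uniformly at random; then let the $i$th bit of $R$ be $\langle \mathrm{enc}(a^i), \mathrm{enc}(b) \rangle_{\mathbb{F}_2}$, where $\langle \cdot, \cdot \rangle_{\mathbb{F}_2}$ denotes dot product in $\mathbb{F}_2$. Show that for every nonzero string $S \in \mathbb{F}_2^n$,
$$\left| \mathbf{E}\big[\langle R, S \rangle_{\mathbb{F}_2}\big] - \frac{1}{2} \right| \leq \frac{n}{2^k},$$
where in the expectation, we’re taking $\langle R, S \rangle_{\mathbb{F}_2}$ (which is in $\mathbb{F}_2$) and reinterpreting it as a real number. (Hint: every nonzero degree-$n$ polynomial over a field has at most $n$ zeroes.)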
(b) As needed for Problem 4 on Homework 3, give efficiently constructible $\epsilon$-biased sets for $\{-1,1\}^n$ of size $(n/\epsilon)^2$, whenever $n/\epsilon$ is a power of 2.

^1 $0 \ln 0 = 0$.
^2 Specifically, it is known that for every $k$ there is an irreducible polynomial $p(t) \in \mathbb{F}_2[t]$ of degree $k$; then we may take $\mathbb{F}_{2^k}$ to be the set of polynomials in $\mathbb{F}_2[t]$ modulo $p(t)$. The function $\mathrm{enc}$ maps $\sum_{i=0}^{k-1} a_i t^i$ to $(a_0, \ldots, a_{k-1})$. It is known (Shoup, 1990) that one can deterministically find an irreducible $p$ in time $\mathrm{poly}(k)$. (Also, it’s very easy to find one in time $2^{O(k)}$, which is pretty much good enough for us.)
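To make the construction in Problem 3 concrete, here is a small brute-force sketch. It represents elements of $\mathbb{F}_{2^k}$ as integers whose bits are the coordinates of $\mathrm{enc}$, following the polynomial representation in the footnote, builds the string $R$ for every pair $(a, b)$, and measures the worst value of $|\mathbf{E}[(-1)^{\langle R, S \rangle}]|$ over nonzero $S$. The choice $k = 4$ with modulus $t^4 + t + 1$, the parameter $n = 3$, and the function names are assumptions for illustration only.

```python
from itertools import product


def gf_mul(x, y, k, modulus):
    """Multiply x and y in F_{2^k}, reducing modulo the given degree-k polynomial."""
    result = 0
    while y:
        if y & 1:
            result ^= x          # add the current shifted copy of x
        y >>= 1
        x <<= 1                  # multiply x by t
        if (x >> k) & 1:
            x ^= modulus         # reduce when the degree reaches k
    return result


def sample_string(a, b, n, k, modulus):
    """Return R with i-th bit <enc(a^i), enc(b)> over F_2, for i = 1..n."""
    bits = []
    power = 1
    for _ in range(n):
        power = gf_mul(power, a, k, modulus)           # power is now a^i
        bits.append(bin(power & b).count("1") % 2)     # dot product mod 2
    return tuple(bits)


def max_bias(n, k, modulus):
    """Worst |E[(-1)^<R,S>]| over nonzero S, enumerating all (a, b)."""
    strings = [
        sample_string(a, b, n, k, modulus)
        for a in range(2 ** k)
        for b in range(2 ** k)
    ]
    worst = 0.0
    for S in product([0, 1], repeat=n):
        if not any(S):
            continue
        total = sum(
            (-1) ** (sum(r * s for r, s in zip(R, S)) % 2) for R in strings
        )
        worst = max(worst, abs(total) / len(strings))
    return worst


# k = 4 with the irreducible polynomial t^4 + t + 1 (binary 10011), n = 3.
print(max_bias(3, 4, 0b10011))
```

In this small case the measured bias should not exceed $n/2^k = 3/16$, in line with the polynomial-root hint in part (a).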
6. Learning monotone decision trees in “polynomial” time.
(a) Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a depth-$d$ decision tree. Show that $\sum_{i=1}^n \hat{f}(i) \leq O(\sqrt{d})$. (Hint: mimic the proof that Majority maximizes $\sum_i \hat{f}(i)$ for general $f$; but take the expectation over a random path first.) Conclude that if $f$ is monotone, $I(f) \leq O\big(\sqrt{\text{DT-depth}(f)}\big)$.
(b) Suppose one has access to random examples from a monotone function $f$. Give a learning algorithm which on input $\tau$, identifies (w.h.p.) a set $J$ which contains all coordinates $i$ with $\mathrm{Inf}_i(f) \geq \tau$. The algorithm should run in time $\mathrm{poly}(n, 1/\tau)$ and the set $J$ identified should have size $O(1/\tau^2)$.
(c) Show that $\mathcal{C} = \{\text{monotone } f : \text{DT-depth}(f) \leq \log n\}$ is learnable from random examples only in time $n^{O(1/\epsilon^2)}$. (Hint: use the Main Lemma that implied Friedgut’s Theorem.)
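One possible approach to part (b) relies on the standard fact that for a monotone $f : \{-1,1\}^n \to \{-1,1\}$, $\mathrm{Inf}_i(f) = \hat{f}(i) = \mathbf{E}[f(x) x_i]$, so each influence can be estimated as an empirical average over random examples. The sketch below is only an illustration under that assumption: it estimates every $\hat{f}(i)$ and keeps the coordinates whose estimate is at least $\tau/2$. The threshold $\tau/2$, the sample count, the simulated example oracle, and the names `find_influential` and `majority3` are all choices made for this example; the sample-complexity and size analysis is left to the problem.

```python
import random


def majority3(x):
    """Monotone test function: majority of the first three coordinates."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1


def find_influential(sample_f, n, tau, num_samples, seed=0):
    """Return the coordinates whose estimated degree-1 coefficient is at least tau / 2."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(num_samples):
        # Simulate one uniformly random labeled example (x, f(x)).
        x = tuple(rng.choice((-1, 1)) for _ in range(n))
        label = sample_f(x)
        for i in range(n):
            sums[i] += label * x[i]      # accumulates the estimate of E[f(x) x_i]
    return {i for i in range(n) if sums[i] / num_samples >= tau / 2}


print(find_influential(majority3, n=6, tau=0.3, num_samples=4000))
```

For this test function the three relevant coordinates each have influence $1/2$ and the others have influence $0$, so with a few thousand samples the returned set should consist of coordinates 0, 1, and 2.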