Problem Set 4: Analysis of Boolean Functions, Exercises of Computer Architecture and Organization

Problem Set 4 for the Analysis of Boolean Functions course at Carnegie Mellon University, Spring 2007. It covers topics such as orthogonal decomposition, the logarithmic Sobolev inequality, ϵ-biased sets, and the hypercontractivity theorem. The assignment has six problems, of which students must complete four.


Uploaded on 10/07/2011

rolla45
Analysis of Boolean Functions CMU 18-859S, Spring 2007
PROBLEM SET 4
Due: Tuesday, April 3, beginning of class
Homework policy: I encourage you to try to solve the problems by yourself. However, you may collaborate
as long as you do the writeup yourself and list the people you talked with.
Do 4 out of 6.


1. Orthogonal decomposition. Given any $f : \{-1,1\}^n \to \mathbb{R}$, consider $f^S : \{-1,1\}^n \to \mathbb{R}$ defined by $f^S = \hat{f}(S)\chi_S$. We have $f = \sum_{S \subseteq [n]} f^S$ as functions, and this “orthogonal decomposition” has the following three properties:

(i) $f^S(x)$ depends only on the coordinates of $x$ in $S$;

(ii) $\mathbf{E}_x[f^S(x) f^T(x)] = 0$ if $S \neq T$;

(iii) $\sum_{T \subseteq S} f^T$, denoted $f^{\leq S}$, gives the conditional expectation of $f$ conditioned on the coordinates in $S$.

(a) Prove property (iii); i.e., $f^{\leq S}(x) = \mathbf{E}[f_{x \to S}]$, where the expectation is over the bits in $\bar{S} = [n] \setminus S$. (Here the notation is that $x \in \{-1,1\}^n$, but in the expression $f_{x \to S}$, we only restrict the $S$-coordinates of $f$ using the $S$-bits of $x$; the $\bar{S}$-bits of $x$ are ignored.)
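Property (iii) can be sanity-checked numerically before attempting the proof. The sketch below is only an illustration, not part of the assignment: it draws a random $f$ on a small cube, computes $f^{\leq S}$ from the Fourier coefficients, and compares it with the average of $f$ over the bits outside $S$. The helper names (`chi`, `fourier_coefficients`, `f_leq`, `restricted_mean`, `max_gap`) are hypothetical, and $f$ is assumed to be stored as a dictionary from points of $\{-1,1\}^n$ to reals.

```python
import itertools
import math
import random

def chi(S, x):
    """Character chi_S(x): the product of x_i over i in S (empty product is 1)."""
    return math.prod(x[i] for i in S)

def fourier_coefficients(f, n):
    """Map each S (a frozenset) to f_hat(S) = E_x[f(x) chi_S(x)]."""
    cube = list(itertools.product([-1, 1], repeat=n))
    subsets = [frozenset(c) for r in range(n + 1)
               for c in itertools.combinations(range(n), r)]
    return {S: sum(f[x] * chi(S, x) for x in cube) / len(cube) for S in subsets}

def f_leq(coeffs, S, x):
    """f^{<=S}(x): sum of f_hat(T) chi_T(x) over all T contained in S."""
    return sum(c * chi(T, x) for T, c in coeffs.items() if T <= S)

def restricted_mean(f, n, S, x):
    """E[f_{x->S}]: keep the S-bits of x and average f over all other bits."""
    free = [i for i in range(n) if i not in S]
    total = 0.0
    for bits in itertools.product([-1, 1], repeat=len(free)):
        y = list(x)
        for i, b in zip(free, bits):
            y[i] = b
        total += f[tuple(y)]
    return total / 2 ** len(free)

def max_gap(n, seed=0):
    """Largest |f^{<=S}(x) - E[f_{x->S}]| over all S and x, for a random f."""
    rng = random.Random(seed)
    cube = list(itertools.product([-1, 1], repeat=n))
    f = {x: rng.uniform(-1, 1) for x in cube}
    coeffs = fourier_coefficients(f, n)
    return max(abs(f_leq(coeffs, S, x) - restricted_mean(f, n, S, x))
               for S in coeffs for x in cube)

print(max_gap(3))
```

The printed gap should be zero up to floating-point rounding. A check like this on one random function is evidence, not a proof.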

In the rest of this problem we establish the same kind of decomposition for general real-valued functions on product probability spaces. Specifically, let $X$ be any finite set and let $\pi$ be a probability distribution on $X$. We think of the $n$-fold product set $X^n$ as having the product probability distribution induced by $\pi$. All $\Pr[\cdot]$, $\mathbf{E}[\cdot]$ in what follows refer to this product distribution.

(b) We first make property (iii) hold by fiat: For $S \subseteq [n]$, we define $f^{\leq S} : X^n \to \mathbb{R}$ to be the function depending only on the coordinates in $S$ giving the conditional expectation; i.e., $f^{\leq S}(x) := \mathbf{E}[f_{x \to S}]$, where the expectation is over the product probability distribution on the coordinates outside $S$. Now given this definition, explicitly write how we should define the functions $f^S$ so that the equations $f^{\leq S} = \sum_{T \subseteq S} f^T$ hold. Check also that property (i) holds with your definitions. (Hint: inclusion-exclusion.)

(c) Show that $\mathbf{E}_x[f^{\leq S}(x) f^{\leq T}(x)] = \mathbf{E}_x[f^{\leq (S \cap T)}(x)^2]$, straight from our definition of $f^{\leq S}$.

(d) Now show property (ii), that $\mathbf{E}_x[f^S(x) f^T(x)] = 0$ when $S \neq T$. (Hint: write your definitions of $f^S$, $f^T$ from (b), and then use (c).)

Remark: This “orthogonal decomposition” of functions $f$ is often a good substitute for Fourier analysis when the domain is a product probability space other than $\{-1,1\}^n$.
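The general decomposition can also be explored numerically on a small product space. The sketch below assumes one candidate definition suggested by the inclusion-exclusion hint, $f^S = \sum_{T \subseteq S} (-1)^{|S| - |T|} f^{\leq T}$. That formula is an assumption made for illustration and is not given in the problem set. The three-letter alphabet, the distribution `pi`, and all function names are hypothetical. The code checks that the pieces sum back to $f$ and that distinct pieces have zero inner product under the product distribution.

```python
import itertools
import random

def subsets(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

def make_space(X, pi, n):
    """All points of X^n together with their product probabilities."""
    points = list(itertools.product(X, repeat=n))
    prob = {}
    for x in points:
        p = 1.0
        for xi in x:
            p *= pi[xi]
        prob[x] = p
    return points, prob

def cond_exp(f, prob, points, S, x):
    """f^{<=S}(x): average of f over points that agree with x on S."""
    num = den = 0.0
    for y in points:
        if all(y[i] == x[i] for i in S):
            num += prob[y] * f[y]
            den += prob[y]
    return num / den

def component(f, prob, points, S, x):
    """Candidate f^S(x), by inclusion-exclusion over T contained in S (assumed form)."""
    return sum((-1) ** (len(S) - len(T)) * cond_exp(f, prob, points, T, x)
               for T in subsets(S))

def inner(g, h, prob, points):
    """E_x[g(x) h(x)] under the product distribution."""
    return sum(prob[x] * g[x] * h[x] for x in points)

def check_decomposition(n=3, seed=0):
    """Return (max reconstruction error, max |E[f^S f^T]| over S != T)."""
    rng = random.Random(seed)
    X = ["a", "b", "c"]                   # illustrative alphabet
    pi = {"a": 0.5, "b": 0.3, "c": 0.2}   # illustrative non-uniform distribution
    points, prob = make_space(X, pi, n)
    f = {x: rng.uniform(-1, 1) for x in points}
    parts = {S: {x: component(f, prob, points, S, x) for x in points}
             for S in subsets(range(n))}
    recon_err = max(abs(f[x] - sum(parts[S][x] for S in parts)) for x in points)
    cross = max(abs(inner(parts[S], parts[T], prob, points))
                for S in parts for T in parts if S != T)
    return recon_err, cross

print(check_decomposition())
```

Both returned values should be zero up to rounding. Using a non-uniform `pi` matters here: it shows the orthogonality does not rely on the uniform measure of the Boolean cube.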

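2. Logarithmic Sobolev Inequality. Consider the Hypercontractive Theorem with $q = 2$, $p = 2 - 2\epsilon$, and $\rho = \sqrt{1 - 2\epsilon}$, where $\epsilon \in [0, 1/2]$; if we square it, we get

$$\|T_{\sqrt{1 - 2\epsilon}} f\|_2^2 \leq \|f\|_{2 - 2\epsilon}^2$$

for any $f : \{-1,1\}^n \to \mathbb{R}$.

(a) Show that we have equality at $\epsilon = 0$. Explain why we can now conclude that

$$\frac{\partial}{\partial \epsilon} \|T_{\sqrt{1 - 2\epsilon}} f\|_2^2 \,\Big|_{\epsilon = 0} \leq \frac{\partial}{\partial \epsilon} \|f\|_{2 - 2\epsilon}^2 \,\Big|_{\epsilon = 0}.$$

(b) Show that

$$\frac{\partial}{\partial \epsilon} \|T_{\sqrt{1 - 2\epsilon}} f\|_2^2 \,\Big|_{\epsilon = 0} = -2\, I(f).$$

(c) Show that

$$\frac{\partial}{\partial \epsilon} \|f\|_{2 - 2\epsilon}^2 \,\Big|_{\epsilon = 0} = -\mathrm{Ent}[f^2],$$

where $\mathrm{Ent}[g]$ is the functional defined for nonnegative $g$ by $\mathrm{Ent}[g] = \mathbf{E}[g \ln g] - \mathbf{E}[g] \ln \mathbf{E}[g]$.¹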

We conclude that for all $f : \{-1,1\}^n \to \mathbb{R}$, $\mathrm{Ent}[f^2] \leq 2\, I(f)$.

This is called the “Logarithmic Sobolev Inequality”, or the “Entropy-Energy Inequality”. (Recall we called $I(f)$ the “energy” of $f$ in Lecture 1.)
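The inequality $\mathrm{Ent}[f^2] \leq 2 I(f)$ is easy to test on small examples. The sketch below is an illustration under stated assumptions: it takes $I(f) = \sum_i \mathbf{E}_x\big[\big((f(x) - f(x^{\oplus i}))/2\big)^2\big]$, which agrees with $\sum_S |S| \hat{f}(S)^2$, uses the uniform distribution on $\{-1,1\}^n$, and adopts the convention $0 \ln 0 = 0$. The function names are hypothetical.

```python
import itertools
import math
import random

def xlogx(t):
    """t ln t, with the convention 0 ln 0 = 0."""
    return 0.0 if t == 0 else t * math.log(t)

def entropy_functional(values):
    """Ent[g] = E[g ln g] - E[g] ln E[g] for nonnegative g, uniform measure."""
    m = len(values)
    mean = sum(values) / m
    return sum(xlogx(v) for v in values) / m - xlogx(mean)

def total_influence(f, n):
    """I(f) as the sum over i of E_x[((f(x) - f(x with bit i flipped)) / 2)^2]."""
    cube = list(itertools.product([-1, 1], repeat=n))
    total = 0.0
    for i in range(n):
        for x in cube:
            y = x[:i] + (-x[i],) + x[i + 1:]
            total += ((f[x] - f[y]) / 2) ** 2
    return total / len(cube)

def log_sobolev_gap(f, n):
    """2 I(f) - Ent[f^2]; the inequality says this is nonnegative."""
    cube = list(itertools.product([-1, 1], repeat=n))
    return 2 * total_influence(f, n) - entropy_functional([f[x] ** 2 for x in cube])

def random_function(n, seed):
    """A random real-valued function on the cube, stored as a dictionary."""
    rng = random.Random(seed)
    return {x: rng.uniform(-2, 2) for x in itertools.product([-1, 1], repeat=n)}

for seed in range(5):
    print(seed, log_sobolev_gap(random_function(4, seed), 4))
```

For a dictator $f(x) = x_1$ the gap is exactly $2$, since $f^2 \equiv 1$ has zero entropy while $I(f) = 1$.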

(d) Show that if $f : \{-1,1\}^n \to \{\mathrm{T}, \mathrm{F}\}$ has $p = \Pr[f = \mathrm{T}] \leq 1/2$, then $2p \ln(1/p) \leq I(f)$.

This significantly improves on the Poincaré Inequality $4p(1 - p) \leq I(f)$ for small $p$.
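To see how much is gained for small $p$, the two lower bounds can be compared on the AND function, which is T only on the all-True input, so $p = 2^{-n}$. The sketch below assumes the usual combinatorial reading $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$ for two-valued $f$, which gives $I(\mathrm{AND}_n) = n \cdot 2^{1-n}$. The function names are hypothetical.

```python
import math

def log_sobolev_bound(p):
    """Lower bound on I(f) from part (d): 2 p ln(1/p)."""
    return 2 * p * math.log(1 / p)

def poincare_bound(p):
    """Lower bound on I(f) from the Poincare Inequality: 4 p (1 - p)."""
    return 4 * p * (1 - p)

def and_total_influence(n):
    """I(AND_n): coordinate i is pivotal exactly when the other n-1 bits are all True."""
    return n * 2.0 ** (1 - n)

for n in (2, 5, 10, 20):
    p = 2.0 ** -n
    print(n, and_total_influence(n), log_sobolev_bound(p), poincare_bound(p))
```

For AND the log-Sobolev bound equals $I(f) \cdot \ln 2$, so it is tight up to a constant factor, while the Poincaré bound is smaller by a factor that grows linearly in $n$.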

3. $\epsilon$-biased sets. For every positive integer $k$, there is a field $\mathbb{F}_{2^k}$ with exactly $2^k$ elements. There is a natural way of encoding the names of the field elements as $k$-bit strings, $\mathrm{enc} : \mathbb{F}_{2^k} \to \mathbb{F}_2^k$, and this encoding has the property that $\mathrm{enc}(x + y) = \mathrm{enc}(x) + \mathrm{enc}(y)$ for all $x, y \in \mathbb{F}_{2^k}$ and also $\mathrm{enc}(0) = (0, \ldots, 0)$. Further, given $\mathrm{enc}(x)$ and $\mathrm{enc}(y)$, one can compute $\mathrm{enc}(xy)$, $\mathrm{enc}(x/y)$, $\mathrm{enc}(x + y)$, $\mathrm{enc}(x - y)$ in deterministic $\mathrm{poly}(k)$ time.²

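(a) Let $R$ denote a random string in $\mathbb{F}_2^n$, formed as follows: Pick $a, b \in \mathbb{F}_{2^k}$ independently and uniformly at random; then let the $i$th bit of $R$ be $\langle \mathrm{enc}(a^i), \mathrm{enc}(b) \rangle_{\mathbb{F}_2}$, where $\langle \cdot, \cdot \rangle_{\mathbb{F}_2}$ denotes dot product in $\mathbb{F}_2$. Show that for every nonzero string $S \in \mathbb{F}_2^n$,

$$\frac{1}{2} - \frac{n}{2^k} \leq \mathbf{E}_R[\langle R, S \rangle_{\mathbb{F}_2}] \leq \ldots$$

where in the expectation, we’re taking $\langle R, S \rangle_{\mathbb{F}_2}$ (which is in $\mathbb{F}_2$) and reinterpreting it as a real number. (Hint: every nonzero degree-$n$ polynomial over a field has at most $n$ zeroes.)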
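The construction in (a) is small enough to enumerate exactly. The sketch below makes several illustrative assumptions that are not in the problem set: $k = 4$, the irreducible polynomial $t^4 + t + 1$, field elements stored as $k$-bit integers whose bit $j$ is the coefficient of $t^j$ (so integer XOR plays the role of addition), and bit positions $i = 1, \ldots, n$. It computes $\mathbf{E}_R[\langle R, S \rangle_{\mathbb{F}_2}]$ over all $2^{2k}$ pairs $(a, b)$ for every nonzero $S$ and reports the smallest and largest values, which can be compared with the stated lower bound $1/2 - n/2^k$.

```python
import itertools

K = 4
IRREDUCIBLE = 0b10011  # t^4 + t + 1, irreducible over F_2 (illustrative choice)

def gf_mul(x, y):
    """Multiply two elements of F_{2^K}, each stored as a K-bit integer."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x >> K:  # degree reached K: reduce modulo the irreducible polynomial
            x ^= IRREDUCIBLE
    return result

def dot_f2(u, v):
    """Dot product over F_2 of the bit encodings of u and v."""
    return bin(u & v).count("1") % 2

def sample_string(a, b, n):
    """The string R for a given pair (a, b): R_i = <enc(a^i), enc(b)>, i = 1..n."""
    bits, power = [], 1
    for _ in range(n):
        power = gf_mul(power, a)
        bits.append(dot_f2(power, b))
    return bits

def expectation_range(n):
    """Min and max of E_R[<R, S>] over all nonzero S in F_2^n."""
    field = range(2 ** K)
    strings = [sample_string(a, b, n) for a in field for b in field]
    lo, hi = 1.0, 0.0
    for S in itertools.product([0, 1], repeat=n):
        if not any(S):
            continue
        mean = sum(sum(r * s for r, s in zip(R, S)) % 2 for R in strings) / len(strings)
        lo, hi = min(lo, mean), max(hi, mean)
    return lo, hi

n = 5
lo, hi = expectation_range(n)
print(lo, hi, 0.5 - n / 2 ** K)
```

Because the multiset of strings has only $2^{2k}$ elements, the same enumeration also shows how small the resulting sample space is compared with all of $\mathbb{F}_2^n$ once $n$ is large.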

(b) As needed for Problem 4 on Homework 3, give efficiently constructible $\epsilon$-biased sets for $\{-1,1\}^n$ of size $(n/\epsilon)^2$, whenever $n/\epsilon$ is a power of 2.

¹ $0 \ln 0 = 0$.

² Specifically, it is known that for every $k$ there is an irreducible polynomial $p(t) \in \mathbb{F}_2[t]$ of degree $k$; then we may take $\mathbb{F}_{2^k}$ to be the set of polynomials in $\mathbb{F}_2[t]$ modulo $p(t)$. The function $\mathrm{enc}$ maps $\sum_{i=0}^{k-1} a_i t^i$ to $(a_0, \ldots, a_{k-1})$. It is known (Shoup, 1990) that one can deterministically find an irreducible $p$ in time $\mathrm{poly}(k)$. (Also, it’s very easy to find one in time $2^{O(k)}$, which is pretty much good enough for us.)

6. Learning monotone decision trees in “polynomial” time.

(a) Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a depth-$d$ decision tree. Show that $\sum_{i=1}^n \hat{f}(i) \leq O(\sqrt{d})$. (Hint: mimic the proof that Majority maximizes $\sum_{i=1}^n \hat{f}(i)$ for general $f$; but take the expectation over a random path first.) Conclude that if $f$ is monotone, $I(f) \leq \sqrt{\text{DT-depth}(f)}$.

(b) Suppose one has access to random examples from a monotone function $f$. Give a learning algorithm which on input $\tau$, identifies (w.h.p.) a set $J$ which contains all coordinates $i$ with $\mathrm{Inf}_i(f) \geq \tau$. The algorithm should run in time $\mathrm{poly}(n, 1/\tau)$ and the set $J$ identified should have size $O(1/\tau^2)$.
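One natural approach to (b) can be sketched in code. It relies on the standard fact that for monotone $f : \{-1,1\}^n \to \{-1,1\}$, $\mathrm{Inf}_i(f) = \hat{f}(i) = \mathbf{E}[f(x) x_i]$, so each influence can be estimated by an empirical average over random examples. The sketch is an assumption about how a solution might look, not the intended answer: the threshold $\tau/2$, the Hoeffding-based sample size, and all names are hypothetical choices.

```python
import math
import random

def sample_size(n, tau, delta):
    """Examples sufficient (by Hoeffding plus a union bound) for all n
    estimates to be within tau/4 of the truth with probability >= 1 - delta."""
    return math.ceil(32 * math.log(2 * n / delta) / tau ** 2)

def draw_examples(f, n, m, rng):
    """m uniform random examples (x, f(x)) with x in {-1,1}^n."""
    examples = []
    for _ in range(m):
        x = tuple(rng.choice((-1, 1)) for _ in range(n))
        examples.append((x, f(x)))
    return examples

def influential_coordinates(examples, n, tau):
    """Coordinates whose empirical estimate of E[f(x) x_i] is at least tau/2."""
    m = len(examples)
    J = []
    for i in range(n):
        estimate = sum(y * x[i] for x, y in examples) / m
        if estimate >= tau / 2:
            J.append(i)
    return J

def majority_of_first_three(x):
    """A monotone test function: majority of the first three coordinates."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1

rng = random.Random(0)
n, tau = 8, 0.4
examples = draw_examples(majority_of_first_three, n, sample_size(n, tau, 0.01), rng)
print(influential_coordinates(examples, n, tau))
```

When every estimate is within $\tau/4$ of the truth, each coordinate placed in $J$ has $\hat{f}(i) \geq \tau/4$, and since $\sum_i \hat{f}(i)^2 \leq 1$ there can be at most $16/\tau^2$ such coordinates.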

(c) Show that $C = \{\text{monotone } f : \text{DT-depth}(f) \leq \log n\}$ is learnable from random examples only in time $n^{O(1/\epsilon^2)}$. (Hint: use the Main Lemma that implied Friedgut’s Theorem.)