

Total influence of DNFs, Unbiased Functions, Weak Learning, Bent Functions, Low Degree Algorithm Hypothesis, Exercises, Problem set


Due: Thursday, March 8
Homework policy: I encourage you to try to solve the problems by yourself. However, you may collaborate
as long as you do the writeup yourself and list the people you talked with.
Do 5 out of 7 problems.
1. Total influence of DNFs. Let $f$ be computable by a DNF of width $w$. Show that $I(f) \le 2w$. For extra
credit, improve on the constant 2.
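As a sanity check on the statement of Problem 1 (not a proof), the total influence of a small DNF can be computed by enumeration. The sketch below assumes the convention that $-1$ encodes True, and the helper names `eval_dnf` and `total_influence` and the sample DNF are illustrative only.

```python
from itertools import product

def eval_dnf(terms, x):
    """Evaluate a DNF on x in {-1,1}^n.

    terms: list of terms, each a list of (index, value) literals;
    a literal (i, v) is satisfied when x[i] == v.
    Assumed convention: returns -1 for True and +1 for False.
    """
    satisfied = any(all(x[i] == v for i, v in term) for term in terms)
    return -1 if satisfied else 1

def total_influence(f, n):
    """I(f) = sum over i of Pr_x[f(x) != f(x with coordinate i flipped)]."""
    count = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(y) != fx:
                count += 1
    return count / 2 ** n

# Hypothetical width-2 DNF on 4 variables: (x0 AND x1) OR (x2 AND x3).
terms = [[(0, -1), (1, -1)], [(2, -1), (3, -1)]]
width = max(len(t) for t in terms)
influence = total_influence(lambda x: eval_dnf(terms, x), 4)
print(influence, "<=", 2 * width)
```

The enumeration takes time about $n 2^n$, so it is only useful for very small $n$.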
2. Unbiased functions can’t be that correlation-immune. Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is $d$th order
correlation-immune (see Homework #1) but $\mathbf{E}[f] \ne 0$. Show that $d < (2/3)n$. (The example from class,
$(x_1 \oplus \cdots \oplus x_{(2/3)n}) \wedge (x_{n/3+1} \oplus \cdots \oplus x_n)$, shows that this is tight.) (Hint: $f^2 \equiv 1$.)
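The $n = 3$ instance of the example from class can be checked directly. The sketch below assumes that the Homework #1 definition matches the usual one (conditioning on any $d$ coordinates leaves $\mathbf{E}[f]$ unchanged) and that $-1$ encodes True, so an XOR of $\pm 1$ bits is true exactly when their product is $-1$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(f, points):
    """Exact average of f over a list of points."""
    return Fraction(sum(f(x) for x in points), len(points))

def is_correlation_immune(f, n, d):
    """True if fixing any d coordinates to any values leaves E[f] unchanged.

    Assumed definition; checking sets of size exactly d covers smaller sets.
    """
    cube = list(product((-1, 1), repeat=n))
    overall = mean(f, cube)
    for S in combinations(range(n), d):
        for values in product((-1, 1), repeat=d):
            sub = [x for x in cube if all(x[i] == v for i, v in zip(S, values))]
            if mean(f, sub) != overall:
                return False
    return True

def class_example(x):
    """(x1 XOR x2) AND (x2 XOR x3) for n = 3, with -1 encoding True."""
    first = x[0] * x[1] == -1
    second = x[1] * x[2] == -1
    return -1 if first and second else 1

cube3 = list(product((-1, 1), repeat=3))
print(mean(class_example, cube3))               # nonzero mean
print(is_correlation_immune(class_example, 3, 1))
print(is_correlation_immune(class_example, 3, 2))
```

For $n = 3$ the bound reads $d < 2$, so first-order immunity is the most that a function with nonzero mean can have.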
3. Weak learning. A weak learner is a learning algorithm that does not work for every accuracy parameter
$\epsilon$, only for some $\epsilon < \frac{1}{2}$. Specifically, we say $A$ $\gamma$-weak-learns a class if, for any target function $f$ in the class, its hypothesis
$h$ satisfies $\mathbf{E}[fh] \ge \gamma$ (with probability at least $1 - \delta$).
Show that if $f$ is computable by a size-$s$ DNF then there is some $U \subseteq [n]$ with $|U| \le \log_2(s) + O(1)$
such that $|\hat{f}(U)| \ge \Omega(1/s)$.
(Given this, one can of course $\Omega(1/s)$-weak-learn size-$s$ DNF in $\mathrm{poly}(s, n)$ time using membership
queries. This is the beginning of Jackson’s algorithm.)
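To see why a large coefficient gives a weak hypothesis, note that the character $\chi_U$ or its negation has correlation exactly $|\hat{f}(U)|$ with $f$. The sketch below finds the heaviest small set by brute force on a toy DNF. It is an illustration only, not Jackson's algorithm, and the names `heaviest_small_set` and `toy_dnf` are hypothetical. It again assumes $-1$ encodes True.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(f, n, U):
    """hat f(U) = E_x[f(x) * prod_{i in U} x_i], by enumeration."""
    total = sum(f(x) * prod(x[i] for i in U) for x in product((-1, 1), repeat=n))
    return total / 2 ** n

def heaviest_small_set(f, n, k):
    """Return a set U with |U| <= k maximizing |hat f(U)|."""
    candidates = [U for size in range(k + 1) for U in combinations(range(n), size)]
    return max(candidates, key=lambda U: abs(fourier_coefficient(f, n, U)))

def correlation(f, h, n):
    """E_x[f(x) h(x)] under the uniform distribution."""
    return sum(f(x) * h(x) for x in product((-1, 1), repeat=n)) / 2 ** n

def toy_dnf(x):
    """Hypothetical size-2 DNF: (x0 AND x1) OR x2."""
    return -1 if (x[0] == -1 and x[1] == -1) or x[2] == -1 else 1

U = heaviest_small_set(toy_dnf, 3, 1)
coeff = fourier_coefficient(toy_dnf, 3, U)
sign = 1 if coeff >= 0 else -1

def h(x):
    """Weak hypothesis: the character on U, negated if the coefficient is negative."""
    return sign * prod(x[i] for i in U)

print(U, coeff, correlation(toy_dnf, h, 3))
```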
4. $\epsilon$-biased sets. Let $R \subset \{-1,1\}^n$. We say that $R$ is an $\epsilon$-biased set if
$$\left| \mathbf{E}_{x \sim R}[x_S] \right| \le \epsilon$$
for every $\emptyset \ne S \subseteq [n]$; here $x \sim R$ means that $x$ is drawn uniformly at random from $R$. We say that $R$
is efficiently constructible if there is an algorithm which, on input $\epsilon$ and $n$, writes down all strings in $R$ in
deterministic time $\mathrm{poly}(|R|, n)$. Later in the course we will show efficiently constructible $\epsilon$-biased sets of
cardinality $(n/\epsilon)^2$.
(a) Assume the existence of such efficiently constructible $\epsilon$-biased sets. Given any $S \subseteq [n]$ and query
access to some $f : \{-1,1\}^n \to \{-1,1\}$, show how to deterministically estimate $\hat{f}(S)$ to within $\pm\epsilon$ in time
$\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$. You may assume the algorithm knows $\|\hat{f}\|_1$.
(b) In analyzing the spectral norm of DNF in class, we showed that if $(I, x)$ is a random restriction,
then $\mathbf{E}[\|\widehat{f_{x \to \bar{I}}}\|_1] \le \|\hat{f}\|_1$. Show the following much stronger result: For any restriction $f_{x \to \bar{I}}$
of $f$, $\|\widehat{f_{x \to \bar{I}}}\|_1 \le \|\hat{f}\|_1$. Conclude that for any $(I, x)$ and any $S \subseteq I$ we can deterministically estimate $F_{S \subseteq I}(x)$
to within $\pm\epsilon$ using queries to $f$ and time $\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$.
(With a little bit more work one can similarly estimate $\mathbf{E}_x[F_{S \subseteq I}(x)^2]$ for any $S$ and $I$; this yields a
deterministic version of the Goldreich-Levin algorithm running in time $\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$. In particular,
one gets a polynomial-time deterministic algorithm that can exactly recover $O(\log n)$-depth decision trees
given membership queries.)
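The definition of an $\epsilon$-biased set in Problem 4 can be tested by brute force on tiny examples. The sketch below computes the smallest $\epsilon$ for which a given $R$ is $\epsilon$-biased. It enumerates all $2^n - 1$ nonempty subsets, so it is a definition check only and not an efficient construction. The function name `bias` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def bias(R, n):
    """Largest |E_{x ~ R}[x_S]| over nonempty S, with x uniform on the list R."""
    worst = Fraction(0)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            avg = Fraction(sum(prod(x[i] for i in S) for x in R), len(R))
            worst = max(worst, abs(avg))
    return worst

# The whole cube is 0-biased.
print(bias(list(product((-1, 1), repeat=3)), 3))
# A pair of antipodal points is fooled by S = {0, 1}.
print(bias([(1, 1), (-1, -1)], 2))
# A three-point set in {-1,1}^2.
print(bias([(1, 1), (1, -1), (-1, 1)], 2))
```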
5. Bent functions. Compute the maximum possible value of $\|\hat{f}\|_1$ among functions $f : \{-1,1\}^n \to \{-1,1\}$.
Exhibit a function achieving this maximum. (For the latter, you may assume n is odd or even if you want;
your choice.)
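For experimenting with candidate functions in Problem 5, the spectral norm $\|\hat{f}\|_1$ of a small function can be computed from its truth table with a fast Walsh-Hadamard transform. In this sketch the indexing convention is an assumption: bit $i$ of the table index is set when $x_i = -1$, and bit $i$ of the output index is set when $i \in S$. Majority on 3 bits is used only as a test input, not as a claimed maximizer.

```python
def walsh_hadamard(values):
    """Return all Fourier coefficients of f from its truth table.

    values[j] = f(x), where bit i of j is set iff x_i = -1 (assumed convention).
    Output index S (as a bitmask) holds hat f(S).
    """
    a = list(values)
    size = len(a)
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for j in range(start, start + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return [c / size for c in a]

def spectral_norm(values):
    """Sum of |hat f(S)| over all S."""
    return sum(abs(c) for c in walsh_hadamard(values))

# Truth table of Majority on 3 bits: value -1 when at least two inputs are -1.
maj3 = [1 if bin(j).count("1") <= 1 else -1 for j in range(8)]
print(walsh_hadamard(maj3))
print(spectral_norm(maj3))
```

The transform runs in time $O(n 2^n)$, compared with $O(4^n)$ for computing each coefficient separately.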
6. The Low Degree Algorithm’s hypothesis.
(a) When doing the Low Degree Algorithm with a fixed $d$ and $\epsilon$, for each $|S| \le d$ we used an independent
batch of random examples to estimate $\hat{f}(S)$. Show that one can in fact first draw a single multiset $E$ of
random examples $(x, f(x))$ of cardinality $\mathrm{poly}(n^d, 1/\epsilon) \cdot \log(1/\delta)$, and then with probability at least $1 - \delta$
have that $(\tilde{f}(S) - \hat{f}(S))^2 \le \epsilon/n^d$ for every $|S| \le d$, where
$$\tilde{f}(S) := \operatorname{avg}_{(x, f(x)) \in E} \{ f(x)\, x_S \}.$$
(b) Show that if we use this version of the Low Degree Algorithm, our final hypothesis $h : \{-1,1\}^n \to \{-1,1\}$ is of the form
$$h(y) = \operatorname{sgn}\Bigl( \sum_{(x, f(x)) \in E} w(\Delta(y, x)) \cdot f(x) \Bigr),$$
where $w : \{0, 1, \ldots, n\} \to \mathbb{R}$ is some function, and $\Delta$ denotes Hamming distance. (In other words, the
hypothesis on a given $y$ is equal to a weighted vote over all examples seen, where an example’s weight
depends only on its Hamming distance to $y$.) Simplify your expression for $w$ as much as you can.
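The single-sample version of the Low Degree Algorithm described in Problem 6(a) can be sketched as follows: draw one multiset of examples, reuse it for every $|S| \le d$, and output the sign of the resulting low-degree polynomial. The sample size, the seed, the tie-breaking rule $\operatorname{sgn}(0) = 1$, and all function names are assumptions made for illustration. The sketch does not attempt the simplification asked for in part (b).

```python
import random
from itertools import combinations, product
from math import prod

def draw_examples(f, n, m, rng):
    """One multiset E of m uniform random examples (x, f(x))."""
    points = (tuple(rng.choice((-1, 1)) for _ in range(n)) for _ in range(m))
    return [(x, f(x)) for x in points]

def estimate_low_degree(examples, n, d):
    """Estimate hat f(S) for every |S| <= d from the same examples."""
    estimates = {}
    for size in range(d + 1):
        for S in combinations(range(n), size):
            total = sum(fx * prod(x[i] for i in S) for x, fx in examples)
            estimates[S] = total / len(examples)
    return estimates

def hypothesis(estimates):
    """h(y) = sgn(sum over S of estimate(S) * y_S), with sgn(0) taken as +1."""
    def h(y):
        value = sum(c * prod(y[i] for i in S) for S, c in estimates.items())
        return 1 if value >= 0 else -1
    return h

rng = random.Random(0)           # fixed seed, for reproducibility only
target = lambda x: x[0]          # toy target: a dictator function
examples = draw_examples(target, 4, 2000, rng)
estimates = estimate_low_degree(examples, 4, 1)
h = hypothesis(estimates)
agreement = sum(h(y) == target(y) for y in product((-1, 1), repeat=4)) / 16
print(estimates[(0,)], agreement)
```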
7. Learning via noise sensitivity. Recall the noise sensitivity of $f$ at $\epsilon$ from Homework #2, $\mathrm{NS}_\epsilon(f)$. Let
$C = \{ f : \{-1,1\}^n \to \{-1,1\} : \mathrm{NS}_\alpha(f) \le \gamma \}$. Show that the class $C$ can be learned under the uniform
distribution from random examples, to accuracy $O(\gamma)$, in time $\mathrm{poly}(n^{1/\alpha}, 1/\gamma)$.
(E.g., the class of functions such that $\mathrm{NS}_\epsilon(f) \le O(\sqrt{\epsilon})$ is learnable from random examples, to accuracy
$\epsilon$, in time $n^{O(1/\epsilon^2)}$. You might try to convince yourself that $\mathrm{Majority}_n$
is in this class, assuming $n \gg 1/\epsilon$.)
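For small $n$, noise sensitivity can be computed exactly, which may help when exploring the Majority example in Problem 7. The sketch below assumes that the Homework #2 definition is the usual one: $\mathrm{NS}_\epsilon(f) = \Pr[f(x) \ne f(y)]$, where $x$ is uniform and $y$ is obtained from $x$ by flipping each coordinate independently with probability $\epsilon$. The function names are hypothetical.

```python
from itertools import product

def noise_sensitivity(f, n, eps):
    """Exact NS_eps(f) by enumerating every x and every flip pattern."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            weight = eps ** k * (1 - eps) ** (n - k)   # probability of this pattern
            y = tuple(-xi if b else xi for xi, b in zip(x, flips))
            if f(y) != fx:
                total += weight
    return total / 2 ** n

def majority(x):
    """Majority of an odd number of +-1 bits."""
    return 1 if sum(x) > 0 else -1

print(noise_sensitivity(lambda x: x[0], 3, 0.1))          # dictator
print(noise_sensitivity(lambda x: x[0] * x[1], 2, 0.1))   # parity on 2 bits
print(noise_sensitivity(majority, 5, 0.1))
```

The running time is about $4^n$, so this is practical only up to roughly $n = 10$.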