Learning Theory, Exercises - Computer Science, Exercises of Computer Architecture and Organization

Total influence of DNFs, Unbiased Functions, Weak Learning, Bent Functions, Low Degree Algorithm Hypothesis, Exercises, Problem set

Typology: Exercises

2010/2011

Uploaded on 10/07/2011

Analysis of Boolean Functions CMU 18-859S, Spring 2007
PROBLEM SET 3
Due: Thursday, March 8


Homework policy: I encourage you to try to solve the problems by yourself. However, you may collaborate as long as you do the writeup yourself and list the people you talked with.

Do 5 out of 7 problems.

1. Total influence of DNFs. Let $f$ be computable by a DNF of width $w$. Show that $I(f) \le 2w$. For extra credit, improve on the constant 2.
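
For experimenting with Problem 1, the following is a minimal brute-force sketch that computes $I(f)$ for a small DNF and compares it with $2w$. The term encoding, the helper names `dnf` and `total_influence`, and the particular width-2 DNF are assumptions made for illustration; the convention that $-1$ encodes True is also an assumption, and the influence does not depend on it.

```python
from itertools import product

def dnf(terms):
    """Return f: {-1,1}^n -> {-1,1} for a DNF given as a list of terms.

    Each term is a list of (index, sign) literals; a literal is satisfied
    when x[index] == sign. Output -1 encodes True, +1 encodes False.
    """
    def f(x):
        return -1 if any(all(x[i] == s for i, s in t) for t in terms) else 1
    return f

def total_influence(f, n):
    """I(f) = sum_i Pr_x[f(x) != f(x with coordinate i flipped)]."""
    total = 0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            total += f(x) != f(y)
    return total / 2 ** n

# Hypothetical width-2 DNF on 4 variables: (x0 AND x1) OR (NOT x2 AND x3).
terms = [[(0, -1), (1, -1)], [(2, 1), (3, -1)]]
f = dnf(terms)
w = max(len(t) for t in terms)
influence = total_influence(f, 4)
print(influence, "<=", 2 * w)
```

The enumeration takes time about $n 2^n$, so it is only a sanity check for small $n$, not a substitute for the proof.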

2. Unbiased functions can't be that correlation-immune. Suppose $f : \{-1,1\}^n \to \{-1,1\}$ is $d$th order correlation-immune (see Homework #1) but $E[f] \ne 0$. Show that $d < (2/3)n$. (The example from class, $(x_1 \oplus \cdots \oplus x_{(2/3)n}) \wedge (x_{n/3+1} \oplus \cdots \oplus x_n)$, shows that this is tight.) (Hint: $f^2 \equiv 1$.)
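
The tightness example in Problem 2 can be checked numerically for $n = 6$. The sketch below assumes the usual Fourier characterization of correlation immunity ($\hat f(S) = 0$ for all $1 \le |S| \le d$), which may be phrased differently in Homework #1, and assumes that $-1$ encodes True. The function names are illustrative.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(f, n, S):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1,1}^n."""
    return sum(f(x) * prod(x[i] for i in S)
               for x in product((-1, 1), repeat=n)) / 2 ** n

def example(x):
    """(x_1 xor ... xor x_{2n/3}) and (x_{n/3+1} xor ... xor x_n), -1 is True."""
    n = len(x)
    left = prod(x[: 2 * n // 3])   # parity of the first 2n/3 coordinates
    right = prod(x[n // 3:])       # parity of the last 2n/3 coordinates
    return -1 if left == -1 and right == -1 else 1

def immunity_order(f, n):
    """Largest d such that hat f(S) = 0 for all 1 <= |S| <= d."""
    for k in range(1, n + 1):
        if any(fourier_coefficient(f, n, S) != 0
               for S in combinations(range(n), k)):
            return k - 1
    return n

n = 6
mean = fourier_coefficient(example, n, ())
order = immunity_order(example, n)
print(mean, order)
```

For $n = 6$ the mean is nonzero and the order comes out as $(2/3)n - 1$, matching the claim that the bound $d < (2/3)n$ cannot be improved.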

3. Weak learning. A weak learner is a learning algorithm that does not work for every accuracy parameter $\epsilon$, only for some $\epsilon < \frac{1}{2}$. Specifically, we say $A$ $\gamma$-weak-learns a class if for target function $f$, its hypothesis $h$ satisfies $E[fh] \ge \gamma$ (with probability at least $1 - \delta$).

Show that if $f$ is computable by a size-$s$ DNF then there is some $U \subseteq [n]$ with $|U| \le \log_2(s) + O(1)$ such that $|\hat{f}(U)| \ge \Omega(1/s)$.

(Given this, one can of course $\Omega(1/s)$-weak-learn size-$s$ DNF in $\mathrm{poly}(s, n)$ time using membership queries. This is the beginning of Jackson's algorithm.)
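
To see why a single large Fourier coefficient gives a weak hypothesis, note that $h = \mathrm{sgn}(\hat f(U)) \cdot x_U$ satisfies $E[fh] = |\hat f(U)|$. The sketch below illustrates this by exhaustive search on a small example; it is not the membership-query algorithm referred to above. The size-2 DNF, the name `best_parity`, the size cap of 2, and the convention that $-1$ encodes True are all assumptions.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(f, n, U):
    """hat f(U) = E_x[f(x) * prod_{i in U} x_i] over uniform x in {-1,1}^n."""
    return sum(f(x) * prod(x[i] for i in U)
               for x in product((-1, 1), repeat=n)) / 2 ** n

def best_parity(f, n, max_size):
    """Return (U, hat f(U)) maximizing |hat f(U)| over all |U| <= max_size."""
    candidates = (U for k in range(max_size + 1)
                  for U in combinations(range(n), k))
    return max(((U, fourier_coefficient(f, n, U)) for U in candidates),
               key=lambda pair: abs(pair[1]))

def f(x):
    # Hypothetical size-2 DNF: (x0 AND x1) OR (NOT x2 AND x3), -1 is True.
    t1 = x[0] == -1 and x[1] == -1
    t2 = x[2] == 1 and x[3] == -1
    return -1 if (t1 or t2) else 1

n = 4
U, coeff = best_parity(f, n, max_size=2)
sign = 1 if coeff >= 0 else -1

def h(x):
    """Weak hypothesis: the signed parity on U."""
    return sign * prod(x[i] for i in U)

correlation = sum(f(x) * h(x) for x in product((-1, 1), repeat=n)) / 2 ** n
print(U, coeff, correlation)
```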

4. $\epsilon$-biased sets. Let $R \subset \{-1,1\}^n$. We say that $R$ is an $\epsilon$-biased set if

$$\left| E_{x \sim R}[x_S] \right| \le \epsilon$$

for every $\emptyset \ne S \subseteq [n]$; here $x \sim R$ means that $x$ is drawn uniformly at random from $R$. We say that $R$ is efficiently constructible if there is an algorithm which, on input $\epsilon$ and $n$, writes down all strings in $R$ in deterministic time $\mathrm{poly}(|R|, n)$. Later in the course we will show efficiently constructible $\epsilon$-biased sets of cardinality $(n/\epsilon)^2$.

(a) Assume the existence of such efficiently constructible $\epsilon$-biased sets. Given any $S \subseteq [n]$ and query access to some $f : \{-1,1\}^n \to \{-1,1\}$, show how to deterministically estimate $\hat{f}(S)$ to within $\pm\epsilon$ in time $\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$. You may assume the algorithm knows $\|\hat{f}\|_1$.

(b) In analyzing the spectral norm of DNF in class, we showed that if $(I, x)$ is a random restriction, then $E[\|\widehat{f_{x \to \bar{I}}}\|_1] \le \|\hat{f}\|_1$. Show the following much stronger result: For any restriction $f_{x \to \bar{I}}$ of $f$, $\|\widehat{f_{x \to \bar{I}}}\|_1 \le \|\hat{f}\|_1$. Conclude that for any $(I, x)$ and any $S \subseteq I$ we can deterministically estimate $F_{S \subseteq I}(x)$ to within $\pm\epsilon$ using queries to $f$ and time $\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$.

(With a little bit more work one can similarly estimate $E_x[F_{S \subseteq I}(x)]$ for any $S$ and $I$; this yields a deterministic version of the Goldreich-Levin algorithm running in time $\mathrm{poly}(\|\hat{f}\|_1, n, 1/\epsilon)$. In particular, one gets a polynomial-time deterministic algorithm that can exactly recover $O(\log n)$-depth decision trees given membership queries.)
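
The definition of an $\epsilon$-biased set in Problem 4 can be checked directly on tiny examples. The sketch below computes the smallest $\epsilon$ for which a given $R$ is $\epsilon$-biased by enumerating every nonempty $S$. The function name `bias` and the two test sets are assumptions for illustration, and the code does not construct the small sets promised for later in the course.

```python
from itertools import combinations, product
from math import prod

def bias(R, n):
    """max over nonempty S of |E_{x ~ R}[x_S]|, where x_S = prod_{i in S} x_i."""
    worst = 0.0
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            avg = sum(prod(x[i] for i in S) for x in R) / len(R)
            worst = max(worst, abs(avg))
    return worst

n = 3
cube = list(product((-1, 1), repeat=n))
# Strings whose full parity is +1: the character on S = {0,1,2} is constant here.
even = [x for x in cube if prod(x) == 1]

cube_bias = bias(cube, n)
even_bias = bias(even, n)
print(cube_bias, even_bias)
```

The full cube has bias 0, while the even-parity half of the cube has bias 1 because one parity never changes sign on it. This shows that the condition must hold for every nonempty $S$, not just for small ones.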

5. Bent functions. Compute the maximum possible value of $\|\hat{f}\|_1$ among functions $f : \{-1,1\}^n \to \{-1,1\}$. Exhibit a function achieving this maximum. (For the latter, you may assume $n$ is odd or even if you want; your choice.)
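
When testing candidate functions for Problem 5, it can help to compute $\|\hat f\|_1 = \sum_S |\hat f(S)|$ by brute force. The following is a minimal sketch; the helper names and the two sample functions (Majority on 3 bits and a parity) are assumptions chosen only to exercise the code, not candidates for the maximum.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficient(f, n, S):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1,1}^n."""
    return sum(f(x) * prod(x[i] for i in S)
               for x in product((-1, 1), repeat=n)) / 2 ** n

def spectral_norm(f, n):
    """||hat f||_1: sum of |hat f(S)| over all subsets S of [n]."""
    subsets = (S for k in range(n + 1) for S in combinations(range(n), k))
    return sum(abs(fourier_coefficient(f, n, S)) for S in subsets)

def maj3(x):
    return 1 if sum(x) > 0 else -1

def parity3(x):
    return prod(x)

norm_maj3 = spectral_norm(maj3, 3)
norm_parity3 = spectral_norm(parity3, 3)
print(norm_maj3, norm_parity3)
```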

6. The Low Degree Algorithm's hypothesis.

(a) When doing the Low Degree Algorithm with a fixed $d$ and $\epsilon$, for each $|S| \le d$ we used an independent batch of random examples to estimate $\hat{f}(S)$. Show that one can in fact first draw a single multiset $E$ of random examples $(x, f(x))$ of cardinality $\mathrm{poly}(n^d, 1/\epsilon) \cdot \log(1/\delta)$, and then with probability at least $1 - \delta$ have that $(\tilde{f}(S) - \hat{f}(S))^2 \le \epsilon/n^d$ for every $|S| \le d$, where $\tilde{f}(S) := \mathrm{avg}_{(x, f(x)) \in E}\{f(x) x_S\}$.

(b) Show that if we use this version of the Low Degree Algorithm, our final hypothesis $h : \{-1,1\}^n \to \{-1,1\}$ is of the form

$$h(y) = \mathrm{sgn}\left( \sum_{(x, f(x)) \in E} w(\Delta(y, x)) \cdot f(x) \right)$$

where $w : \{0, 1, \ldots, n\} \to \mathbb{R}$ is some function, and $\Delta$ denotes Hamming distance. (In other words, the hypothesis on a given $y$ is equal to a weighted vote over all examples seen, where an example's weight depends only on its Hamming distance to $y$.) Simplify your expression for $w$ as much as you can.
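
The single-sample variant described in Problem 6(a) can be sketched as follows: every estimate $\tilde f(S)$ with $|S| \le d$ is an average over the same multiset of examples, and the hypothesis is the sign of the resulting low-degree polynomial. The function name, the target (Majority on 5 bits), the sample size of 2000, and the fixed seed are assumptions for illustration; the sample size is not derived from the bound in the problem.

```python
import random
from itertools import combinations
from math import prod

def low_degree_hypothesis(examples, n, d):
    """Estimate hat f(S) for all |S| <= d from one shared sample.

    Returns the dictionary of estimates and the hypothesis
    h(y) = sgn(sum_S estimate[S] * y_S).
    """
    estimates = {}
    for k in range(d + 1):
        for S in combinations(range(n), k):
            estimates[S] = sum(label * prod(x[i] for i in S)
                               for x, label in examples) / len(examples)

    def h(y):
        value = sum(c * prod(y[i] for i in S) for S, c in estimates.items())
        return 1 if value >= 0 else -1

    return estimates, h

def target(x):
    # Hypothetical target function: Majority on 5 bits.
    return 1 if sum(x) > 0 else -1

rng = random.Random(0)   # fixed seed so the run is reproducible
n, d = 5, 1
sample = [tuple(rng.choice((-1, 1)) for _ in range(n)) for _ in range(2000)]
examples = [(x, target(x)) for x in sample]
estimates, h = low_degree_hypothesis(examples, n, d)
print(estimates[(0,)], h((1, 1, 1, -1, -1)))
```

Each degree-1 coefficient of Majority on 5 bits equals $6/16 = 0.375$, so the printed estimate should be close to that value.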

7. Learning via noise sensitivity. Recall the noise sensitivity of $f$ at $\epsilon$ from Homework #2, $NS_\epsilon(f)$. Let $C = \{f : \{-1,1\}^n \to \{-1,1\} : NS_\alpha(f) \le \gamma\}$. Show that the class $C$ can be learned under the uniform distribution from random examples, to accuracy $O(\gamma)$, in time $\mathrm{poly}(n^{1/\alpha}, 1/\gamma)$.

(E.g., the class of functions such that $NS_\epsilon(f) \le O(\sqrt{\epsilon})$ is learnable from random examples, to accuracy $\epsilon$, in time $n^{O(1/\epsilon^2)}$. You might try to convince yourself that $\mathrm{Majority}_n$ is in this class, assuming $n \gg 1/\epsilon$.)