Analysis of Boolean Functions (CMU 18-859S, Spring 2007)

Lecture 9: Learning Decision Trees and DNFs

Feb. 18, 2007

Lecturer: Ryan O’Donnell Scribe: Suresh Purini

1 Two Important Learning Algorithms

We recall the following definition and two important learning algorithms discussed in the previous lecture.

Definition 1.1 Given a collection $\mathcal{S}$ of subsets of $[n]$, we say $f : \{-1,1\}^n \to \mathbb{R}$ has $\epsilon$-concentration on $\mathcal{S}$ if

$$\sum_{S \notin \mathcal{S}} \hat{f}(S)^2 \le \epsilon.$$
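To make the definition concrete, here is a minimal brute-force sketch that computes every Fourier coefficient of a small function and measures the weight outside a chosen collection. The helper names (`fourier_coefficients`, `weight_outside`) and the majority example are our own illustrative choices, not part of the lecture, and the enumeration is only feasible for small $n$.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f on {-1,1}^n; S is a sorted tuple of 0-based indices."""
    points = list(product([-1, 1], repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }

def weight_outside(coeffs, collection):
    """Sum of f_hat(S)^2 over all S that are NOT in the collection."""
    return sum(c * c for S, c in coeffs.items() if S not in collection)

def maj3(x):
    """Majority of three +/-1 bits (illustrative example function)."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(maj3, 3)
low_degree = {S for S in coeffs if len(S) <= 1}   # the collection {S : |S| <= 1}
print(weight_outside(coeffs, low_degree))          # 0.25
```

So majority on three bits is $\epsilon$-concentrated on the sets of size at most 1 exactly when $\epsilon \ge 1/4$; all of the remaining weight sits on the degree-3 coefficient.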

Theorem 1.2 Let $\mathcal{C}$ be a class of $n$-bit functions such that every $f \in \mathcal{C}$ is $\epsilon$-concentrated on $\mathcal{S} = \{S \subseteq [n] : |S| \le d\}$. Then the function class $\mathcal{C}$ is learnable under the uniform distribution to an accuracy of $O(\epsilon)$, with probability at least $1 - \delta$, in time $\mathrm{poly}(|\mathcal{S}|, 1/\epsilon)\,\mathrm{poly}(n)\log(1/\delta)$ using random examples only.

This algorithm is called the Low Degree algorithm and was proposed by Linial, Mansour and Nisan [3]. See Theorem 5.4 in the notes for Lecture 8.
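The idea behind the Low Degree algorithm can be sketched in a few lines: estimate $\hat{f}(S)$ for every $|S| \le d$ as an empirical average over random examples, then predict with the sign of the resulting polynomial. The code below is a simplified illustration under our own assumptions (the function name, sample sizes, and target function are hypothetical), not the exact procedure or parameters from [3].

```python
import random
from itertools import combinations
from math import prod

def low_degree_learn(examples, n, d):
    """Estimate f_hat(S) for all |S| <= d from (x, f(x)) pairs and return a hypothesis."""
    estimates = {
        S: sum(y * prod(x[i] for i in S) for x, y in examples) / len(examples)
        for k in range(d + 1)
        for S in combinations(range(n), k)
    }

    def hypothesis(x):
        # Sign of the estimated low-degree polynomial.
        value = sum(c * prod(x[i] for i in S) for S, c in estimates.items())
        return 1 if value >= 0 else -1

    return hypothesis

rng = random.Random(0)
n = 5

def target(x):
    """Illustrative target: majority of the first three of five bits."""
    return 1 if x[0] + x[1] + x[2] > 0 else -1

def draw():
    """One uniformly random labelled example."""
    x = tuple(rng.choice([-1, 1]) for _ in range(n))
    return x, target(x)

h = low_degree_learn([draw() for _ in range(2000)], n, d=1)
holdout = [draw() for _ in range(1000)]
error = sum(h(x) != y for x, y in holdout) / len(holdout)
```

Here the degree-1 part of the target already determines its sign, so the learned hypothesis should agree with the target on almost every input even though the target itself has degree 3.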

Theorem 1.3 Let $\mathcal{C}$ be a class of $n$-bit functions such that every $f \in \mathcal{C}$ is $\epsilon$-concentrated on some collection $\mathcal{S}$. Then the function class $\mathcal{C}$ is learnable using membership queries (Goldreich-Levin algorithm) in $\mathrm{poly}(|\mathcal{S}|, 1/\epsilon)\,\mathrm{poly}(n)\log(1/\delta)$ time.

This algorithm is called the Kushilevitz-Mansour algorithm [2]. See Corollary 5.5 in the notes for Lecture 8.

2 Learning Decision Trees

A decision tree is a binary tree in which the internal nodes are labeled with variables and the leaves are labeled with either $-1$ or $+1$. The left and right edges leaving any internal node are labeled $-1$ and $+1$ respectively. We can think of the decision tree as defining a boolean function in the natural way. For example, the decision tree in Figure 1 defines a boolean function whose DNF formula is $x_1 x_2 x_3 + x_1 \bar{x}_2 x_4 + \bar{x}_1 x_2$.

Note that, given any boolean function we can come up with a corresponding decision tree.

Let $P$ be a path in the decision tree. An example of a path in Figure 1 is $P = (x_1 = -1,\ x_2 = +1,\ x_4 = -1)$.

Figure 1:

Let $1_P : \{-1,1\}^n \to \{0,1\}$ be an indicator function for path $P$. For example,

$$1_P(x) = \begin{cases} 1 & \text{if } x_1 = -1,\ x_2 = +1,\ x_4 = -1, \\ 0 & \text{else.} \end{cases}$$

Observation 2.1 A boolean function $f$ can be expressed in terms of the path functions $1_P$, corresponding to the various paths in the decision tree of $f$, as follows:

$$f(x) = \sum_{\text{paths } P} 1_P(x)\, f(P),$$

where $f(P)$ is the label on the leaf reached when $f$ takes the path $P$ in its decision tree.
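Observation 2.1 is easy to check mechanically. The sketch below uses a small hypothetical tree (it is not the tree of Figure 1, which is not reproduced here) encoded as nested tuples, enumerates its root-to-leaf paths, and compares direct evaluation with the sum over path indicators. The encoding and function names are our own assumptions.

```python
from itertools import product

# Hypothetical tree: (variable index, subtree for x_i = -1, subtree for x_i = +1).
# Leaves are the integers +1 / -1. Variables are 0-based.
TREE = (0, (1, -1, (3, -1, 1)), (1, 1, -1))

def evaluate(tree, x):
    """Walk from the root, going left on -1 and right on +1, until a leaf is reached."""
    while not isinstance(tree, int):
        var, left, right = tree
        tree = left if x[var] == -1 else right
    return tree

def paths(tree, prefix=()):
    """Yield (path, leaf_label); a path is a tuple of (variable, value) constraints."""
    if isinstance(tree, int):
        yield prefix, tree
        return
    var, left, right = tree
    yield from paths(left, prefix + ((var, -1),))
    yield from paths(right, prefix + ((var, +1),))

def indicator(path, x):
    """The path function 1_P(x): 1 if x satisfies every constraint on the path, else 0."""
    return 1 if all(x[var] == val for var, val in path) else 0

def path_sum(tree, x):
    """Right-hand side of Observation 2.1: sum over paths of 1_P(x) * f(P)."""
    return sum(indicator(p, x) * label for p, label in paths(tree))

agree = all(evaluate(TREE, x) == path_sum(TREE, x) for x in product([-1, 1], repeat=4))
```

Every input follows exactly one path, so exactly one indicator is nonzero in the sum, which is why the two evaluations agree.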

Observation 2.2 Let $V$ be the set of variables occurring in a path function $1_P$ and let $d$ be the cardinality of the set $V$. Then the Fourier expansion of $1_P$ looks like

$$\sum_{S \subseteq V} \pm 2^{-d} \chi_S.$$

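The proof of the above observation is easy to see by noting that the Fourier expansion of the path function $1_P$, when $P = (x_1 = -1,\ x_2 = +1,\ x_4 = -1)$, is

$$1_P = x_1 \bar{x}_2 x_4 = \left(\tfrac{1}{2} - \tfrac{1}{2} x_1\right)\left(\tfrac{1}{2} + \tfrac{1}{2} x_2\right)\left(\tfrac{1}{2} - \tfrac{1}{2} x_4\right),$$

where $x_1 \bar{x}_2 x_4$ denotes the corresponding DNF term (read with $-1$ as true). Each factor is the $\{0,1\}$-indicator of one constraint on the path.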
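Multiplying out the product form of $1_P$ confirms that every coefficient is $\pm 2^{-d}$. The following sketch does the expansion symbolically with exact fractions; the function name and the dictionary representation of a multilinear polynomial are our own assumptions, and it assumes the variables on the path are distinct.

```python
from fractions import Fraction

def path_indicator_expansion(path):
    """Expand prod_i (1/2 + (b_i/2) x_i) into {frozenset of variables: coefficient}.

    `path` is a list of (variable, b) pairs with b in {-1, +1} and distinct variables.
    """
    poly = {frozenset(): Fraction(1)}
    for var, b in path:
        nxt = {}
        for S, c in poly.items():
            nxt[S] = nxt.get(S, 0) + c / 2                      # contribution of the 1/2 term
            nxt[S | {var}] = nxt.get(S | {var}, 0) + c * b / 2  # contribution of (b/2) x_var
        poly = nxt
    return poly

# The example path P = (x1 = -1, x2 = +1, x4 = -1), using the notes' 1-based variable names.
expansion = path_indicator_expansion([(1, -1), (2, +1), (4, -1)])
```

For this path the expansion has $2^3 = 8$ terms, each with coefficient $+1/8$ or $-1/8$; the sign of the coefficient on $\chi_S$ is the product of the $b_i$ over the variables in $S$.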

Proposition 2.3 If $f : \{-1,1\}^n \to \{-1,1\}$ is computable by a depth-$d$ decision tree then

1. The Fourier expansion of $f$ has degree at most $d$, i.e., $\sum_{|S| > d} \hat{f}(S)^2 = 0$.

2. All Fourier coefficients are integer multiples of $2^{-d}$.

3. The number of nonzero Fourier coefficients is at most $4^d$.
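The three properties can be checked by brute force on a small example. The sketch below uses a hypothetical depth-3 tree on four variables (our own choice, encoded as nested tuples) and exact rational arithmetic; it illustrates the proposition on one instance and is not a proof.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Hypothetical tree: (variable index, subtree for x_i = -1, subtree for x_i = +1).
TREE = (0, (1, -1, (3, -1, 1)), (1, 1, -1))
N, DEPTH = 4, 3

def evaluate(tree, x):
    """Follow the edges labelled by x until a +1/-1 leaf is reached."""
    while not isinstance(tree, int):
        var, left, right = tree
        tree = left if x[var] == -1 else right
    return tree

points = list(product([-1, 1], repeat=N))
coeffs = {
    S: Fraction(sum(evaluate(TREE, x) * prod(x[i] for i in S) for x in points), len(points))
    for k in range(N + 1)
    for S in combinations(range(N), k)
}
nonzero = {S: c for S, c in coeffs.items() if c != 0}

max_degree = max(len(S) for S in nonzero)                                       # property 1
all_multiples = all((c * 2**DEPTH).denominator == 1 for c in nonzero.values())  # property 2
num_nonzero = len(nonzero)                                                      # property 3
```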

Proposition 2.11 If $f$ has a decision tree of size $s$, then $\|\hat{f}\|_1 \le s$.

Proof:

$$\|\hat{f}\|_1 \le \sum_{\text{paths } P} \|\hat{1}_P\|_1\, |f(P)| = \sum_{\text{paths } P} \|\hat{1}_P\|_1 \le s.$$
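As a sanity check of Proposition 2.11, the sketch below draws a few random small trees and compares the spectral norm $\|\hat{f}\|_1$ with the number of leaves. We take "size" to mean the number of leaves (equivalently, paths), which is what the proof sums over; the tree generator and all names are hypothetical.

```python
import random
from itertools import combinations, product
from math import prod

def random_tree(rng, variables, depth):
    """Random tree that queries each variable at most once per path; leaves are +1/-1."""
    if depth == 0 or not variables or rng.random() < 0.3:
        return rng.choice([-1, 1])
    var = rng.choice(variables)
    rest = [v for v in variables if v != var]
    return (var, random_tree(rng, rest, depth - 1), random_tree(rng, rest, depth - 1))

def evaluate(tree, x):
    """Go to the first subtree when x[var] == -1, otherwise to the second."""
    while not isinstance(tree, int):
        var, left, right = tree
        tree = left if x[var] == -1 else right
    return tree

def leaves(tree):
    """Number of leaves, i.e. number of root-to-leaf paths."""
    return 1 if isinstance(tree, int) else leaves(tree[1]) + leaves(tree[2])

def spectral_l1(f, n):
    """Sum of |f_hat(S)| over all S, by brute-force enumeration."""
    points = list(product([-1, 1], repeat=n))
    return sum(
        abs(sum(f(x) * prod(x[i] for i in S) for x in points)) / len(points)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    )

rng = random.Random(1)
n = 4
results = []
for _ in range(20):
    tree = random_tree(rng, list(range(n)), depth=3)
    results.append((spectral_l1(lambda x: evaluate(tree, x), n), leaves(tree)))
```

Each entry of `results` pairs the spectral norm of a random tree with its leaf count, and the first should never exceed the second.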

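Proposition 2.12 Given any function $f$ with $\|f\|_2^2 \le 1$ and $\epsilon > 0$, let $\mathcal{S} = \{S \subseteq [n] : |\hat{f}(S)| \ge \epsilon / \|\hat{f}\|_1\}$. Then $f$ is $\epsilon$-concentrated on $\mathcal{S}$. Note that $|\mathcal{S}| \le \left(\|\hat{f}\|_1 / \epsilon\right)^2$.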

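Proof:

$$\sum_{S \notin \mathcal{S}} \hat{f}(S)^2 \le \max_{S \notin \mathcal{S}} |\hat{f}(S)| \left[\sum_{S \notin \mathcal{S}} |\hat{f}(S)|\right] \le \max_{S \notin \mathcal{S}} |\hat{f}(S)| \left[\sum_{S \notin \mathcal{S}} |\hat{f}(S)| + \sum_{S \in \mathcal{S}} |\hat{f}(S)|\right] \le \frac{\epsilon}{\|\hat{f}\|_1}\, \|\hat{f}\|_1 = \epsilon.$$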
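A small numerical instance may help to see how the threshold $\epsilon / \|\hat{f}\|_1$ works. The sketch below applies it to majority on five bits (an example of our own choosing) and checks both the concentration bound and the bound on $|\mathcal{S}|$ by brute force.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f on {-1,1}^n, by brute-force enumeration."""
    points = list(product([-1, 1], repeat=n))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }

def maj5(x):
    """Majority of five +/-1 bits (illustrative example function)."""
    return 1 if sum(x) > 0 else -1

eps = 0.5
coeffs = fourier_coefficients(maj5, 5)
l1 = sum(abs(c) for c in coeffs.values())                           # spectral norm of f
heavy = {S for S, c in coeffs.items() if abs(c) >= eps / l1}        # the collection from Prop. 2.12
outside_weight = sum(c * c for S, c in coeffs.items() if S not in heavy)
```

For this function the degree-3 coefficients fall below the threshold and are left out of the collection, yet the weight they carry is still well under $\epsilon$.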

Corollary 2.13 Any class of functions $\mathcal{C} = \{f : \|f\|_2^2 \le 1 \text{ and } \|\hat{f}\|_1 \le s\}$ is learnable with random examples in time $\mathrm{poly}(s, 1/\epsilon)$.

Let us now consider functions which are computable by decision trees whose nodes branch on arbitrary parities of variables. Figure 2 contains an example of a function computable by a decision tree that branches on the parities of various subsets of variables. Another example is the parity function, which is computable by a depth-1 parity decision tree.

Proposition 2.14 If a function $f : \{-1,1\}^n \to \{-1,1\}$ is expressible as a size-$s$ decision tree on parities, then $\|\hat{f}\|_1 \le s$.

Figure 2:

Proof: Let $1_P$ be a $\{0,1\}$-indicator function for a path $P$ in the decision tree. Let the path be $P = (\chi_{S_1} = b_1, \cdots, \chi_{S_d} = b_d)$, i.e., we get the path $P$ by taking the edges labeled $b_1, \cdots, b_d$ starting from the root node. We have

$$1_P = \left(\tfrac{1}{2} + \tfrac{1}{2} b_1 \chi_{S_1}\right) \cdots \left(\tfrac{1}{2} + \tfrac{1}{2} b_d \chi_{S_d}\right).$$

It can be seen that $\|\hat{1}_P\|_1 = 1$. Since $f(x) = \sum_{\text{paths } P} 1_P(x)\, f(P)$, we have $\|\hat{f}\|_1 \le s$. $\Box$
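Unlike ordinary paths, the parities along a path may overlap, so terms can collide when the product is multiplied out. The sketch below expands the product using $\chi_S \chi_T = \chi_{S \triangle T}$ and computes $\|\hat{1}_P\|_1$ for one hypothetical, consistent path; the representation and names are our own assumptions.

```python
from fractions import Fraction

def parity_path_expansion(path):
    """Expand prod_i (1/2 + (b_i/2) chi_{S_i}) using chi_S * chi_T = chi_{S xor T}.

    `path` is a list of (set of variables, b) pairs with b in {-1, +1}.
    Returns {frozenset: coefficient} with zero coefficients dropped.
    """
    poly = {frozenset(): Fraction(1)}
    for S_i, b in path:
        S_i = frozenset(S_i)
        nxt = {}
        for S, c in poly.items():
            nxt[S] = nxt.get(S, 0) + c / 2          # the 1/2 term
            T = S ^ S_i                              # symmetric difference of index sets
            nxt[T] = nxt.get(T, 0) + c * b / 2       # the (b_i/2) chi_{S_i} term
        poly = nxt
    return {S: c for S, c in poly.items() if c != 0}

# Hypothetical path: x0*x1 = +1, x1*x2 = -1, x0*x2 = -1 (the third constraint is implied).
path = [({0, 1}, +1), ({1, 2}, -1), ({0, 2}, -1)]
expansion = parity_path_expansion(path)
l1_norm = sum(abs(c) for c in expansion.values())
```

In this example the eight product terms collapse to four coefficients of magnitude $1/4$, and the spectral norm is still exactly 1.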

Definition 2.15 An AND of parities is called a coset.

Remark 2.16 If a function $f : \{-1,1\}^n \to \{-1,1\}$ is expressible as $\sum_{i=1}^{s} \pm 1_{P_i}$, where the $P_i$'s are cosets, then $\|\hat{f}\|_1 \le s$.

Remark 2.17 Proposition 2.14 implies that we can learn all parity functions in $\mathrm{poly}(1/\epsilon)$ time. Observe that this result is not apparent from the bound for ordinary decision trees applied to parity functions.

Theorem 2.18 [1] If a function $f : \{-1,1\}^n \to \{-1,1\}$ has $\|\hat{f}\|_1 \le s$, then

$$f = \sum_{i=1}^{2^{O(s^4)}} \pm 1_{P_i},$$

where the $P_i$'s are cosets.

3 Learning DNFs

Proposition 3.1 If $f$ has a size-$s$ DNF formula, it is $\epsilon$-close to a width-$\log(s/\epsilon)$ DNF.

Proof: Let $(I, X)$ be a random restriction with $\rho = \frac{1}{10w}$. We know from Håstad's switching lemma that $f_{X \to \bar{I}}$ has decision tree depth greater than $d$ with probability less than $2^{-d}$. Hence the following sum is nonzero (and at most 1) with probability less than $2^{-d}$:

$$\sum_{S \subseteq I,\ |S| > d} \widehat{f_{X \to \bar{I}}}(S)^2.$$

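Therefore, we have

$$
\begin{aligned}
2^{-d} &\ge \mathop{\mathbf{E}}_{(X,I)}\left[\sum_{\substack{S \subseteq I \\ |S| > d}} \widehat{f_{X \to \bar{I}}}(S)^2\right] \\
&= \mathop{\mathbf{E}}_{I}\left[\mathop{\mathbf{E}}_{X \in \{-1,1\}^{|\bar{I}|}}\left[\sum_{\substack{S \subseteq I \\ |S| > d}} \widehat{f_{X \to \bar{I}}}(S)^2\right]\right] \\
&= \mathop{\mathbf{E}}_{I}\left[\sum_{\substack{S \subseteq I \\ |S| > d}} \mathop{\mathbf{E}}_{X \in \{-1,1\}^{|\bar{I}|}}\left[F_{S \subseteq I}(X)^2\right]\right] \qquad \text{(recall } F_{S \subseteq I}(x) = \widehat{f_x}(S)\text{)} \\
&= \mathop{\mathbf{E}}_{I}\left[\sum_{\substack{S \subseteq I \\ |S| > d}} \sum_{T \subseteq \bar{I}} \widehat{F_{S \subseteq I}}(T)^2\right] \\
&= \mathop{\mathbf{E}}_{I}\left[\sum_{\substack{S \subseteq I \\ |S| > d}} \sum_{T \subseteq \bar{I}} \hat{f}(S \cup T)^2\right] \\
&= \sum_{U} \hat{f}(U)^2 \Pr_{I}\left[|U \cap I| > d\right].
\end{aligned}
$$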

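Suppose $|U| \ge 20dw$. Then $|U \cap I|$ is binomially distributed with mean at least $20dw\rho = 2d$. Using the Chernoff bound, we get that $\Pr_I[|U \cap I| > d] \ge \frac{1}{2}$ when $d \ge 5$. Therefore we have

$$
\begin{aligned}
\sum_{U} \hat{f}(U)^2 \Pr_{I}\left[|U \cap I| > d\right] \le 2^{-d}
&\implies \frac{1}{2} \sum_{|U| \ge 20dw} \hat{f}(U)^2 \le 2^{-d} \\
&\implies \sum_{|U| \ge 20dw} \hat{f}(U)^2 \le 2^{-d+1}.
\end{aligned}
$$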
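The argument needs $\Pr_I[|U \cap I| > d]$ to be at least $1/2$ whenever $|U| \ge 20dw$. As a rough numerical check (not a replacement for the Chernoff bound), the sketch below computes the exact binomial tail for $|U| = 20dw$, the smallest size covered, with each element of $U$ landing in $I$ independently with probability $\rho = 1/(10w)$. The function name and the tested ranges of $d$ and $w$ are our own choices.

```python
from math import comb

def tail_probability(d, w):
    """Exact Pr[Bin(m, rho) > d] with m = 20*d*w trials and rho = 1/(10*w)."""
    m = 20 * d * w
    rho = 1 / (10 * w)
    at_most_d = sum(comb(m, k) * rho**k * (1 - rho) ** (m - k) for k in range(d + 1))
    return 1 - at_most_d

# Check a few small parameter settings with d >= 5.
tails = {(d, w): tail_probability(d, w) for d in range(5, 9) for w in range(1, 5)}
smallest = min(tails.values())
```

For larger $|U|$ the count $|U \cap I|$ only gets stochastically larger, so the smallest case is the one worth checking.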

Remark 3.9 By putting $dw = w \log(1/\epsilon)$, we get the theorem 3.

Further References Yishay Mansour's survey paper [4] also contains some of the ideas in these lecture notes.

References

[1] B. Green and T. Sanders. A quantitative version of the idempotent theorem in harmonic analysis. ArXiv Mathematics e-prints, Nov. 2006.

[2] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 455–464, New York, NY, USA, 1991. ACM Press.

[3] N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform, and learnability. J. ACM, 40(3):607–620, 1993.

[4] Y. Mansour. Learning boolean functions via the Fourier transform. In V. Roychowdhury, K.-Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning. Kluwer, 1994.
