Perceptron Learning Rule and Winnow Algorithm in On-line Machine Learning - Prof. Dan Roth, Study notes of Computer Science

University of Illinois - Urbana-Champaign Computer Science

Prof. Dan Roth

An overview of the perceptron learning rule and the Winnow algorithm in on-line machine learning. The perceptron learning rule is a simple algorithm for binary classification. Winnow is an on-line algorithm that learns linear threshold functions using multiplicative weight updates. Both algorithms represent the model as a weight vector and update the weights only when they make a mistake on a new example.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-xf4-2 🇺🇸


On-line Learning CS446-Spring08

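Linear Functions

$$f(x) = \begin{cases} 1 & \text{if } w_1 x_1 + w_2 x_2 + \dots + w_n x_n \ge \theta \\ 0 & \text{otherwise} \end{cases}$$

• Disjunctions: y = x1 ∨ x3 ∨ x5
  y = (1•x1 + 1•x3 + 1•x5 ≥ 1)
• At least m of n: y = at least 2 of {x1, x3, x5}
  y = (1•x1 + 1•x3 + 1•x5 ≥ 2)
• Exclusive-OR: y = (x1 ∧ ¬x2) ∨ (¬x1 ∧ x2)
• Non-trivial DNF: y = (x1 ∧ x2) ∨ (x3 ∧ x4)

The first two are linear threshold functions; the last two cannot be written in this form.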
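To make the first two bullets concrete, here is a small Python check (our own sketch; the helper `ltf` and the weight vector `W` are hypothetical names). It evaluates each threshold form on every Boolean input over x1..x5 and compares it with the Boolean definition.

```python
from itertools import product

def ltf(weights, theta, x):
    """Linear threshold unit: returns 1 if w . x >= theta, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

# Variables x1..x5 live at indices 0..4; only x1, x3, x5 get weight 1.
W = [1, 0, 1, 0, 1]

for x in product([0, 1], repeat=5):
    x1, x3, x5 = x[0], x[2], x[4]
    # y = x1 v x3 v x5 is the threshold unit with theta = 1.
    assert ltf(W, 1, x) == int(x1 or x3 or x5)
    # "At least 2 of {x1, x3, x5}" keeps the weights and raises theta to 2.
    assert ltf(W, 2, x) == int(x1 + x3 + x5 >= 2)
```

Only the threshold changes between the two functions. The weights stay the same.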


[Figure: the separating hyperplanes w · x = 0 and w · x = θ]

Perceptron learning rule

We learn $f: X \to \{-1,+1\}$, represented as $f = \mathrm{sgn}(w \cdot x)$, where $X = \{0,1\}^n$ or $X = \mathbb{R}^n$, and $w \in \mathbb{R}^n$.

Given labeled examples: $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$

1. Initialize $w = 0 \in \mathbb{R}^n$.
2. Cycle through all examples:
   a. Predict the label of instance $x$ to be $y' = \mathrm{sgn}(w \cdot x)$.
   b. If $y' \ne y$, update the weight vector: $w = w + r\,y\,x$ ($r$ is a constant, the learning rate).
   Otherwise, if $y' = y$, leave the weights unchanged.
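Here is a minimal pure-Python sketch of the rule above. It rests on a few assumptions. Labels are in {−1, +1} and w starts at 0. sgn(0) is taken as +1, because the slide does not say how ties are broken. The loop stops after the first pass with no mistakes, or after a hypothetical `max_epochs` cap. The names `dot`, `sgn`, and `perceptron`, and the toy data, are illustrative only.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sgn(z):
    # Labels are in {-1, +1}; this sketch sends a score of exactly 0 to +1.
    return 1 if z >= 0 else -1

def perceptron(examples, n, r=1.0, max_epochs=100):
    """examples: list of (x, y) with x a length-n sequence and y in {-1, +1}.
    Returns the learned weights and the number of mistakes (updates)."""
    w = [0.0] * n                            # initialize w = 0 in R^n
    mistakes = 0
    for _ in range(max_epochs):              # cycle through all examples
        clean_pass = True
        for x, y in examples:
            if sgn(dot(w, x)) != y:          # predicted y' differs from y
                w = [wi + r * y * xi for wi, xi in zip(w, x)]   # w = w + r y x
                mistakes += 1
                clean_pass = False
        if clean_pass:                       # a full pass without mistakes
            break
    return w, mistakes

# Hypothetical toy data that is separable by a hyperplane through the origin.
data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w, mistakes = perceptron(data, 2)
print(w, mistakes)
```

On this data the only update happens on the first negative example, because its score is exactly 0 under the initial w.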


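Footnote About the Threshold

On the previous slide, the Perceptron has no threshold. But we don't lose generality:

$$w \cdot x \ge \theta \iff w \cdot x - \theta \ge 0 \iff (w, -\theta) \cdot (x, 1) \ge 0$$

That is, append a constant feature 1 to every x and the weight −θ to w.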
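The equivalence can be checked directly with a short sketch. The threshold unit `w`, `theta` below is an arbitrary example of ours, and `augment` is a hypothetical helper. The check compares the two sides of the equivalence on every Boolean input.

```python
from itertools import product

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def augment(w, theta, x):
    """Fold the threshold into the weights: w' = (w, -theta), x' = (x, 1)."""
    return list(w) + [-theta], list(x) + [1]

w, theta = [2.0, -1.0, 0.5], 1.0          # arbitrary example threshold unit
for x in product([0, 1], repeat=3):
    w_aug, x_aug = augment(w, theta, x)
    # w . x >= theta  <=>  (w, -theta) . (x, 1) >= 0
    assert (dot(w, x) >= theta) == (dot(w_aug, x_aug) >= 0)
```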


Perceptron learning rule


If x is Boolean, only weights of active features are updated.

Note: $w \cdot x \ge 0$ is equivalent to $\frac{1}{1 + \exp(-w \cdot x)} \ge \frac{1}{2}$.
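Both remarks can be checked numerically with the following sketch. The scores, weights, and example below are arbitrary values we picked. The logistic function is increasing and equals 1/2 at 0, so thresholding it at 1/2 is the same as thresholding w · x at 0. With a Boolean x, the update w + r y x leaves every coordinate with x_i = 0 untouched.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# w . x >= 0 exactly when sigmoid(w . x) >= 1/2.
for score in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert (score >= 0) == (sigmoid(score) >= 0.5)

# Boolean x: the update w <- w + r*y*x only changes weights where x_i = 1.
w = [0.5, -1.0, 2.0, 0.0]
x = [1, 0, 1, 0]
r, y = 1.0, -1
w_new = [wi + r * y * xi for wi, xi in zip(w, x)]
changed = [i for i, (a, b) in enumerate(zip(w, w_new)) if a != b]
print(changed)
```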


Perceptron Learnability

• Obviously, it can't learn what it can't represent
  – Only linearly separable functions
• Minsky and Papert (1969) wrote an influential book demonstrating the Perceptron's representational limitations
  – Parity functions can't be learned (XOR)
  – In vision, if patterns are represented with local features, it can't represent symmetry or connectivity
• Research on neural networks stopped for years
• Rosenblatt himself (1959) asked, "What pattern recognition problems can be transformed so as to become linearly separable?"
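This short derivation shows why XOR is out of reach. It uses the threshold form from the Linear Functions slide, $f(x) = 1$ iff $w_1 x_1 + w_2 x_2 \ge \theta$. The four XOR examples impose these constraints:

```latex
\begin{aligned}
x = (0,0),\ f = 0 &:\quad 0 < \theta \\
x = (1,0),\ f = 1 &:\quad w_1 \ge \theta \\
x = (0,1),\ f = 1 &:\quad w_2 \ge \theta \\
x = (1,1),\ f = 0 &:\quad w_1 + w_2 < \theta
\end{aligned}
```

Adding the middle two lines gives $w_1 + w_2 \ge 2\theta > \theta$ (since $\theta > 0$), which contradicts the last line. So no choice of $w_1, w_2, \theta$ represents XOR.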


Perceptron Convergence

• Perceptron Convergence Theorem: If there exists a set of weights that is consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge.
  -- How long would it take to converge?
• Perceptron Cycling Theorem: If the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and therefore enter an infinite loop.
  -- How can we provide robustness and more expressivity?
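The following sketch illustrates both theorems on two-input Boolean data. It rests on a few assumptions: r = 1, w starts at 0, sgn(0) = +1, and a constant feature 1 is appended to fold in the threshold. The function name `perceptron_status` is hypothetical. The code detects a cycle by comparing the weights at the end of each pass with the weights at the start of earlier passes. That signal is sufficient, but it is not how the theorem is proved.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_status(examples, max_epochs=1000):
    """Run the rule w <- w + y*x (r = 1, w starts at 0) pass by pass.

    Returns ("converged", pass_no) once a full pass makes no mistakes, or
    ("cycled", pass_no) if the weights at the end of a pass equal the weights
    at the start of this or an earlier pass (the run then repeats forever).
    """
    w = [0] * len(examples[0][0])
    starts = set()
    for pass_no in range(1, max_epochs + 1):
        starts.add(tuple(w))
        mistakes = 0
        for x, y in examples:
            y_pred = 1 if dot(w, x) >= 0 else -1     # sgn, with sgn(0) = +1
            if y_pred != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return "converged", pass_no
        if tuple(w) in starts:
            return "cycled", pass_no
    return "undecided", max_epochs

# Inputs (x1, x2, 1): the constant 1 folds the threshold into the weights.
AND = [((0, 0, 1), -1), ((0, 1, 1), -1), ((1, 0, 1), -1), ((1, 1, 1), 1)]
XOR = [((0, 0, 1), -1), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), -1)]
print(perceptron_status(AND))
print(perceptron_status(XOR))
```

AND is linearly separable, so the run stops. XOR is not, so the weights eventually revisit an earlier state.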


Perceptron: Mistake Bound Theorem

• Maintains a weight vector $w \in \mathbb{R}^N$, initialized to $w = 0$
• Upon receiving an example $x \in \mathbb{R}^N$
• Predicts according to the linear threshold function $w \cdot x \ge 0$

Theorem [Novikoff, 1963]: Let $(x_1; y_1), \dots, (x_t; y_t)$ be a sequence of labeled examples with $x_i \in \mathbb{R}^N$, $\|x_i\| \le R$ and $y_i \in \{-1, 1\}$ for all $i$. Let $u \in \mathbb{R}^N$, $\gamma > 0$ be such that $\|u\| = 1$ and $y_i \, u \cdot x_i \ge \gamma$ for all $i$. Then Perceptron makes at most $\|u\|^2 R^2 / \gamma^2$ mistakes on this example sequence.

(see additional notes)

Here $\gamma$ is the margin and $\|u\|^2 R^2 / \gamma^2$ is the complexity parameter.
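The theorem can be sanity-checked on a toy sequence with the following sketch. We update whenever $y\,(w \cdot x) \le 0$, the mistake condition under which the bound is usually proved. The data and the unit vector `u` are hypothetical, hand-picked values. The code computes R and γ from them and compares the observed number of mistakes with $R^2/\gamma^2$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def perceptron_mistakes(examples, max_epochs=1000):
    """Cycle through the examples with w = 0 initially and r = 1, updating
    w <- w + y*x whenever y * (w . x) <= 0; return the number of updates."""
    w = [0.0] * len(examples[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in examples:
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            break
    return mistakes

# Hypothetical toy data, separable through the origin.
examples = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((0.5, 2.0), 1),
            ((-1.0, -2.0), -1), ((-2.0, -1.0), -1), ((-0.5, -1.5), -1)]
u = (1 / math.sqrt(2), 1 / math.sqrt(2))          # a unit separator picked by hand
R = max(math.hypot(*x) for x, _ in examples)      # radius: max ||x_i||
gamma = min(y * dot(u, x) for x, y in examples)   # margin of u on this data
assert gamma > 0, "u must separate the data"
bound = R ** 2 / gamma ** 2                       # ||u|| = 1, so ||u||^2 R^2 / gamma^2
mistakes = perceptron_mistakes(examples)
print(mistakes, bound)
```

Any other unit vector with a positive margin also gives a valid, possibly looser, bound.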


Perceptron for Boolean Functions

• How many mistakes will the Perceptron algorithm make when learning a k-disjunction?
• Try to figure out the bound
• Find a sequence of examples that will cause Perceptron to make O(n) mistakes on a k-disjunction over n attributes.
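This sketch does not answer the exercise. It is a hypothetical harness for experimenting with it: it counts Perceptron mistakes on a 2-disjunction (x1 ∨ x2) over n attributes. It presents all 2^n examples in lexicographic order, which is our own choice of sequence. A constant feature 1 folds the threshold into the weights, and updates happen when y (w · x) ≤ 0. Other orderings, targets, or values of n can be substituted to look for sequences that force many mistakes.

```python
from itertools import product

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def disjunction_examples(n, relevant):
    """All 2^n Boolean examples in lexicographic order, labeled +1 if some
    relevant variable is on and -1 otherwise. A constant feature 1 is
    appended so that the threshold is learned as an ordinary weight."""
    data = []
    for bits in product([0, 1], repeat=n):
        y = 1 if any(bits[i] for i in relevant) else -1
        data.append((list(bits) + [1], y))
    return data

def perceptron_mistakes(data, max_epochs=1000):
    """Perceptron with r = 1 and w = 0 initially, updating when y * (w . x) <= 0.
    Returns (w, number_of_mistakes) after the first pass with no mistakes."""
    w = [0] * len(data[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in data:
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            return w, mistakes
    raise RuntimeError("no mistake-free pass within max_epochs")

for n in (4, 6, 8):
    w, m = perceptron_mistakes(disjunction_examples(n, relevant=[0, 1]))
    print(f"n={n}: {m} mistakes")
```

With the threshold folded in, the target has margin 1/3 under u ∝ (1, 1, 0, …, 0, −1/2), and R² = n + 1. The Mistake Bound Theorem therefore caps every run of this harness at 9(n + 1) mistakes.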


Winnow Algorithm

• The Winnow Algorithm learns Linear Threshold Functions.
• For the class of disjunctions, instead of demotion we can use elimination: $w_i \leftarrow 0$ (if $x_i = 1$).

Initialize: $\theta = n$; $w_i = 1$
Prediction is 1 iff $w \cdot x \ge \theta$
If no mistake: do nothing
If $f(x) = 1$ but $w \cdot x < \theta$: $w_i \leftarrow 2 w_i$ (if $x_i = 1$) (promotion)
If $f(x) = 0$ but $w \cdot x \ge \theta$: $w_i \leftarrow w_i / 2$ (if $x_i = 1$) (demotion)
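A minimal Python sketch of the update rules above, assuming Boolean inputs and labels f(x) in {0, 1}. The `elimination` flag switches demotion to the elimination variant. The loop stops after the first mistake-free pass, or after a hypothetical `max_epochs` cap. The function name and the target disjunction x1 ∨ x3 over n = 8 variables are illustrative choices of ours.

```python
import math
from itertools import product

def winnow(examples, n, elimination=False, max_epochs=1000):
    """Winnow on x in {0,1}^n with labels f(x) in {0,1}.

    theta = n and every w_i starts at 1; predict 1 iff w . x >= theta.
    Promotion (f(x) = 1, predicted 0): w_i <- 2 * w_i for every i with x_i = 1.
    Demotion  (f(x) = 0, predicted 1): w_i <- w_i / 2 for every i with x_i = 1,
    or w_i <- 0 when elimination=True.
    Returns (w, promotions, demotions) after the first mistake-free pass
    (or after max_epochs passes).
    """
    theta = n
    w = [1.0] * n
    promotions = demotions = 0
    for _ in range(max_epochs):
        clean = True
        for x, f in examples:
            score = sum(wi for wi, xi in zip(w, x) if xi == 1)
            if (1 if score >= theta else 0) == f:
                continue                              # no mistake: do nothing
            clean = False
            if f == 1:                                # promotion
                w = [2 * wi if xi == 1 else wi for wi, xi in zip(w, x)]
                promotions += 1
            else:                                     # demotion or elimination
                w = [(0.0 if elimination else wi / 2) if xi == 1 else wi
                     for wi, xi in zip(w, x)]
                demotions += 1
        if clean:
            break
    return w, promotions, demotions

# Hypothetical target: f = x1 v x3 over n = 8 variables (k = 2).
n, relevant = 8, [0, 2]
data = [(x, int(any(x[i] for i in relevant))) for x in product([0, 1], repeat=n)]
w, promotions, demotions = winnow(data, n)
print(promotions, demotions, 3 * len(relevant) * math.log2(2 * n) + 2)
```

The last printed value is 3k log2(2n) + 2, one concrete form of the O(k log n) bound. The standard argument for the claim on the Mistake Bound slide yields this form.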


Winnow - Example

Notice that the same algorithm will learn a conjunction over these variables ($w = (256, 256, 0, …32, …256, 256)$).

[Table: a run of Winnow (n = 1024) on a disjunction of x1, x2, x1023, x1024, listing examples x, labels f(x), and weights w, with rows marked "ok", "mistake", "log(n/2) mistakes (for each good variable)", "mistake (elimination version)", and the final hypothesis]
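The following sketch shows why a vector of the shape above can represent a conjunction with θ = 1024. We assume 256 on each of x1, x2, x1023, x1024 and a single weight of 32 on one irrelevant variable. The position of the 32 (index 511) and the zeros elsewhere are hypothetical, because the slide elides those entries. Four relevant weights reach 1024 ≥ θ. Three relevant weights plus all the irrelevant mass give at most 768 + 32 < θ.

```python
n = theta = 1024
relevant = [0, 1, 1022, 1023]        # x1, x2, x1023, x1024 as 0-based indices
w = [0] * n
for i in relevant:
    w[i] = 256
w[511] = 32                          # hypothetical small weight on one irrelevant variable

def predict(active):
    """active: set of indices i with x_i = 1. Returns 1 iff w . x >= theta."""
    return 1 if sum(w[i] for i in active) >= theta else 0

irrelevant = set(range(2, 1022))
for mask in range(16):
    on = {relevant[b] for b in range(4) if (mask >> b) & 1}
    target = 1 if len(on) == 4 else 0            # x1 ^ x2 ^ x1023 ^ x1024
    assert predict(on) == target                 # all irrelevant variables off
    assert predict(on | irrelevant) == target    # all irrelevant variables on
```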


Winnow - Mistake Bound

Claim: Winnow makes O(k log n) mistakes on k-disjunctions.

• u - # of mistakes on positive examples (promotions)
• v - # of mistakes on negative examples (demotions)

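One standard way to see the claim is sketched below. It is not necessarily the argument in the lecture's additional notes. The target has k relevant variables, and θ = n. A relevant weight is never demoted, because if a relevant $x_i = 1$ then $f(x) = 1$. On a promotion, some relevant variable is active and $w \cdot x < n$, so that weight is below n before it doubles. Starting from 1, each relevant weight can therefore double at most $\log_2(2n)$ times. For the demotions, track the total weight $TW = \sum_i w_i$, which starts at n. A promotion adds $w \cdot x < n$ to it, and a demotion removes $w \cdot x / 2 \ge n/2$. Relevant weights never drop below 1, so TW stays positive:

```latex
\begin{aligned}
u &\le k \log_2(2n) \\
0 < TW &\le n + u\,n - v\,\tfrac{n}{2} \;\Longrightarrow\; v < 2(u + 1) \\
u + v &< 3u + 2 \le 3k \log_2(2n) + 2 = O(k \log n)
\end{aligned}
```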
