Perceptron Learning Rule and Winnow Algorithm in On-line Machine Learning - Prof. Dan Roth, Study notes of Computer Science

An overview of the perceptron learning rule and the Winnow algorithm in on-line machine learning. Both learn linear threshold functions for binary classification: the perceptron updates its weight vector additively, while Winnow updates it multiplicatively. Each algorithm changes its weights only when it makes a mistake on a new example.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

On-line Learning CS446-Spring08

Linear Functions

f(x) = 1 if w1 x1 + w2 x2 + ... + wn xn >= θ
       0 otherwise

Representable as linear threshold functions:
• Disjunctions: y = x1 v x3 v x5
  y = (1•x1 + 1•x3 + 1•x5 >= 1)
• At least m of n: y = at least 2 of {x1, x3, x5}
  y = (1•x1 + 1•x3 + 1•x5 >= 2)

Not representable as linear threshold functions:
• Exclusive-OR: y = (x1 Λ ¬x2) v (¬x1 Λ x2)
• Non-trivial DNF: y = (x1 Λ x2) v (x3 Λ x4)
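The linear-function examples on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the helper names `ltf` and `xor_on_grid` are hypothetical, indices are 0-based (so x1, x3, x5 are positions 0, 2, 4), and the grid search for Exclusive-OR only illustrates that no weights turn up in a small integer range; it is not a proof.

```python
from itertools import product

def ltf(w, theta, x):
    """Linear threshold function: 1 if w . x >= theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# Weight 1 on x1, x3, x5 (positions 0, 2, 4) and 0 elsewhere.
w = (1, 0, 1, 0, 1)

disjunction_ok = True
at_least_two_ok = True
for x in product([0, 1], repeat=5):
    # Boolean definitions, written independently of the weights.
    disj = int(bool(x[0] or x[2] or x[4]))
    two_of = int(x[0] + x[2] + x[4] >= 2)
    disjunction_ok &= ltf(w, 1, x) == disj      # threshold 1
    at_least_two_ok &= ltf(w, 2, x) == two_of   # threshold 2

def xor_on_grid(values=range(-3, 4)):
    """Search a small integer grid for (w1, w2, theta) that computes XOR."""
    for w1, w2, theta in product(values, repeat=3):
        if all(ltf((w1, w2), theta, (a, b)) == (a ^ b)
               for a, b in product([0, 1], repeat=2)):
            return (w1, w2, theta)
    return None

xor_weights = xor_on_grid()
```

Changing only the threshold, with the same weights, switches between the disjunction and the "at least 2 of 3" function.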

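[Figure: a linear separator with negative examples ("-") on one side. One hyperplane is labeled w • x = 0; the right-hand side of the second label, "w • x =", did not survive extraction.]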
Perceptron learning rule

We learn f: X → {-1,+1}, represented as f = sgn(w • x),
where X = {0,1}^n or X = R^n, and w ∈ R^n.

Given labeled examples: {(x1, y1), (x2, y2), ..., (xm, ym)}

1. Initialize w = 0 ∈ R^n
2. Cycle through all examples:
   a. Predict the label of instance x to be y’ = sgn(w • x)
   b. If y’ ≠ y, update the weight vector:
        w = w + r y x   (r - a constant, learning rate)
      Otherwise, if y’ = y, leave weights unchanged.
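The learning rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference code: the function names, the `max_epochs` cap, the convention sgn(0) = -1, and the toy data set (x1 OR x2 with a constant third feature) are all choices made here.

```python
def sgn(z):
    """Sign with sgn(0) = -1 (an assumption; the notes do not fix this case)."""
    return 1 if z > 0 else -1

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron(examples, r=1.0, max_epochs=100):
    """Cycle through (x, y) pairs with y in {-1, +1}; update only on mistakes."""
    w = [0.0] * len(examples[0][0])          # initialize w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if sgn(dot(w, x)) != y:          # y' != y
                w = [wi + r * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:                    # a full clean pass: stop
            break
    return w

# Toy data: x1 OR x2, with a constant feature 1 appended to every example.
data = [((0, 0, 1), -1), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
w = perceptron(data)
consistent = all(sgn(dot(w, x)) == y for x, y in data)
```

Because this data set is linearly separable, the loop stops after a clean pass well before the epoch cap.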

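Footnote About the Threshold

• On the previous slide, Perceptron has no threshold.
• But we don’t lose generality: append a constant feature 1 to every example and fold the threshold into the weights,
    x → (x, 1),   w → (w, -θ)
  so that w • x ≥ θ iff (w, -θ) • (x, 1) ≥ 0.

[Figure: two plots over axes x0 and x1, one showing the separator w • x = θ and one showing the equivalent separator through the origin, w • x = 0.]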
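The threshold trick is easy to verify on a small case. The sketch below is an illustration only; the helper names and the choice of w = (1, 1, 1), θ = 2 are assumptions made for the example.

```python
from itertools import product

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict_with_threshold(w, theta, x):
    """Original form: 1 iff w . x >= theta."""
    return 1 if dot(w, x) >= theta else 0

def augment_example(x):
    """x -> (x, 1): append a constant feature."""
    return tuple(x) + (1,)

def augment_weights(w, theta):
    """w -> (w, -theta): fold the threshold into the weight vector."""
    return tuple(w) + (-theta,)

def predict_through_origin(w_aug, x_aug):
    """Augmented form: 1 iff w' . x' >= 0."""
    return 1 if dot(w_aug, x_aug) >= 0 else 0

w, theta = (1, 1, 1), 2                      # "at least 2 of 3"
w_aug = augment_weights(w, theta)
same = all(
    predict_with_threshold(w, theta, x)
    == predict_through_origin(w_aug, augment_example(x))
    for x in product([0, 1], repeat=3)
)
```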


Perceptron learning rule

1. Initialize w = 0 ∈ R^n
2. Cycle through all examples:
   a. Predict the label of instance x to be y’ = sgn(w • x)
   b. If y’ ≠ y, update the weight vector to
        w = w + r y x   (r - a constant, learning rate)
      Otherwise, if y’ = y, leave weights unchanged.

• If x is Boolean, only weights of active features are updated.
• w • x > 0 is equivalent to 1 / (1 + exp(-(w • x))) > 1/2
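Both remarks can be checked numerically. The following sketch is only an illustration: the weight values, the example, and the sample points for z = w • x are arbitrary choices made here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update(w, x, y, r=1.0):
    """One perceptron update: w + r * y * x."""
    return [wi + r * y * xi for wi, xi in zip(w, x)]

# Boolean example: features 0 and 2 are active.
w = [0.5, -1.0, 2.0, 0.0]
x = [1, 0, 1, 0]
w_new = update(w, x, y=-1)

# Indices whose weight actually changed.
changed = [i for i, (old, new) in enumerate(zip(w, w_new)) if old != new]

# Thresholding w . x at 0 agrees with thresholding the sigmoid at 1/2.
agree = all((z > 0) == (sigmoid(z) > 0.5) for z in [-3.0, -0.5, 0.0, 0.5, 3.0])
```

Here `changed` contains exactly the positions where x is 1, which is the "active features" remark in code form.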

Perceptron Learnability

• Obviously can’t learn what it can’t represent
  - Only linearly separable functions
• Minsky and Papert (1969) wrote an influential book demonstrating Perceptron’s representational limitations
  - Parity functions can’t be learned (XOR)
  - In vision, if patterns are represented with local features, can’t represent symmetry, connectivity
• Research on Neural Networks stopped for years
• Rosenblatt himself (1959) asked,
  “What pattern recognition problems can be transformed so as to become linearly separable?”

Perceptron Convergence

• Perceptron Convergence Theorem: If there exists a set of weights that is consistent with the data (i.e., the data is linearly separable), the perceptron learning algorithm will converge.
  -- How long would it take to converge?
• Perceptron Cycling Theorem: If the training data is not linearly separable, the perceptron learning algorithm will eventually repeat the same set of weights and therefore enter an infinite loop.
  -- How to provide robustness, more expressivity?
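The cycling behaviour can be observed on XOR, which is not linearly separable. The sketch below is a hedged illustration: the function name `epochs_until_repeat`, the convention sgn(0) = -1, the learning rate r = 1, and the fixed example order are assumptions made here. With a fixed order and integer weights, seeing the same end-of-epoch weight vector twice means the run repeats from then on.

```python
def sgn(z):
    return 1 if z > 0 else -1  # assumed convention: sgn(0) = -1

def epochs_until_repeat(examples, r=1, max_epochs=1000):
    """Return (first_epoch, repeat_epoch) once the end-of-epoch weights
    repeat, or None if the algorithm converges (or the cap is reached)."""
    w = (0,) * len(examples[0][0])
    seen = {}
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in examples:
            if sgn(sum(wi * xi for wi, xi in zip(w, x))) != y:
                w = tuple(wi + r * y * xi for wi, xi in zip(w, x))
                mistakes += 1
        if mistakes == 0:
            return None            # converged: data was separable
        if w in seen:
            return seen[w], epoch  # same weights as an earlier epoch
        seen[w] = epoch
    return None

# XOR over (x1, x2), with a constant third feature standing in for the threshold.
xor_data = [((0, 0, 1), -1), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), -1)]
cycle = epochs_until_repeat(xor_data)
```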

Perceptron: Mistake Bound Theorem

• Maintains a weight vector w ∈ R^N, initially w = (0, ..., 0)
• Upon receiving an example x ∈ R^N
• Predicts according to the linear threshold function w • x ≥ 0

Theorem [Novikoff, 1963]
Let (x1; y1), ..., (xt; yt) be a sequence of labeled examples with xi ∈ R^N, ||xi|| ≤ R and yi ∈ {-1,1} for all i.
Let u ∈ R^N, γ > 0 be such that ||u|| = 1 and yi u • xi ≥ γ for all i.
Then Perceptron makes at most ||u||^2 R^2 / γ^2 mistakes on this example sequence; since ||u|| = 1, this is R^2 / γ^2.
(see additional notes)

Callouts on the slide: "Margin" (γ) and "Complexity Parameter".
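The bound R^2 / γ^2 can be compared with an actual run. The sketch below is an illustration under assumptions chosen here: a four-point data set in R^2, the unit vector u = (1, 1)/√2 as the separating direction, learning rate 1, and a mistake counted whenever y (w • x) ≤ 0.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def count_mistakes(examples, max_epochs=1000):
    """Run Perceptron from w = 0 and count updates until a clean pass."""
    w = [0.0] * len(examples[0][0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in examples:
            if y * dot(w, x) <= 0:           # mistake (or zero margin)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            break
    return mistakes, w

examples = [((3.0, -2.0), 1), ((-2.0, 3.0), 1),
            ((2.0, -3.0), -1), ((-3.0, 2.0), -1)]

u = (1 / math.sqrt(2), 1 / math.sqrt(2))                 # ||u|| = 1
R = max(math.sqrt(dot(x, x)) for x, _ in examples)       # radius of the data
gamma = min(y * dot(u, x) for x, y in examples)          # margin achieved by u
bound = R ** 2 / gamma ** 2

mistakes, w = count_mistakes(examples)
```

On data like this the observed count is typically far below the bound; the theorem is a worst-case guarantee over all orderings.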

Perceptron for Boolean Functions

• How many mistakes will the Perceptron algorithm make when learning a k-disjunction?
• Try to figure out the bound.
• Find a sequence of examples that will cause Perceptron to make O(n) mistakes on a k-disjunction on n attributes.

Winnow Algorithm

Initialize: θ = n; wi = 1
Prediction is 1 iff w • x ≥ θ
If no mistake: do nothing
If f(x) = 1 but w • x < θ, wi ← 2wi (if xi = 1) (promotion)
If f(x) = 0 but w • x ≥ θ, wi ← wi / 2 (if xi = 1) (demotion)

• The Winnow Algorithm learns Linear Threshold Functions.
• For the class of disjunctions, instead of demotion we can use elimination.
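The update rules translate directly into code. The following is a minimal sketch under assumptions made here, not the course's implementation: the function name `winnow`, the `max_passes` cap, the 8-variable target x1 v x3 (positions 0 and 2), and training by repeated passes over all 2^8 Boolean inputs are illustrative choices.

```python
from itertools import product

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def winnow(examples, n, max_passes=100):
    """Winnow with threshold n, promotion by 2, demotion by 1/2."""
    theta = n
    w = [1.0] * n
    mistakes = 0
    for _ in range(max_passes):
        clean = True
        for x, label in examples:
            pred = 1 if dot(w, x) >= theta else 0
            if pred == label:
                continue                      # no mistake: do nothing
            clean = False
            mistakes += 1
            factor = 2.0 if label == 1 else 0.5   # promotion / demotion
            # Only weights of active features (x_i = 1) change.
            w = [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
        if clean:
            break
    return w, mistakes

n = 8
target = lambda x: int(bool(x[0] or x[2]))    # hypothetical target: x1 v x3
examples = [(x, target(x)) for x in product([0, 1], repeat=n)]
w, mistakes = winnow(examples, n)
consistent = all((1 if dot(w, x) >= n else 0) == y for x, y in examples)
```

All weights stay powers of two, so the floating-point arithmetic here is exact for a problem of this size.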

Winnow - Example

f = x1 v x2 v x1023 v x1024

[Worked trace: the example vectors and intermediate weights did not survive extraction. The surviving annotations, in order: Initialize; ok, ok, ok; mistake, mistake, mistake; log(n/2) (for each good variable); ok; mistake (elimination version); (final hypothesis).]

• Notice that the same algorithm will learn a conjunction over these variables (w = (256, 256, 0, …, 32, …, 256, 256)).

Winnow - Mistake Bound

Claim: Winnow makes O(k log n) mistakes on k-disjunctions

u - # of mistakes on positive examples (promotions)
v - # of mistakes on negative examples (demotions)
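The quantities u and v can be counted on a simulated run. The sketch below is an illustration built on assumptions made here: the target f = x1 v x2 v x1023 v x1024 over n = 1024 variables (positions 0, 1, 1022, 1023), a seeded random stream of sparse examples, and exact arithmetic via `fractions.Fraction`. The two inequalities it checks, u ≤ k log2(n) and v < 2(u + 1), come from the standard counting argument for Winnow with θ = n and n a power of two; the visible notes state only the O(k log n) claim.

```python
import math
import random
from fractions import Fraction

def run_winnow(n, target, stream):
    """Process examples on-line; each example is the set of active indices.
    Returns (u, v): promotions and demotions made."""
    theta = n
    w = [Fraction(1)] * n
    u = v = 0
    for active in stream:
        label = 1 if target & active else 0           # disjunction of target vars
        pred = 1 if sum(w[i] for i in active) >= theta else 0
        if label == 1 and pred == 0:                  # promotion
            for i in active:
                w[i] *= 2
            u += 1
        elif label == 0 and pred == 1:                # demotion
            for i in active:
                w[i] /= 2
            v += 1
    return u, v

rng = random.Random(0)                                # fixed seed for repeatability
n = 1024
target = {0, 1, 1022, 1023}
k = len(target)
stream = [set(rng.sample(range(n), rng.randint(1, 300))) for _ in range(500)]

u, v = run_winnow(n, target, stream)
promotion_cap = k * math.log2(n)                      # each good weight doubles at most log2(n) times
```

The sum u + v is the total number of mistakes, so the two checks together give a total below 3 k log2(n) + 2 on any example sequence.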
