Perceptron Algorithm: Weights, Training Methods, and Predictions, Study notes of Chemistry

University of Maryland Chemistry

An in-depth explanation of the perceptron algorithm, including the concept of weights, different training methods, and predictions based on the input-output data. It covers various examples and iterations, highlighting the importance of convergence and the impact of different weights on the decision boundary.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-l86 🇺🇸


percept.mcd

Perceptron -- with "OR" example.

Instructor: Nam Sun Wang

A Threshold Unit (Perceptron)...

•Has either binary (-1 or 1, etc.) or real inputs xj. (If the input is binary 0/1, change the notation to -1/1. Otherwise, the weight has absolutely no effect on a 0 input.)

•Sums up the weighted inputs: σ=Σwj⋅xj.

•Tests whether Σwj⋅xj>a, where a is the threshold. (We can simulate the threshold with x0=1 and w0=-a, i.e., an extra dummy input of unity with a weight of -a. This weight is commonly called a bias, and it is equivalent to the constant part or intercept in linear regression.)

•Yields a binary output (true or false, 0 or 1, -1 or 1, hit or miss, etc.): if Σwj⋅xj>a, then 1; otherwise, 0.
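The threshold unit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sigma` and `f`, the -1/1 output convention, and the weights (threshold a = -1 with unit input weights) are chosen here for the example and are not taken from the notes.

```python
def sigma(x, w):
    """Weighted sum of the inputs; x[0] is the dummy input fixed at 1."""
    return sum(wj * xj for wj, xj in zip(w, x))


def f(x, w):
    """Threshold unit with -1/1 output: 1 if the weighted sum exceeds 0, else -1."""
    return 1 if sigma(x, w) > 0 else -1


# Hypothetical weights: the threshold a = -1 is folded in as the bias w0 = -a.
a = -1.0
w = [-a, 1.0, 1.0]

# Evaluate the unit on the four -1/1 input pairs (dummy input prepended).
outputs = {(x1, x2): f([1, x1, x2], w) for x1 in (-1, 1) for x2 in (-1, 1)}
print(outputs)
```

With these assumed weights the unit returns -1 only for the input pair (-1, -1), so it behaves like an OR gate in -1/1 notation.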

Training Procedure #1. If the classification is correct, do nothing. If we get a false positive, set the new w to w-x. On the other hand, if we get a false negative, set the new w to w+x. For y={-1, 1}, we express this procedure algebraically as:

$$\Delta w_j = y_{\mathrm{true}} \cdot x_j \cdot (y \neq y_{\mathrm{true}})$$

where the comparison (y≠ytrue) evaluates to 1 when the prediction y is wrong and to 0 when it is correct.

To avoid divergence, we update the weights conservatively by taking a fractional step ε instead of a full step. The fraction ε is commonly known as the learning rate. We can start training with a small value of ε, then later increase it to unity near convergence.

$$\Delta w_j = \varepsilon \cdot y_{\mathrm{true}} \cdot x_j \cdot (y \neq y_{\mathrm{true}})$$
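A minimal sketch of Training Procedure #1 follows, assuming the OR truth table in -1/1 notation as training data, zero initial weights instead of random ones (so the run is reproducible), and a full step ε = 1. The function names `predict` and `train_rule1` are hypothetical.

```python
def predict(x, w):
    """Threshold unit with -1/1 output; x[0] is the dummy input."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if s > 0 else -1


def train_rule1(X, y, w, eps=1.0, epochs=10):
    """Update w by eps * y_true * x only when a pattern is misclassified."""
    w = list(w)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if predict(xi, w) != yi:
                w = [wj + eps * yi * xj for wj, xj in zip(w, xi)]
    return w


# OR in -1/1 notation, with the dummy input 1 in the first column.
X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [-1, 1, 1, 1]

w = train_rule1(X, y, [0.0, 0.0, 0.0])
predictions = [predict(xi, w) for xi in X]
print(w, predictions)
```

Under these assumptions the weights settle at [1.0, 1.0, 1.0] after three corrections, and every pattern is then classified correctly.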

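Training Procedure #2 (Steepest Descent Algorithm). We can derive the training procedure by minimizing the error function over all training patterns i.

$$\text{Minimize}\quad \mathrm{sse}(w) = E(w) = \sum_{i=0}^{N} \mathrm{error}_i^{2} = \sum_{i=0}^{N} \bigl(y_i - f(x_i \cdot w)\bigr)^{2}$$

$$\frac{d}{dw}\,\mathrm{sse} = -2 \cdot \sum_{i=0}^{N} \bigl(y_i - f(x_i \cdot w)\bigr) \cdot x_i$$

Gradient of error:

$$\Delta E = 2 \cdot \sum_{i=0}^{N} \bigl(f(x_i \cdot w) - y_i\bigr) \cdot x_i$$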

The following is the steepest descent algorithm, which searches in the negative gradient direction of the error to update the weights after sequentially presenting each example i.

$$\Delta w = -\varepsilon \cdot \bigl(f(x_i \cdot w) - y_i\bigr) \cdot x_i$$

where ε is a non-negative scalar learning rate. We can either 1) divide the threshold expression by a constant without changing its net effect, e.g., Σwj⋅xj>a for j={1, ..., m} is transformed to Σ(wj/a)⋅xj>1, or 2) transform it to an un-thresholded form Σ(wj/const)⋅xj>0 for j={0, 1, ..., m}. In practice, because f is a step function, we can force f(x⋅w) to be simply x⋅w in the weight calculation. This substituted form is more restrictive and is a special case that assumes f(x⋅w)=±1=x⋅w.

$$\Delta w = -\varepsilon \cdot (x_i \cdot w - y_i) \cdot x_i$$
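The update with x⋅w substituted for f(x⋅w) can be sketched as below. The training data (OR in -1/1 notation), the zero initial weights, ε = 0.1, the number of passes, and the function names are all assumptions made for this illustration.

```python
def sigma(x, w):
    """Weighted sum x.w, with x[0] the dummy input."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear(X, y, w, eps=0.1, epochs=50):
    """Sequential steepest descent using x.w in place of f(x.w)."""
    w = list(w)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigma(xi, w) - yi
            w = [wj - eps * err * xj for wj, xj in zip(w, xi)]
    return w


X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [-1, 1, 1, 1]

w = train_linear(X, y, [0.0, 0.0, 0.0])
# The step function is applied only when making predictions.
predictions = [1 if sigma(xi, w) > 0 else -1 for xi in X]
print(w, predictions)
```

Because the step size is constant, the weights hover near the least-squares solution (0.5, 0.5, 0.5) rather than landing on it exactly, but the thresholded predictions reproduce the OR outputs.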


We can either update the weights immediately after presenting each sample, or wait until the entire set of samples has gone through and then calculate the weight change by summing up the errors in that batch.

Training Procedure #3 (Linear Regression or Pseudo-Inverse Method). We realize that y is simply a linear combination of the inputs xj, each weighted by wj. This is exactly the same as in plain linear regression. As opposed to the above algorithms, this is a batch procedure in which we present all training examples simultaneously rather than sequentially.

$$w = \left(x^{T} \cdot x\right)^{-1} \cdot x^{T} \cdot y$$

Of course, as in multiple linear regression, if the independent variables are highly correlated, the inverse of xT⋅x is very unstable or nonexistent.
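As a sketch of the pseudo-inverse method, the normal equations (xT⋅x)⋅w = xT⋅y can be solved directly instead of forming the inverse. The helper `normal_equations` is hypothetical and uses plain Gauss-Jordan elimination; the data are assumed to be the OR truth table in -1/1 notation with a dummy input.

```python
def normal_equations(X, y):
    """Solve (X^T X) w = X^T y by Gauss-Jordan elimination with partial pivoting."""
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[p] * row[q] for row in X) for q in range(m)]
         + [sum(row[p] * yi for row, yi in zip(X, y))]
         for p in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular or nearly singular")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(m):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[p][m] for p in range(m)]


X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [-1, 1, 1, 1]

w = normal_equations(X, y)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
print(w, predictions)
```

For this data the columns of x are mutually orthogonal, so xT⋅x is well conditioned and the solution is w = (0.5, 0.5, 0.5). The `ValueError` branch corresponds to the nonexistent-inverse case mentioned above.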

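Example: An OR function.

Number of inputs: $j := 0..2$

Maximum number of training patterns: $i := 0..10$

Two-dimensional inputs $x_{i,1}$, $x_{i,2}$ and one-dimensional output $y_i$ (the OR truth table in -1/1 notation): (-1, -1) → -1; (-1, 1) → 1; (1, -1) → 1; (1, 1) → 1.

Number of training patterns: $N := \mathrm{last}(y)$, $i := 0..N$

Add a dummy input (in lieu of threshold): $x_{i,0} := 1$

Output function: $\sigma(x, w) := x \cdot w$

Element-wise notation: $\sigma(x, w) = \sum_{j=0}^{2} w_j \cdot x_j$

$f(x, w) := \mathrm{if}(\sigma(x, w) > 0,\ 1,\ -1)$

Assign initial weights: $w_j := \mathrm{rnd}(2) - 1$

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k \cdot (N+1) + i + 1 \rangle} := w^{\langle k \cdot (N+1) + i \rangle} + y_i \cdot x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \cdot (N+1) + i \rangle}\bigr) \neq y_i\Bigr)$$

where $x_i$ is the ith training pattern (the ith row of x) and $w^{\langle n \rangle}$ is the nth column of w.

Results after iteration: w = [weight matrix not preserved]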

Example. -- Training Method #2 with x⋅w.

Reset weight: $w := 0$, $w_j := \mathrm{rnd}(2) - 1$

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$\Delta w_j = -\varepsilon \cdot (x_i \cdot w - y_i) \cdot x_{i,j} \qquad \varepsilon := 0.\ldots$$

$$w^{\langle k \cdot (N+1) + i + 1 \rangle} := w^{\langle k \cdot (N+1) + i \rangle} - \varepsilon \cdot x_i^{T} \cdot \bigl(x_i \cdot w^{\langle k \cdot (N+1) + i \rangle} - y_i\bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f((1, -1, -1), ω) = -1, f((1, -1, 1), ω) = 1, f((1, 1, -1), ω) = 1, f((1, 1, 1), ω) = 1.

Note that the answer is not unique. Many combinations of weights give rise to the same result. Here is another set of weights that works just as well.

The next-to-last weight: $\omega := w^{\langle \mathrm{cols}(w) - 2 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): the same four predictions as for the last weight, again matching y.

Example. -- Training Method #2 with f(x,w)

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k \cdot (N+1) + i + 1 \rangle} := w^{\langle k \cdot (N+1) + i \rangle} - \varepsilon \cdot x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \cdot (N+1) + i \rangle}\bigr) - y_i\Bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f(x_i, ω) evaluated at the four input patterns (1, ±1, ±1).

Example. -- Training Method #2 with f(x,w) but with batch updating of weights.

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k+1 \rangle} := w^{\langle k \rangle} - \varepsilon \cdot \sum_{i=0}^{N} x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \rangle}\bigr) - y_i\Bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle 10 \rangle}$, ω = [values not preserved] ← Same set of weights as in the last problem, but it converged quickly.
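The batch variant can be sketched as follows: the errors f(x_i, w) - y_i of all patterns are summed before the weights change once per pass. The data (OR in -1/1 notation), zero initial weights, ε = 0.1, and the name `train_batch` are assumptions for this example.

```python
def f(x, w):
    """Threshold unit with -1/1 output; x[0] is the dummy input."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def train_batch(X, y, w, eps=0.1, epochs=10):
    """One weight update per pass, using the error summed over all patterns."""
    w = list(w)
    for _ in range(epochs):
        errors = [f(xi, w) - yi for xi, yi in zip(X, y)]
        w = [wj - eps * sum(e * xi[j] for e, xi in zip(errors, X))
             for j, wj in enumerate(w)]
    return w


X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [-1, 1, 1, 1]

w = train_batch(X, y, [0.0, 0.0, 0.0])
predictions = [f(xi, w) for xi in X]
print(w, predictions)
```

With these assumed settings the weights stop changing after the second pass, at roughly (0.4, 0.4, 0.4), because every summed error is then zero.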

Example. -- Output y is binary in 0/1 instead of -1/1. Training Method #2 with x⋅w.

Maximum number of training patterns: $i := 0..10$

Reset input and output variables: $x := 0$, $y := 0$

Two-dimensional inputs $x_{i,1}$, $x_{i,2}$ (still in -1/1 notation) and one-dimensional output $y_i$ (OR in 0/1 notation): (-1, -1) → 0; the other three patterns → 1.

Number of training patterns: $N := \mathrm{last}(y)$, $i := 0..N$

Add a dummy input (in lieu of threshold): $x_{i,0} := 1$

Output function: $\sigma(x, w) := x \cdot w$, $f(x, w) := \mathrm{if}(\sigma(x, w) > 0,\ 1,\ 0)$

Reset weight: $w := 0$, $w_j := \mathrm{rnd}(2) - 1$

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k \cdot (N+1) + i + 1 \rangle} := w^{\langle k \cdot (N+1) + i \rangle} - \varepsilon \cdot x_i^{T} \cdot \bigl(x_i \cdot w^{\langle k \cdot (N+1) + i \rangle} - y_i\bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f(x_i, ω) = 1 for all four input patterns, including (1, -1, -1), where y = 0. No good!

No good because the approximation of f(x⋅w) with x⋅w is valid only when f(x⋅w)=±1=x⋅w. Now, f={0 or 1} violates this assumption.

Example. -- Output y is binary in 0/1 instead of -1/1. Training Method #2 with f(x,w)

Output function: $\sigma(x, w) := x \cdot w$, $f(x, w) := \mathrm{if}(\sigma(x, w) > 0,\ 1,\ 0)$

Reset weight: $w := 0$, $w_j := \mathrm{rnd}(2) - 1$

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k \cdot (N+1) + i + 1 \rangle} := w^{\langle k \cdot (N+1) + i \rangle} - \varepsilon \cdot x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \cdot (N+1) + i \rangle}\bigr) - y_i\Bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f((1, -1, -1), ω) = 0, and f(x_i, ω) = 1 for the other three input patterns.

Now, the answer is O.K.
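The contrast between the two 0/1 examples can be reproduced with a short sketch. It assumes inputs in -1/1 notation, 0/1 targets for OR, zero initial weights, and ε = 0.1; the function names and the `use_step` flag are hypothetical.

```python
def sigma(x, w):
    """Weighted sum x.w, with x[0] the dummy input."""
    return sum(wj * xj for wj, xj in zip(w, x))


def f01(x, w):
    """Threshold unit with 0/1 output."""
    return 1 if sigma(x, w) > 0 else 0


def train(X, y, use_step, eps=0.1, epochs=50):
    """Sequential updates; the error uses f(x, w) if use_step, else x.w."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = f01(xi, w) if use_step else sigma(xi, w)
            w = [wj - eps * (out - yi) * xj for wj, xj in zip(w, xi)]
    return w


X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [0, 1, 1, 1]

pred_linear = [f01(xi, train(X, y, use_step=False)) for xi in X]
pred_step = [f01(xi, train(X, y, use_step=True)) for xi in X]
print(pred_linear, pred_step)
```

In this sketch the x⋅w version drifts toward weights whose sum for the pattern (1, -1, -1) is positive, so that pattern is predicted as 1 instead of 0, while the f(x,w) version reproduces all four targets.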

Example. -- Output y is binary in 0/1 instead of -1/1. Training Method #3 with Pseudo-Inverse.

$$\omega := 0 \qquad \omega := \left(x^{T} \cdot x\right)^{-1} \cdot x^{T} \cdot y$$

ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f(x_i, ω) = 1 for all four input patterns, including (1, -1, -1), where y = 0. No good!
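The failure of the pseudo-inverse fit can be checked by hand. The sketch below assumes inputs in -1/1 notation with a dummy input and 0/1 targets for OR. Under that assumption the columns of x are orthogonal, so xT⋅x is 4 times the identity and the weights are simply xT⋅y divided by 4.

```python
X = [[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]
y = [0, 1, 1, 1]
m = len(X[0])

# Gram matrix X^T X; for these inputs it is diagonal (4 on the diagonal).
gram = [[sum(r[p] * r[q] for r in X) for q in range(m)] for p in range(m)]

# X^T y, then the least-squares weights via the diagonal of the Gram matrix.
xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(m)]
w = [xty[p] / gram[p][p] for p in range(m)]

# Threshold the linear fit at zero, as f does.
predictions = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]
print(gram, w, predictions)
```

The weights come out as (0.75, 0.25, 0.25). For the pattern (1, -1, -1) the weighted sum is 0.25, which is above zero, so the unit outputs 1 where the target is 0.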

Example: An XOR function trained with the batch method #2.

Two-dimensional inputs $x_{i,1}$, $x_{i,2}$ and one-dimensional output $y_i$ (the XOR truth table in -1/1 notation): (-1, -1) → -1; (-1, 1) → 1; (1, -1) → 1; (1, 1) → -1.

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k+1 \rangle} := w^{\langle k \rangle} - \varepsilon \cdot \sum_{i=0}^{N} x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \rangle}\bigr) - y_i\Bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): f(x_i, ω) evaluated at the four input patterns does not reproduce y. No good!

The next-to-last weight: $\omega := w^{\langle \mathrm{cols}(w) - 2 \rangle}$, ω = [values not preserved]

[Plot: Examples and Perceptron Weights]

Prediction for different cases (compare with y): again, f(x_i, ω) does not reproduce y. No good!

The weights are oscillating. Like the first set of weights above, the second set of weights also fails to yield the correct answers. It is clear that one perceptron with the original unadulterated input variables $x^{\langle 1 \rangle}$ and $x^{\langle 2 \rangle}$ alone cannot adequately divide the $x^{\langle 1 \rangle}$-$x^{\langle 2 \rangle}$ plane into regions where the output is either +1 or -1 such that the patterns match the training data. In other words, one perceptron cannot learn the XOR function, because we need at least two straight lines to separate the +1 and -1 outputs, although it may be possible to do so with a nonlinear curve. Let us generate additional variables consisting of auto- and cross-products of $x^{\langle 1 \rangle}$ and $x^{\langle 2 \rangle}$.

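$$x^{\langle 3 \rangle} := \left(x^{\langle 1 \rangle}\right)^{2} \qquad x^{\langle 4 \rangle} := x^{\langle 1 \rangle} \cdot x^{\langle 2 \rangle} \qquad x^{\langle 5 \rangle} := \left(x^{\langle 2 \rangle}\right)^{2} \qquad j := 0..5$$

Likewise, let us enlarge the weight vector accordingly and assign initial weights.

$w := 0$, $w_j := \mathrm{rnd}(2) - 1$

Present different examples and iterate (hopefully until convergence): $k := 0..10$

$$w^{\langle k+1 \rangle} := w^{\langle k \rangle} - \varepsilon \cdot \sum_{i=0}^{N} x_i^{T} \cdot \Bigl(f\bigl(x_i, w^{\langle k \rangle}\bigr) - y_i\Bigr)$$

Results after iteration: w = [weight matrix not preserved]

The last weight: $\omega := w^{\langle \mathrm{cols}(w) - 1 \rangle}$, $\omega^{T} = (1.068 \quad 0.302 \quad 0.314 \quad 0.167 \quad 1.16 \quad 0.352)$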
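A sketch of the XOR experiment, with and without the product variables, is given below. It assumes inputs and outputs in -1/1 notation, zero initial weights rather than random ones, ε = 0.1, and ten batch passes; the names `train_batch`, `X_lin`, and `X_aug` are hypothetical.

```python
def f(x, w):
    """Threshold unit with -1/1 output; x[0] is the dummy input."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def train_batch(X, y, eps=0.1, epochs=10):
    """Batch form of Training Method #2 with f(x, w), starting from zero weights."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        errors = [f(xi, w) - yi for xi, yi in zip(X, y)]
        w = [wj - eps * sum(e * xi[j] for e, xi in zip(errors, X))
             for j, wj in enumerate(w)]
    return w


pairs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]  # XOR in -1/1 notation

# Original inputs only: dummy, x1, x2.
X_lin = [[1, x1, x2] for x1, x2 in pairs]
# Augmented inputs: dummy, x1, x2, x1^2, x1*x2, x2^2.
X_aug = [[1, x1, x2, x1 * x1, x1 * x2, x2 * x2] for x1, x2 in pairs]

w_lin = train_batch(X_lin, y)
w_aug = train_batch(X_aug, y)
pred_lin = [f(xi, w_lin) for xi in X_lin]
pred_aug = [f(xi, w_aug) for xi in X_aug]
print(pred_lin, pred_aug)
```

With the original inputs the weights in this sketch bounce between two states and never fit XOR, which is expected because no single line separates the classes. With the augmented inputs the cross-product term x1⋅x2 carries the whole decision and all four patterns are reproduced.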

The following equation describes the curve in the $x^{\langle 1 \rangle}$-$x^{\langle 2 \rangle}$ plane defined by the weights of the perceptron.

$$w_5 \cdot x_2^{2} + w_4 \cdot x_1 \cdot x_2 + w_3 \cdot x_1^{2} + w_2 \cdot x_2 + w_1 \cdot x_1 + w_0 = 0$$

... mark x2 and choose |Symbolic|Solve for|

The two solutions for x2 are:

$$x_{2a}(x_1, w),\ x_{2b}(x_1, w) = \frac{1}{2 \cdot w_5} \cdot \left[ -(w_4 \cdot x_1 + w_2) \pm \sqrt{(w_4 \cdot x_1 + w_2)^{2} - 4 \cdot w_5 \cdot \left(w_3 \cdot x_1^{2} + w_1 \cdot x_1 + w_0\right)} \right]$$
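The two branches of the boundary curve can be evaluated numerically with the quadratic formula, as in this sketch. The function name `boundary_x2` and the example weights (chosen so that the curve is the unit circle) are assumptions for illustration and are not the weights found in the notes.

```python
import math


def boundary_x2(x1, w):
    """Real solutions x2 of w5*x2^2 + w4*x1*x2 + w3*x1^2 + w2*x2 + w1*x1 + w0 = 0."""
    w0, w1, w2, w3, w4, w5 = w
    a = w5
    b = w4 * x1 + w2
    c = w3 * x1 ** 2 + w1 * x1 + w0
    if a == 0:
        # Degenerate case: the equation is linear in x2.
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # the curve does not cross this value of x1
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]


# Hypothetical weights: x1^2 + x2^2 - 1 = 0, i.e. the unit circle.
w_circle = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

roots_at_0 = boundary_x2(0.0, w_circle)
roots_at_2 = boundary_x2(2.0, w_circle)
print(roots_at_0, roots_at_2)
```

For the assumed circle the function returns the pair 1 and -1 at x1 = 0 and an empty list at x1 = 2, where the discriminant is negative. The extra branch for w5 = 0 is needed because the closed-form solution divides by 2⋅w5.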
