Machine Learning: Classification Models & Rosenblatt's Perceptron Algorithm

These lecture notes introduce classification models in machine learning, covering generative models such as Fisher's linear discriminant analysis and Gaussian mixture models, as well as discriminative models such as Rosenblatt's perceptron learning algorithm. The notes also discuss nonlinear extensions and binary classification.



Introduction to Classification

Greg Grudic


Today’s Lecture Goals

  • Introduction to classification
  • Generative Models
    • Fisher (Linear Discriminant Analysis)
    • Gaussian Mixture Models
  • Discriminative Models
    • Rosenblatt's Perceptron Learning Algorithm
  • Nonlinear Extensions

Last Week: Learning Regression Models

  • Collect Training data
  • Build Model: stock value = F(feature space)
  • Make a prediction

[Figure: regression training data — stock value plotted over the feature (input) space as a scatter of points]

This Class: Learning Classification Models

  • Collect Training data
  • Build Model: happy = F(feature space)
  • Make a prediction

[Figure: classification training data in a high-dimensional feature (input) space]


Binary Classification

  • A binary classifier is a mapping from a set of d inputs to a single output which can take on one of TWO values
  • In the most general setting: inputs $\mathbf{x} \in \mathbb{R}^d$, output $y \in \{-1, +1\}$
  • Specifying the output classes as -1 and +1 is arbitrary!
    • Often done as a mathematical convenience


A Binary Classifier

[Diagram: input $\mathbf{x}$ → Classification Model → $\hat{y} \in \{-1, +1\}$]

Given learning data:
$$(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$$

A model is constructed:
$$\hat{y} = M(\mathbf{x})$$

The Learning Data

  • Learning algorithms don't care where the data comes from!
  • Here is a toy example from robotics…
    • Inputs from two sonar sensors: $x_1 \in \mathbb{R}$ (sensor 1), $x_2 \in \mathbb{R}$ (sensor 2)
    • Classification output:
      • Robot in Greg's office: y = +1
      • Robot NOT in Greg's office: y = -1

Classification Learning Data…

            x_1        x_2       y
Example 1   0.95013    0.58279   +1
Example 2   0.23114    0.4235    -1
Example 3   0.8913     0.43291   +1
Example 4   0.018504   0.76037   -1
…           …          …         …


The Learning Data

  • Symbolic representation of N learning examples of d-dimensional inputs:

$$\begin{bmatrix} x_{11} & \cdots & x_{1d} & y_1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{N1} & \cdots & x_{Nd} & y_N \end{bmatrix}$$

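One way to hold these learning examples in code is as an N × d input matrix plus a length-N label vector. This is a minimal NumPy sketch (not part of the original notes); the values are the four sonar examples from the table above.

```python
import numpy as np

# Each row of X is one learning example (x_1 = sensor 1, x_2 = sensor 2);
# y holds the corresponding class labels (+1 = in Greg's office, -1 = not).
X = np.array([
    [0.95013,  0.58279],   # Example 1
    [0.23114,  0.4235],    # Example 2
    [0.8913,   0.43291],   # Example 3
    [0.018504, 0.76037],   # Example 4
])
y = np.array([+1, -1, +1, -1])

N, d = X.shape  # N = 4 learning examples, d = 2 input dimensions
```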

Graphical Representation of Classification Training Data

[Figure: scatter plot of the training data in the $(x_1, x_2)$ plane, with one marker type for y = +1 and another for y = -1]

Linear Separating Hyper-Planes

  • How many lines can separate these points?

[Figure: the two classes with a candidate boundary marked "NO!" that fails to separate them]

Linear Separating Hyper-Planes

[Figure: the two classes in the $(x_1, x_2)$ plane separated by a line]

The separating hyperplane is the set of points where
$$\beta_0 + \sum_{i=1}^{d} \beta_i x_i = 0$$
and the two sides of the boundary correspond to the two classes:
$$\beta_0 + \sum_{i=1}^{d} \beta_i x_i < 0 \;\Rightarrow\; y = -1, \qquad \beta_0 + \sum_{i=1}^{d} \beta_i x_i > 0 \;\Rightarrow\; y = +1$$


Linear Separating Hyper-Planes

  • The Model:

$$\hat{y} = M(\mathbf{x}) = \operatorname{sgn}\left(\hat{\beta}_0 + \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T \mathbf{x}\right)$$

  • Where:

$$\operatorname{sgn}[A] = \begin{cases} +1 & \text{if } A > 0 \\ -1 & \text{otherwise} \end{cases}$$

  • The decision boundary:

$$\hat{\beta}_0 + \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T \mathbf{x} = 0$$

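As a rough illustration of the model above, here is a minimal Python sketch of $\hat{y} = \operatorname{sgn}(\hat{\beta}_0 + \hat{\boldsymbol{\beta}}^T \mathbf{x})$; the function names and the example hyperplane $x_1 + x_2 - 1 = 0$ are my own, and the betas are assumed to have already been estimated.

```python
import numpy as np

def sgn(a):
    """sgn[a] = +1 if a > 0, and -1 otherwise (as defined above)."""
    return 1 if a > 0 else -1

def predict(beta0, beta, x):
    """Linear classifier: y_hat = sgn(beta0 + beta^T x)."""
    return sgn(beta0 + np.dot(beta, x))

# Hypothetical hyperplane x_1 + x_2 - 1 = 0, i.e. beta0 = -1, beta = (1, 1):
print(predict(-1.0, np.array([1.0, 1.0]), np.array([0.95013, 0.58279])))  # +1
print(predict(-1.0, np.array([1.0, 1.0]), np.array([0.23114, 0.4235])))   # -1
```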

Linear Separating Hyper-Planes

  • The model parameters are: $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$
  • The hat on the betas means that they are estimated from the data
    • In the class notes… sometimes the hat will be there and sometimes it won't!
  • Many different learning algorithms have been proposed for determining $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$

Rosenblatt's Perceptron Learning Algorithm

  • Dates back to the 1950s and is the motivation behind Neural Networks
  • The algorithm for determining $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$:
    • Start with a random hyperplane
    • Incrementally modify the hyperplane such that points that are misclassified move closer to the correct side of the boundary
    • Stop when all learning examples are correctly classified

Rosenblatt's Perceptron Learning Algorithm

  • The algorithm is based on the following property:
    • The signed distance of any point $\mathbf{x}$ to the boundary is proportional to $\hat{\beta}_0 + \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T \mathbf{x}$
  • Therefore, if $M$ is the set of misclassified learning examples, we can push them closer to the boundary by minimizing the following:

$$D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + \left(\hat{\beta}_1, \ldots, \hat{\beta}_d\right)^T \mathbf{x}_i\right)$$


Rosenblatt’s Minimization Function

  • This is classic Machine Learning!
  • First define a cost function in model parameter space:

$$D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right) = -\sum_{i \in M} y_i \left(\hat{\beta}_0 + \sum_{k=1}^{d} \hat{\beta}_k x_{ik}\right)$$

  • Then find an algorithm that modifies $\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)$ such that this cost function is minimized
  • One such algorithm is Gradient Descent


Gradient Descent

[Figure: error surface E[w] plotted over the weight space (w0, w1)]

The Gradient Descent Algorithm

$$\hat{\beta}_i \leftarrow \hat{\beta}_i - \rho \frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_i}$$

where the learning rate is defined by $\rho > 0$.
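
The update rule above can be sketched generically: repeatedly step each parameter against the gradient of the cost, scaled by ρ. The toy quadratic cost below is my own illustration, not an example from the notes.

```python
import numpy as np

def gradient_descent(grad, beta_init, rho=0.1, n_steps=100):
    """Repeatedly apply: beta <- beta - rho * grad(beta)."""
    beta = np.array(beta_init, dtype=float)
    for _ in range(n_steps):
        beta = beta - rho * grad(beta)
    return beta

# Toy cost E[w] = (w0 - 1)^2 + (w1 + 2)^2 with gradient (2(w0 - 1), 2(w1 + 2)):
grad_E = lambda w: np.array([2 * (w[0] - 1), 2 * (w[1] + 2)])
print(gradient_descent(grad_E, [0.0, 0.0]))  # approaches the minimum at (1, -2)
```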

The Gradient Descent Algorithm for the Perceptron

For each misclassified example $i \in M$, the parameters are updated as

$$\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} \leftarrow \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_d \end{pmatrix} - \rho \begin{pmatrix} -y_i \\ -y_i x_{i1} \\ \vdots \\ -y_i x_{id} \end{pmatrix}$$

using the gradient components

$$\frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_0} = -\sum_{i \in M} y_i, \qquad \frac{\partial D\left(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_d\right)}{\partial \hat{\beta}_j} = -\sum_{i \in M} y_i x_{ij}, \quad j = 1, \ldots, d$$

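A minimal NumPy sketch of the perceptron update above (not from the original slides): each misclassified example pushes $(\hat{\beta}_0, \hat{\boldsymbol{\beta}})$ by $\rho \, (y_i, y_i \mathbf{x}_i)$, and training stops once every example is correctly classified. The function name `perceptron_train` and the `max_passes` cap are my own additions; the cap matters because, as discussed below, the algorithm never converges on non-separable data.

```python
import numpy as np

def perceptron_train(X, y, rho=1.0, max_passes=1000):
    """Rosenblatt-style perceptron: adjust (beta0, beta) for each
    misclassified example until all examples are correctly classified
    (or max_passes is exhausted, since non-separable data never converges)."""
    N, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(max_passes):
        misclassified = 0
        for xi, yi in zip(X, y):
            if yi * (beta0 + np.dot(beta, xi)) <= 0:   # wrong side (or on boundary)
                beta0 += rho * yi                      # beta0  <- beta0  - rho * (-y_i)
                beta  += rho * yi * xi                 # beta_j <- beta_j - rho * (-y_i x_ij)
                misclassified += 1
        if misclassified == 0:                         # all examples correct: stop
            break
    return beta0, beta

# Usage on the four sonar examples from earlier (linearly separable):
X = np.array([[0.95013, 0.58279], [0.23114, 0.4235],
              [0.8913, 0.43291], [0.018504, 0.76037]])
y = np.array([+1, -1, +1, -1])
beta0, beta = perceptron_train(X, y)
print(np.sign(beta0 + X @ beta))  # reproduces y once a separating plane is found
```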

The Good Theoretical Properties of the Perceptron Algorithm

  • If a solution exists, the algorithm will always converge in a finite number of steps!
  • Question: Does a solution always exist?


Linearly Separable Data

  • Which of these datasets are separable by a linear boundary?

[Figure: two datasets, (a) and (b); one of them is NOT linearly separable!]

Bad Theoretical Properties of the Perceptron Algorithm

  • If the data is not linearly separable, the algorithm cycles forever!
    • It cannot converge!
    • This property stopped research in this area between 1968 and 1984…
      • Perceptrons, Minsky and Papert, 1969
  • There are infinitely many solutions
  • When the data is linearly separable, the number of steps to converge can be very large (it depends on the size of the gap between the classes)


What about Nonlinear Data?

  • Data that is not linearly separable is called nonlinear data
  • Nonlinear data can often be mapped into a nonlinear space where it is linearly separable


Nonlinear Models

  • The Linear Model:

$$\hat{y} = M(\mathbf{x}) = \operatorname{sgn}\left(\hat{\beta}_0 + \sum_{i=1}^{d} \hat{\beta}_i x_i\right)$$

  • The Nonlinear (basis function) Model:

$$\hat{y} = M(\mathbf{x}) = \operatorname{sgn}\left(\hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i \phi_i(\mathbf{x})\right)$$

  • Examples of Nonlinear Basis Functions:

$$\phi_1(\mathbf{x}) = x_1^2 \qquad \phi_2(\mathbf{x}) = x_2^2 \qquad \phi_3(\mathbf{x}) = x_1 x_2 \qquad \phi_4(\mathbf{x}) = \sin(x_5)$$
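
To make the basis function model concrete, here is a small sketch (my own, not from the notes) that maps a 2-D input through a few of the basis functions listed above and then reuses the same linear/sgn machinery.

```python
import numpy as np

def phi(x):
    """One possible basis expansion: phi(x) = (x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, x1 * x2])

def predict_nonlinear(beta0, beta, x):
    """Nonlinear model: y_hat = sgn(beta0 + sum_i beta_i * phi_i(x))."""
    a = beta0 + np.dot(beta, phi(x))
    return 1 if a > 0 else -1

# The linear machinery is unchanged; only the input is mapped through phi first.
print(predict_nonlinear(-0.5, np.array([1.0, 1.0, 0.0]), np.array([0.9, 0.6])))  # +1
```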

Linear Separating Hyper-Planes in Nonlinear Basis Function Space

[Figure: the two classes plotted against the basis functions $(\phi_1, \phi_2)$, separated by a line]

The separating hyperplane in basis function space satisfies
$$\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i = 0,$$
with $\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i < 0 \;\Rightarrow\; y = -1$ and $\beta_0 + \sum_{i=1}^{k} \beta_i \phi_i > 0 \;\Rightarrow\; y = +1$.

An Example

[Figure: data that is not linearly separable in the original $(x_1, x_2)$ space is mapped by $\Phi$ into the space $\phi_1 = x_1^2$, $\phi_2 = x_2^2$, where the classes y = +1 and y = -1 become linearly separable]

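The effect in the figure can be reproduced on a hypothetical ring-shaped dataset (my own construction): classes that only a circle can separate in $(x_1, x_2)$ become separable by a straight line after the map $\Phi(\mathbf{x}) = (x_1^2, x_2^2)$.

```python
import numpy as np

# Hypothetical data: class +1 on an inner ring, class -1 on an outer ring.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 20)
r_pos, r_neg = 0.5, 0.9
X = np.vstack([np.c_[r_pos * np.cos(angles), r_pos * np.sin(angles)],
               np.c_[r_neg * np.cos(angles), r_neg * np.sin(angles)]])
y = np.array([+1] * 20 + [-1] * 20)

# Phi(x) = (x1^2, x2^2): the circular boundary x1^2 + x2^2 = const
# becomes the straight line phi1 + phi2 = const.
Phi = X ** 2
print(np.all((Phi.sum(axis=1) < 0.5) == (y == +1)))  # True: a line separates the classes
```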

Kernels as Nonlinear Transformations

  • Polynomial:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \left(\left\langle \mathbf{x}_i, \mathbf{x}_j \right\rangle + q\right)^k$$

  • Sigmoid:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\kappa \left\langle \mathbf{x}_i, \mathbf{x}_j \right\rangle + \theta\right)$$

  • Gaussian or Radial Basis Function (RBF):

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\left\| \mathbf{x}_i - \mathbf{x}_j \right\|^2}{\sigma}\right)$$



 

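These three kernels might be written as follows (a sketch, with q, k, kappa, theta, and sigma as the kernel parameters; the default values are arbitrary).

```python
import numpy as np

def polynomial_kernel(xi, xj, q=1.0, k=2):
    return (np.dot(xi, xj) + q) ** k

def sigmoid_kernel(xi, xj, kappa=1.0, theta=0.0):
    return np.tanh(kappa * np.dot(xi, xj) + theta)

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / sigma)

xi, xj = np.array([0.95013, 0.58279]), np.array([0.23114, 0.4235])
print(polynomial_kernel(xi, xj), sigmoid_kernel(xi, xj), rbf_kernel(xi, xj))
```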

The Kernel Model

Given training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, the model is

$$\hat{y} = M(\mathbf{x}) = \operatorname{sgn}\left(\hat{\beta}_0 + \sum_{i=1}^{N} \hat{\beta}_i K(\mathbf{x}, \mathbf{x}_i)\right)$$

  • The number of basis functions equals the number of training examples!
    • Unless some of the beta's get set to zero…
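
A minimal sketch of evaluating this kernel model (my own illustration; the beta values below are hypothetical placeholders rather than learned coefficients).

```python
import numpy as np

def kernel_predict(beta0, beta, X_train, x, kernel):
    """Kernel model: y_hat = sgn(beta0 + sum_i beta_i * K(x, x_i)),
    with one basis function per training example."""
    a = beta0 + sum(b_i * kernel(x, x_i) for b_i, x_i in zip(beta, X_train))
    return 1 if a > 0 else -1

# Usage with an RBF kernel and placeholder beta values:
rbf = lambda u, v, sigma=1.0: np.exp(-np.sum((u - v) ** 2) / sigma)
X_train = np.array([[0.95013, 0.58279], [0.23114, 0.4235]])
beta = np.array([1.0, -1.0])          # one coefficient per training example
print(kernel_predict(0.0, beta, X_train, np.array([0.9, 0.6]), rbf))  # +1
```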

Gram (Kernel) Matrix

Given training data $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$, the Gram (kernel) matrix is

$$K = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}$$

Properties:
  • Positive definite matrix
  • Symmetric
  • Positive on diagonal
  • N by N
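
A short sketch (not from the notes) of building the N-by-N Gram matrix and checking the properties listed above, using an RBF kernel on the earlier sonar examples.

```python
import numpy as np

def gram_matrix(X, kernel):
    """Gram (kernel) matrix: K[i, j] = kernel(x_i, x_j)."""
    N = X.shape[0]
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

rbf = lambda u, v, sigma=1.0: np.exp(-np.sum((u - v) ** 2) / sigma)
X = np.array([[0.95013, 0.58279], [0.23114, 0.4235],
              [0.8913, 0.43291], [0.018504, 0.76037]])
K = gram_matrix(X, rbf)
print(np.allclose(K, K.T))                 # symmetric
print(np.all(np.diag(K) > 0))              # positive on the diagonal
print(np.all(np.linalg.eigvalsh(K) > 0))   # positive definite for these distinct points
```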

Picking a Model Structure?

  • How do you pick the Kernels?
    • Kernel parameters
  • These are called learning parameters or hyperparameters
  • Two approaches to choosing learning parameters:
    • Bayesian
      • Learning parameters must maximize the probability of correct classification based on prior biases
    • Frequentist
      • Use validation data (see the sketch below)
  • More on learning parameter selection later
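
One way the frequentist approach might look in code: hold out validation data and keep the kernel parameter with the fewest validation errors. This is a hedged sketch; `select_sigma`, `train_model`, and `count_errors` are hypothetical names standing in for any trainer and error counter (for example, the kernel model sketched earlier).

```python
import numpy as np

def select_sigma(X_train, y_train, X_val, y_val, candidate_sigmas,
                 train_model, count_errors):
    """Pick the kernel parameter whose model makes the fewest validation errors."""
    best_sigma, best_errors = None, np.inf
    for sigma in candidate_sigmas:
        model = train_model(X_train, y_train, sigma)     # fit on training data only
        errors = count_errors(model, X_val, y_val)       # evaluate on held-out data
        if errors < best_errors:
            best_sigma, best_errors = sigma, errors
    return best_sigma
```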


Perceptron Algorithm Convergence

  • Two problems:
    • No convergence when the data is not separable in basis function space
    • Gives infinitely many solutions when the data is separable
  • Can we modify the algorithm to fix these problems?