Fall 2008 Lecture 4: Machine Learning (CS 567) - Logistic Regression & Perceptron Algorithm, by Sofus A. Macskassy

Part of the Fall 2008 lecture notes for the Machine Learning (CS 567) course taught by Sofus A. Macskassy at the University of Southern California. The notes cover topics such as logistic regression, conditional probability distributions, and the perceptron algorithm. Students are introduced to the concept of learning conditional distributions and the use of logistic functions to estimate probabilities. The lecture also discusses the optimization of logistic regression using gradient descent and the difference between online and batch learning.


Fall 2008 1 Lecture 4 - Sofus A. Macskassy

Machine Learning (CS 567) Lecture 4

Fall 2008

Time: T-Th 5:00pm - 6:20pm

Location: GFS 118

Instructor : Sofus A. Macskassy ([email protected])

Office: SAL 216

Office hours: by appointment

Teaching assistant : Cheol Han ([email protected])

Office: SAL 229

Office hours: M 2-3, W 11-

Class web page:

http://www-scf.usc.edu/~csci567/index.html

Fall 2008 2 Lecture 4 - Sofus A. Macskassy

Administrative: Homework

• Due at the start of class on the due date.

• 25% off if turned in after class but before 8am the next morning.

• 50% off if received the next day after 8am.

• 100% off if not received by the next day.

• If you have a valid excuse, let me know ASAP with proper documentation. No exceptions.

• Grading complaint policy: if students have problems with how an assignment was graded, feel free to talk to the instructor/TA about it. However, even if students request re-grading of only one answer, the whole assignment will be checked and graded again (so you take the risk of possibly losing points overall).

Fall 2008 3 Lecture 4 - Sofus A. Macskassy

Administrative: Final Project

• Groups of 2-4 people

• Deliverables:

  • Research paper
    • Write the paper as though submitting to a conference or workshop.
  • Presentation at the last two classes

• Work should cover:

  • A good evaluation technique; compare at least 2 classifiers; findings should be clearly stated and validated; related work should be cited.

• Project pre-proposals due on September 23

  • You can interact with me beforehand
  • 1-2 paragraphs outlining the problem and what you will do

• Project proposals due on October 9

  • 1-2 pages detailing what you will do, who is in the group, the data, the machine learning methods you will look at, etc.

Fall 2008 4 Lecture 4 - Sofus A. Macskassy

Project Idea: Applied Learning

Take an interesting dataset. Compare several learning approaches for prediction:

  • decision trees
  • ANNs
  • instance-based methods
  • SVMs
  • boosting

Fall 2008 5 Lecture 4 - Sofus A. Macskassy

Project Idea: Improvements to Learning Methods

There are many suggestions on how to improve various learning methods, both in books and in papers. Identify some suggestions and test them empirically.

Fall 2008 6 Lecture 4 - Sofus A. Macskassy

Project Idea: Comparison of Learning Methods

Identify some learners. Run thorough comparison tests and determine the reasons for their different performances.

Fall 2008 7 Lecture 4 - Sofus A. Macskassy

Text Classification

Easy to get lots of text: web, TREC data, email (e.g., Enron).

Predict topic, authorship, sentiment, style, affect, attitude.

Fall 2008 8 Lecture 4 - Sofus A. Macskassy

Reinforcement Learning

Can be challenging to do well: generate data or direct control

  • Optimize gas well production
  • Object tracking
  • "Tag" grid world
  • Rover control
  • Animal behavior experiments

Compare approaches (direct policy search, value function learning, model-based, …)

Fall 2008 9 Lecture 4 - Sofus A. Macskassy

Active Learning

Take a classic dataset.

Explore the tradeoff between the size of the training set and generalization.

Devise schemes for choosing which items to "pay" to label, so as to maximize accuracy with minimum cost.

Fall 2008 10 Lecture 4 - Sofus A. Macskassy

Project Methodology

Lots of good ideas for algorithms and domains.

The hard question is: "How will you evaluate it?"

Ultimately, you need to present more than one algorithm (and perhaps more than one problem) and you'll need some way of saying what worked better.

What's the gold standard?

Fall 2008 11 CS 567 Lecture 3 - Sofus A. Macskassy

Lecture 4 Outline

• Linear Threshold Units

– Perceptron

– Logistic Regression

Fall 2008 12 CS 567 Lecture 3 - Sofus A. Macskassy

The Unthresholded Discriminant Function is a Hyperplane

  • The equation $g(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$ is a plane

$$\hat{y} = \begin{cases} +1 & \text{if } g(\mathbf{x}) \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
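A minimal Python sketch of this thresholding rule (illustrative only; the variable names and the convention of appending a constant 1 feature to x so that w carries the bias are assumptions, not from the slides):

```python
import numpy as np

def predict(w, x):
    """Threshold the linear discriminant g(x) = w . x at zero."""
    g = np.dot(w, x)
    return 1 if g >= 0 else -1

# Toy example: a 2D point with a constant bias feature appended
w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 3.0, 1.0])   # last entry is the constant 1 feature
print(predict(w, x))            # g(x) = 2 - 3 + 0.5 = -0.5, so prediction is -1
```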

Fall 2008 13 CS 567 Lecture 3 - Sofus A. Macskassy

Alternatively…

$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x}) = \left(\mathbf{w}_1^T \mathbf{x} + w_{10}\right) - \left(\mathbf{w}_2^T \mathbf{x} + w_{20}\right) = \left(\mathbf{w}_1 - \mathbf{w}_2\right)^T \mathbf{x} + \left(w_{10} - w_{20}\right) = \mathbf{w}^T \mathbf{x} + w_0$$

Choose $C_1$ if $g(\mathbf{x}) > 0$, and $C_2$ otherwise.

Fall 2008 14 CS 567 Lecture 3 - Sofus A. Macskassy

Geometry

Fall 2008 15 CS 567 Lecture 3 - Sofus A. Macskassy

Multiple Classes

$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Classes are linearly separable.

Choose $C_i$ if $g_i(\mathbf{x}) = \max_{j=1}^{K} g_j(\mathbf{x})$
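A small Python sketch of the max rule above (the array names and shapes are illustrative assumptions):

```python
import numpy as np

def predict_class(W, w0, x):
    """Choose the class C_i whose discriminant g_i(x) = w_i . x + w_i0 is largest.

    W is a (K, d) array stacking the K weight vectors; w0 holds the K offsets.
    """
    scores = W @ x + w0          # g_i(x) for i = 1..K
    return int(np.argmax(scores))

W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])
print(predict_class(W, w0, np.array([2.0, 1.0])))  # scores are 2.0, 1.0, -2.5 -> class 0
```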

Fall 2008 16 CS 567 Lecture 3 - Sofus A. Macskassy

Pairwise Separation

$$g_{ij}(\mathbf{x} \mid \mathbf{w}_{ij}, w_{ij0}) = \mathbf{w}_{ij}^T \mathbf{x} + w_{ij0}$$

$$g_{ij}(\mathbf{x}) = \begin{cases} > 0 & \text{if } \mathbf{x} \in C_i \\ \le 0 & \text{if } \mathbf{x} \in C_j \\ \text{don't care} & \text{otherwise} \end{cases}$$

Choose $C_i$ if $g_{ij}(\mathbf{x}) > 0$ for all $j \ne i$
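A possible Python sketch of this pairwise (one-vs-one) decision rule. The nested-list layout of the discriminants and the None return for the "don't care" region are assumptions made purely for illustration:

```python
import numpy as np

def predict_pairwise(G, w0, x):
    """Choose C_i if g_ij(x) = G[i][j] . x + w0[i][j] > 0 for every j != i.

    Returns None when no class wins all of its pairwise tests ("don't care").
    """
    K = len(G)
    for i in range(K):
        if all(j == i or np.dot(G[i][j], x) + w0[i][j] > 0 for j in range(K)):
            return i
    return None

# Toy example with K = 3 classes in 2D; diagonal entries are unused dummies,
# and g_ji = -g_ij so the two directions of each pair are consistent.
Z = np.zeros(2)
G = [[Z,                      np.array([ 1.0,  0.0]), np.array([ 0.0,  1.0])],
     [np.array([-1.0,  0.0]), Z,                      np.array([-1.0,  1.0])],
     [np.array([ 0.0, -1.0]), np.array([ 1.0, -1.0]), Z]]
w0 = np.zeros((3, 3))
print(predict_pairwise(G, w0, np.array([2.0, 1.0])))   # -> 0
```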

Fall 2008 17 CS 567 Lecture 3 - Sofus A. Macskassy

Machine Learning and Optimization

• When learning a classifier, the natural way to formulate the learning problem is the following:

– Given:

  • A set of N training examples {(x1, y1), (x2, y2), …, (xN, yN)}
  • A loss function L

– Find:

  • The weight vector w that minimizes the expected loss on the training data

$$J(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} L\left(\mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}_i),\, y_i\right)$$

• In general, machine learning algorithms apply some optimization algorithm to find a good hypothesis. In this case, J is piecewise constant, which makes this a difficult problem.
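For concreteness, a short sketch of evaluating J(w) on a training set. The names are illustrative, labels are assumed to be in {-1, +1}, and L is specialized to 0-1 loss to match the sgn in the formula:

```python
import numpy as np

def zero_one_objective(w, X, y):
    """J(w) with L taken to be 0-1 loss: the fraction of training examples
    where sgn(w . x_i) disagrees with y_i. Note that this is piecewise
    constant in w, which is what makes it hard to optimize directly."""
    preds = np.where(X @ w >= 0, 1, -1)
    return float(np.mean(preds != y))
```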

Fall 2008 18 CS 567 Lecture 3 - Sofus A. Macskassy

Approximating the expected loss by a smooth function

  • Simplify the optimization problem by replacing the original objective function by a smooth, differentiable function. For example, consider the hinge loss (plotted on the slide for y = 1):

$$\tilde{J}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} \max\left(0,\, 1 - y_i\, \mathbf{w} \cdot \mathbf{x}_i\right)$$
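A one-function Python sketch of this smoothed objective (vectorized with NumPy; the argument names are assumptions):

```python
import numpy as np

def hinge_objective(w, X, y):
    """J~(w) = (1/N) * sum_i max(0, 1 - y_i * (w . x_i)), with y_i in {-1, +1}."""
    margins = y * (X @ w)
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))
```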

Fall 2008 19 Lecture 4 - Sofus A. Macskassy

Optimizing… what?

[Figure: plots of the hinge loss and the 0-1 loss]

Fall 2008 20 CS 567 Lecture 3 - Sofus A. Macskassy

Optimizing… what?

[Figure: the smoothed objective with a starting weight vector w0]

Fall 2008 21 Lecture 4 - Sofus A. Macskassy

Optimizing… what?

[Figure: gradient descent steps w0 → w1 → w2, with objective values J(w0), J(w1) and gradients ∇J̃(w0), ∇J̃(w1)]

Fall 2008 22 Lecture 4 - Sofus A. Macskassy

Minimizing by Gradient Descent Search

  • Start with weight vector $\mathbf{w}_0$
  • Compute gradient $\nabla \tilde{J}(\mathbf{w}_0) = \left( \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_0},\, \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_1},\, \ldots,\, \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_n} \right)$
  • Compute $\mathbf{w}_1 = \mathbf{w}_0 - \eta\, \nabla \tilde{J}(\mathbf{w}_0)$, where $\eta$ is a "step size" parameter
  • Repeat until convergence
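The loop on this slide might look as follows in Python. This is a sketch under stated assumptions: the gradient function is passed in by the caller, and the convergence test (a tiny update) and iteration cap are illustrative choices, not given on the slide.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, tol=1e-6, max_iters=1000):
    """Repeat w <- w - eta * grad(w) until the update is tiny (or we give up)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        step = eta * grad(w)
        if np.linalg.norm(step) < tol:   # "repeat until convergence"
            break
        w = w - step
    return w
```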

Fall 2008 23 Lecture 4 - Sofus A. Macskassy

Computing the Gradient

Let $\tilde{J}_i(\mathbf{w}) = \max(0,\, -y_i\, \mathbf{w} \cdot \mathbf{x}_i)$

$$\frac{\partial \tilde{J}(\mathbf{w})}{\partial w_k} = \frac{\partial}{\partial w_k} \left( \frac{1}{N} \sum_{i=1}^{N} \tilde{J}_i(\mathbf{w}) \right) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\partial}{\partial w_k} \tilde{J}_i(\mathbf{w}) \right)$$

$$\frac{\partial \tilde{J}_i(\mathbf{w})}{\partial w_k} = \frac{\partial}{\partial w_k} \max\left(0,\, -y_i \sum_j w_j x_{ij}\right) = \begin{cases} 0 & \text{if } y_i \sum_j w_j x_{ij} > 0 \\ -y_i x_{ik} & \text{otherwise} \end{cases}$$
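The case analysis above translates directly into a vectorized (sub)gradient of this per-example loss averaged over the training set. The following Python sketch is illustrative only; the array names are assumptions:

```python
import numpy as np

def perceptron_criterion_grad(w, X, y):
    """Gradient of (1/N) * sum_i max(0, -y_i * w . x_i): each example with
    y_i * w . x_i <= 0 contributes -y_i * x_i, the rest contribute nothing."""
    margins = y * (X @ w)
    wrong = margins <= 0                       # misclassified or on the boundary
    return -(X[wrong] * y[wrong, None]).sum(axis=0) / len(y)
```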

Fall 2008 24 Lecture 4 - Sofus A. Macskassy

Perceptron: Gradient Descent Search

  • Start with weight vector $\mathbf{w}_0$
  • Compute gradient $\nabla \tilde{J}(\mathbf{w}_0) = \left( \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_0},\, \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_1},\, \ldots,\, \frac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_n} \right)$, where
    $$\frac{\partial \tilde{J}_i(\mathbf{w})}{\partial w_k} = \begin{cases} 0 & \text{if } y_i \sum_j w_j x_{ij} > 0 \\ -y_i x_{ik} & \text{otherwise} \end{cases}$$
  • Compute $\mathbf{w}_1 = \mathbf{w}_0 - \eta\, \nabla \tilde{J}(\mathbf{w}_0)$, where $\eta$ is a "step size" parameter
  • Repeat until convergence

Fall 2008 25 Lecture 4 - Sofus A. Macskassy

Batch Perceptron Algorithm

Simplest case: η = 1, don't normalize g: "Fixed Increment Perceptron"

Given: training examples (xi, yi), i = 1 … N
Let w = (0, 0, …, 0) be the initial weight vector
Repeat until convergence:
  Let g = (0, 0, …, 0) be the gradient vector
  For i = 1 to N do
    ui = w · xi
    if (yi · ui ≤ 0)
      For j = 1 to n do
        gj = gj − yi · xij
  g = g / N
  w = w − g
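A direct Python rendering of the pseudocode above, offered as a sketch: the stopping test (no misclassified examples) and the epoch cap are assumptions added so the loop terminates, labels are assumed to be in {-1, +1}, and any bias is folded into the feature vector as a constant 1 component.

```python
import numpy as np

def batch_perceptron(X, y, max_epochs=1000):
    """Fixed-increment batch perceptron following the pseudocode above."""
    N, n = X.shape
    w = np.zeros(n)                      # initial weight vector
    for _ in range(max_epochs):
        g = np.zeros(n)                  # gradient accumulator
        mistakes = 0
        for i in range(N):
            u = np.dot(w, X[i])
            if y[i] * u <= 0:            # misclassified (or on the boundary)
                g -= y[i] * X[i]
                mistakes += 1
        w -= g / N                       # g = g / N, then w = w - g
        if mistakes == 0:                # added stopping rule: every example correct
            break
    return w

# Toy linearly separable data (last column is a constant 1 bias feature)
X = np.array([[ 2.0,  1.0, 1.0],
              [ 1.0,  3.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = batch_perceptron(X, y)
print(np.sign(X @ w))                    # -> [ 1.  1. -1. -1.]
```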