Machine Learning (CS 567) Lecture 4
Fall 2008
Time: T-Th 5:00pm - 6:20pm
Location: GFS 118
Office: SAL 216
Office hours: by appointment
Office: SAL 229
Office hours: M 2-3, W 11-
Class web page:
http://www-scf.usc.edu/~csci567/index.html
Administrative: Homework
• Due at start of class on due date.
• 25% off if after class but before 8am next morning
• 50% off if received next day after 8am
• 100% off if not received next day
• If you have a valid excuse, let me know ASAP with proper
documentation. No exceptions.
• Grading complaint policy: if students have concerns about how an
assignment was graded, they should feel free to talk to the instructor/TA.
However, even if a student requests re-grading of only one of the
answers, the entire assignment will be checked and graded again
(so you take the risk of possibly losing points overall).
Administrative: Final Project
• Groups of 2-4 people
• Deliverables:
- Research paper
- Write paper as though submitting to a conference or workshop.
- Presentation at the last two classes
• Work should cover:
- Good evaluation technique, compare at least 2 classifiers, findings
should be clearly stated and validated, related work should be
cited.
• Project pre-proposals due on September 23
- You can interact with me beforehand
- 1-2 paragraphs outlining the problem and what you will do
• Project proposals will be due on October 9
- 1-2 pages detailing what you will do, who is in the group, the
data, the machine learning methods you will look at, etc.
Project Idea: Applied Learning
Take an interesting dataset.
Compare several learning approaches for prediction:
- decision trees
- ANNs
- instance-based methods
- SVMs
- boosting
Project Idea: Improvements to
Learning Methods
There are many suggestions on how to improve various learning methods, both in books and in papers. Identify some suggestions and test them empirically.
Project Idea: Comparison of Learning
Methods
Identify some learners. Run thorough comparison tests and determine the reasons for their different performances.
Text Classification
Easy to get lots of text: web, TREC data,
email (e.g., Enron)
Predict topic, authorship, sentiment, style,
affect, attitude.
Reinforcement Learning
Can be challenging to do well: generate data or
direct control
• Optimize gas well production
• Object tracking
• “Tag” grid world
• Rover control
• Animal behavior experiments
Compare approaches (direct policy search, value
function learning, model-based, …)
Active Learning
Take a classic dataset.
Explore the tradeoff between size of training
set and generalization.
Devise schemes for choosing items to “pay”
for labels to maximize accuracy with
minimum cost.
Project Methodology
Lots of good ideas for algorithms and domains.
The hard question is: “How will you evaluate it?”
Ultimately, you need to present more than one
algorithm (and perhaps more than one problem)
and you’ll need some way of saying what worked
better.
What’s the gold standard?
Lecture 4 Outline
• Linear Threshold Units
– Perceptron
– Logistic Regression
The Unthresholded Discriminant
Function is a Hyperplane
- The equation $g(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$ is a plane

$$\hat{y} = \begin{cases} +1 & \text{if } g(\mathbf{x}) \geq 0 \\ -1 & \text{otherwise} \end{cases}$$
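As a minimal illustration (a hypothetical NumPy sketch, not part of the original slides), the thresholded discriminant can be evaluated as:

```python
import numpy as np

def predict(w, x):
    """Thresholded linear discriminant: +1 if g(x) = w.x >= 0, else -1."""
    g = np.dot(w, x)               # unthresholded discriminant g(x)
    return 1 if g >= 0 else -1

# hypothetical weight vector and input
w = np.array([0.5, -1.0, 0.25])
x = np.array([1.0, 0.2, 2.0])
print(predict(w, x))               # -> 1, since g(x) = 0.5 - 0.2 + 0.5 = 0.8 >= 0
```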
Alternatively…
$$g(\mathbf{x} \mid \mathbf{w}_1, \mathbf{w}_2, w_{10}, w_{20}) = g_1(\mathbf{x}) - g_2(\mathbf{x}) = (\mathbf{w}_1^T \mathbf{x} + w_{10}) - (\mathbf{w}_2^T \mathbf{x} + w_{20}) = (\mathbf{w}_1 - \mathbf{w}_2)^T \mathbf{x} + (w_{10} - w_{20}) = \mathbf{w}^T \mathbf{x} + w_0$$

Choose $C_1$ if $g(\mathbf{x}) > 0$, and $C_2$ otherwise.
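A quick numerical check (hypothetical values, not from the slides) that the difference of two per-class linear discriminants collapses to a single linear discriminant $\mathbf{w}^T \mathbf{x} + w_0$:

```python
import numpy as np

# Hypothetical per-class discriminants g1, g2:
w1, w10 = np.array([1.0, 2.0]), 0.5
w2, w20 = np.array([0.5, -1.0]), 1.5

# Their difference is itself a single linear discriminant w.x + w0
w, w0 = w1 - w2, w10 - w20

x = np.array([2.0, 1.0])
g1 = w1 @ x + w10                         # 1*2 + 2*1 + 0.5 = 4.5
g2 = w2 @ x + w20                         # 0.5*2 - 1*1 + 1.5 = 1.5
assert np.isclose(g1 - g2, w @ x + w0)    # both equal 3.0
print("choose C1" if (w @ x + w0) > 0 else "choose C2")
```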
Geometry
Multiple Classes
$$g_i(\mathbf{x} \mid \mathbf{w}_i, w_{i0}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

Classes are linearly separable.

Choose $C_i$ if $g_i(\mathbf{x}) = \max_{j=1}^{K} g_j(\mathbf{x})$
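A sketch of this multi-class rule, assuming hypothetical weights $\mathbf{w}_i$ and biases $w_{i0}$ for $K = 3$ classes:

```python
import numpy as np

# Hypothetical example: K = 3 classes, d = 2 features.
W = np.array([[ 1.0,  0.0],      # w_1
              [ 0.0,  1.0],      # w_2
              [-1.0, -1.0]])     # w_3
w0 = np.array([0.0, -0.5, 1.0])  # per-class biases w_{i0}

def classify(x):
    g = W @ x + w0               # g_i(x) = w_i^T x + w_{i0} for all i
    return int(np.argmax(g))     # choose C_i with the largest g_i(x)

print(classify(np.array([2.0, 0.5])))   # -> 0 (class C_1): g = [2.0, 0.0, -1.5]
```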
Pairwise Separation
$$g_{ij}(\mathbf{x} \mid \mathbf{w}_{ij}, w_{ij0}) = \mathbf{w}_{ij}^T \mathbf{x} + w_{ij0}$$

$$g_{ij}(\mathbf{x}) \begin{cases} > 0 & \text{if } \mathbf{x} \in C_i \\ \leq 0 & \text{if } \mathbf{x} \in C_j \\ \text{don't care} & \text{otherwise} \end{cases}$$

Choose $C_i$ if $\forall j \neq i,\ g_{ij}(\mathbf{x}) > 0$
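A sketch of pairwise separation for $K = 3$ classes with hypothetical discriminants $g_{ij}$; rejecting points in the ambiguous region is one possible design choice, not something the slide specifies:

```python
import numpy as np

# Hypothetical pairwise discriminants: g[(i, j)] holds (w_ij, w_ij0) for i < j,
# and we use g_ji(x) = -g_ij(x) for the reverse direction.
g = {(0, 1): (np.array([1.0, 0.0]),  0.0),
     (0, 2): (np.array([1.0, 1.0]), -0.5),
     (1, 2): (np.array([0.0, 1.0]),  0.0)}

def g_ij(i, j, x):
    if (i, j) in g:
        w, w0 = g[(i, j)]
        return w @ x + w0
    return -g_ij(j, i, x)

def classify(x, K=3):
    # choose C_i if g_ij(x) > 0 for all j != i
    for i in range(K):
        if all(g_ij(i, j, x) > 0 for j in range(K) if j != i):
            return i
    return None   # ambiguous region: no class beats all others

print(classify(np.array([2.0, 1.0])))   # -> 0
```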
Machine Learning and Optimization
• When learning a classifier, the natural way to
formulate the learning problem is the following:
– Given:
- A set of N training examples $\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
- A loss function $L$
– Find:
- The weight vector $\mathbf{w}$ that minimizes the expected loss on the
training data

$$J(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} L(\mathrm{sgn}(\mathbf{w} \cdot \mathbf{x}_i),\ y_i)$$

• In general, machine learning algorithms apply some
optimization algorithm to find a good hypothesis. In
this case, $J$ is piecewise constant, which makes this
a difficult problem.
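For intuition about why $J$ is piecewise constant, here is a small sketch on hypothetical data: a tiny perturbation of $\mathbf{w}$ usually flips no signs, so the 0-1 training loss does not change.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # hypothetical training inputs
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))   # hypothetical labels in {-1, +1}

def J(w):
    """Training 0-1 loss: fraction of examples where sgn(w.x) != y."""
    return np.mean(np.sign(X @ w) != y)

w = np.array([1.0, 0.0, 0.0])
print(J(w), J(w + 1e-6))   # a tiny change in w usually leaves J unchanged
```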
Approximating the expected loss by a
smooth function
- Simplify the optimization problem by replacing the original objective function with a smooth, differentiable function. For example, consider the hinge loss (plotted on the next slide for y = 1):

$$\tilde{J}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} \max(0,\ 1 - y_i\, \mathbf{w} \cdot \mathbf{x}_i)$$
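A minimal sketch (hypothetical data) of computing this surrogate objective. Unlike the 0-1 loss, it changes continuously with $\mathbf{w}$, so gradient information is available almost everywhere.

```python
import numpy as np

def hinge_surrogate(w, X, y):
    """Surrogate J~(w) = (1/N) sum_i max(0, 1 - y_i w.x_i)."""
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins))

# hypothetical data: two points, labels in {-1, +1}
X = np.array([[1.0, 2.0], [2.0, -1.0]])
y = np.array([+1.0, -1.0])
w = np.array([0.5, 0.5])
print(hinge_surrogate(w, X, y))   # (max(0, 1-1.5) + max(0, 1+0.5)) / 2 = 0.75
```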
Optimizing… what?

[Figure: the hinge loss plotted alongside the 0-1 loss.]

[Figure: gradient descent on the smooth objective $\tilde{J}$: starting from $w_0$, steps follow $-\nabla \tilde{J}(w_0)$ and $-\nabla \tilde{J}(w_1)$ to reach $w_1$ and then $w_2$, decreasing $\tilde{J}$.]
Minimizing by Gradient Descent Search
- Start with weight vector $\mathbf{w}_0$
- Compute gradient $\nabla \tilde{J}(\mathbf{w}_0) = \left( \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_0}, \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_1}, \ldots, \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_n} \right)$
- Compute $\mathbf{w}_1 = \mathbf{w}_0 - \eta\, \nabla \tilde{J}(\mathbf{w}_0)$,
where $\eta$ is a "step size" parameter
- Repeat until convergence
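A generic sketch of this loop, assuming the caller supplies a function `grad_J` that returns $\nabla \tilde{J}(\mathbf{w})$; the convergence test and default values are illustrative choices, not from the slide.

```python
import numpy as np

def gradient_descent(grad_J, w0, eta=0.1, tol=1e-6, max_iters=1000):
    """Minimize a smooth objective by repeatedly stepping against its gradient.

    grad_J: function returning the gradient of J~ at w (supplied by the caller)
    eta:    the "step size" parameter from the slide
    """
    w = np.array(w0, dtype=float)
    for _ in range(max_iters):
        g = grad_J(w)
        w_new = w - eta * g
        if np.linalg.norm(w_new - w) < tol:   # crude convergence test
            return w_new
        w = w_new
    return w

# usage sketch on a toy quadratic J~(w) = ||w||^2, whose gradient is 2w
print(gradient_descent(lambda w: 2.0 * w, w0=[3.0, -4.0]))   # -> close to [0, 0]
```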
Computing the Gradient
Let $\tilde{J}_i(\mathbf{w}) = \max(0,\ -y_i\, \mathbf{w} \cdot \mathbf{x}_i)$. Then

$$\frac{\partial \tilde{J}(\mathbf{w})}{\partial w_k} = \frac{\partial}{\partial w_k} \left( \frac{1}{N} \sum_{i=1}^{N} \tilde{J}_i(\mathbf{w}) \right) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\partial}{\partial w_k} \tilde{J}_i(\mathbf{w}) \right)$$

$$\frac{\partial \tilde{J}_i(\mathbf{w})}{\partial w_k} = \frac{\partial}{\partial w_k} \max\left(0,\ -y_i \sum_j w_j x_{ij}\right) = \begin{cases} 0 & \text{if } y_i \sum_j w_j x_{ij} > 0 \\ -y_i x_{ik} & \text{otherwise} \end{cases}$$
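A small sketch (hypothetical point) that implements this per-example gradient and compares it against a finite-difference estimate, evaluated away from the kink of the max:

```python
import numpy as np

def grad_Ji(w, x, y):
    """Gradient of J~_i(w) = max(0, -y w.x): zero when y*w.x > 0, else -y*x."""
    return np.zeros_like(w) if y * (w @ x) > 0 else -y * x

# hypothetical misclassified example (y * w.x < 0), away from the kink
w = np.array([0.3, -0.7])
x = np.array([1.0, 2.0])
y = 1.0
Ji = lambda w: max(0.0, -y * (w @ x))
eps = 1e-6
numeric = np.array([(Ji(w + eps * e) - Ji(w - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print(grad_Ji(w, x, y), numeric)   # both close to [-1, -2]
```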
Perceptron: Gradient Descent Search
- Start with weight vector $\mathbf{w}_0$
- Compute gradient $\nabla \tilde{J}(\mathbf{w}_0) = \left( \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_0}, \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_1}, \ldots, \dfrac{\partial \tilde{J}(\mathbf{w}_0)}{\partial w_n} \right)$, using
$$\frac{\partial \tilde{J}_i(\mathbf{w})}{\partial w_k} = \begin{cases} 0 & \text{if } y_i \sum_j w_j x_{ij} > 0 \\ -y_i x_{ik} & \text{otherwise} \end{cases}$$
- Compute $\mathbf{w}_1 = \mathbf{w}_0 - \eta\, \nabla \tilde{J}(\mathbf{w}_0)$,
where $\eta$ is a "step size" parameter
- Repeat until convergence
Batch Perceptron Algorithm
Simplest case: η = 1, don't normalize g: "Fixed Increment Perceptron"

Given: training examples (x_i, y_i), i = 1 … N
Let w = (0, 0, …, 0) be the initial weight vector
Repeat until convergence:
    Let g = (0, 0, …, 0) be the gradient vector
    For i = 1 to N do
        u_i = w · x_i
        If (y_i · u_i ≤ 0)
            For j = 1 to n do
                g_j = g_j − y_i · x_ij
    g = g / N
    w = w − g
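Below is a sketch of this batch perceptron in NumPy on hypothetical toy data; treating a zero averaged gradient (no mistakes in a full pass) as convergence is one reasonable reading of "repeat until convergence".

```python
import numpy as np

def batch_perceptron(X, y, max_epochs=1000):
    """Batch (fixed-increment) perceptron: eta = 1, gradient averaged over N examples.

    X: (N, n) array of inputs; y: (N,) array of labels in {-1, +1}.
    """
    N, n = X.shape
    w = np.zeros(n)                      # initial weight vector (0, 0, ..., 0)
    for _ in range(max_epochs):
        g = np.zeros(n)                  # gradient accumulator
        for i in range(N):
            u = w @ X[i]
            if y[i] * u <= 0:            # example i misclassified (or on the boundary)
                g -= y[i] * X[i]
        g /= N
        w -= g
        if not np.any(g):                # no mistakes this pass: converged
            break
    return w

# toy linearly separable data (hypothetical); a constant 1 feature plays the role of a bias
X = np.array([[1.0,  2.0, 1.0], [1.0,  1.5, 0.5], [1.0, -1.0, -1.0], [1.0, -2.0, 0.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
w = batch_perceptron(X, y)
print(np.sign(X @ w) == y)               # -> all True once converged
```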