Templates and Classifiers
Last Day: Structure from Motion
Feature-based
Dense
Today: Templates and Classifiers (F&P
Ch 22, Hebert notes CMU)
Recognition by template matching
Recognition by finding patterns
We have seen very simple template matching
(under filters)
Some objects behave like quite simple
templates
Frontal faces
Strategy:
Find image windows
Correct lighting
Pass them to a statistical test (a classifier) that
accepts faces and rejects non-faces
Templates for Recognition
Some objects can be identified by simple
tests on image windows (faces, stop signs)
Template matching :
Take all windows of a particular shape
Test to see whether relevant object is present.
Possibly search over scale (size) and orientation
More complicated shapes and objects can be
identified by looking at relationships among
groups of templates.
Classifiers
How do we test whether object is
present?
Classifier:
takes a feature set as input
Produces a class label
Build using a training set of feature-
label examples (xi,yi)
Find a rule that takes a plausible
measurement xi and computes its label yi
Basic ideas in classifiers
Loss
some errors may be more expensive than others
e.g. a fatal disease that is easily cured by a cheap medicine with no side-effects -> false positives in diagnosis are better than false negatives
We discuss two class classification: L(1->2) is the
loss caused by calling 1 a 2
Total risk of using classifier s:
$R(s) = \Pr\{1 \to 2 \mid s\}\, L(1 \to 2) + \Pr\{2 \to 1 \mid s\}\, L(2 \to 1)$
We want to minimize total risk R
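As a small numeric illustration of the total risk, the sketch below evaluates R(s) for two hypothetical classifiers. The error probabilities and loss values are invented for illustration and are not from the notes.

```python
def total_risk(p_1_as_2, p_2_as_1, loss_1_as_2, loss_2_as_1):
    """R(s) = Pr{1->2|s} L(1->2) + Pr{2->1|s} L(2->1)."""
    return p_1_as_2 * loss_1_as_2 + p_2_as_1 * loss_2_as_1

# Hypothetical losses: calling a sick patient (1) healthy (2) costs 100,
# calling a healthy patient sick costs 1.
cautious = total_risk(p_1_as_2=0.01, p_2_as_1=0.20, loss_1_as_2=100.0, loss_2_as_1=1.0)
balanced = total_risk(p_1_as_2=0.05, p_2_as_1=0.05, loss_1_as_2=100.0, loss_2_as_1=1.0)

# The cautious classifier makes more mistakes overall but has the lower risk,
# because its mistakes are mostly the cheap kind.
print(cautious, balanced)
```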
Basic ideas in classifiers
Generally, we should classify as 1 if the
expected loss of classifying as 1 is better
than for 2
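For observation x, this gives:
1 if $p(1 \mid x)\, L(1 \to 2) > p(2 \mid x)\, L(2 \to 1)$
2 if $p(1 \mid x)\, L(1 \to 2) < p(2 \mid x)\, L(2 \to 1)$
Crucial notion: Decision boundary
points where the loss is the same for either case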
Probabilistic Formulation
Decision boundary:
$\frac{p(\text{object}_1 \mid \text{feature})}{p(\text{object}_2 \mid \text{feature})} = \lambda$
Learn p(feature|object):
$p(\text{object}_j \mid \text{feature}) \sim p(\text{feature} \mid \text{object}_j)\, p(\text{object}_j)$
Bayes Risk: E(R({1,2},s)) over label set {1,2}
For λ=1: Bayes risk
[Figure: p(object1|feature) and p(object2|feature) plotted against feature]
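A minimal sketch of the probabilistic formulation, assuming the class-conditional likelihoods p(feature|object_j) and the priors p(object_j) are already known for one observed feature value. The function names and numbers are illustrative, and reading the threshold as "choose object 1 when the posterior ratio exceeds λ" is an assumption about the direction of the test.

```python
def posteriors(likelihoods, priors):
    """Bayes rule: p(object_j | feature) is proportional to
    p(feature | object_j) * p(object_j), normalised to sum to 1."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def classify(likelihoods, priors, lam=1.0):
    """Return 1 if p(object_1|feature) / p(object_2|feature) > lam, else 2."""
    post1, post2 = posteriors(likelihoods, priors)
    return 1 if post1 > lam * post2 else 2

# Hypothetical values for a single observed feature.
post = posteriors(likelihoods=[0.6, 0.1], priors=[0.2, 0.8])
label_equal_loss = classify([0.6, 0.1], [0.2, 0.8], lam=1.0)
label_strict = classify([0.6, 0.1], [0.2, 0.8], lam=2.0)
```

Raising λ makes the classifier more reluctant to answer 1, which is the knob that moves the decision boundary.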
Issues
How to represent and learn
p(feature|objectj ) or decision
boundary?
How to approach Bayes risk given small
number of samples?
What features to use?
How to reduce the feature space?
Evaluating Classifier Performance
Detection Rate = Prob(feature from object is
correctly classified as object)
False Positive Rate = Prob(feature from
background is classified as object)
Receiver Operating Characteristic (ROC)
[Figure: ROC curve. Axes: false positive rate, detection rate. Labels: operating point; p(object|feature), p(background|feature); λ decreasing]
4 cases: true positive (sensitivity), false positive, true negative (specificity), false negative
ROC tells us what happens as we vary the test threshold
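The two rates defined above can be traced over a set of thresholds to obtain points on the ROC curve. This is a minimal sketch; the classifier scores are made-up values, and "score above threshold means object" is an assumed convention.

```python
def roc_points(object_scores, background_scores, thresholds):
    """Return (false positive rate, detection rate) for each threshold."""
    points = []
    for t in thresholds:
        detection_rate = sum(s > t for s in object_scores) / len(object_scores)
        false_positive_rate = sum(s > t for s in background_scores) / len(background_scores)
        points.append((false_positive_rate, detection_rate))
    return points

# Hypothetical classifier scores for windows known to be object / background.
object_scores = [0.9, 0.8, 0.6, 0.4]
background_scores = [0.7, 0.3, 0.2, 0.1]

# Lowering the threshold moves the operating point up and to the right.
curve = roc_points(object_scores, background_scores, thresholds=[0.95, 0.65, 0.35, 0.0])
```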
Approaches
Every single pattern classification/learning
approach has been applied to this problem
Pick your favorite:
Naïve Bayes
Boosting
Neural networks
SVMs
NNs
PCA/LDA/ICA dimensionality reduction
etc
Tests
[Example from Bernt Schiele]
More Complicated Features
C(x,y,s) = Wavelet coefficient at position x,y at scale s
Feature = Set of coefficients S = (C1,..,CN)
[Example from Henry Schneiderman]
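Given features $S_1, \ldots, S_r$ computed from a window, threshold the likelihood ratio:
$\log \frac{p(s_1, \ldots, s_r \mid \omega_1)}{p(s_1, \ldots, s_r \mid \omega_2)} > \lambda$
Assume independence (Naïve Bayes):
$\log \frac{p(s_1 \mid \omega_1)}{p(s_1 \mid \omega_2)} + \log \frac{p(s_2 \mid \omega_1)}{p(s_2 \mid \omega_2)} + \ldots + \log \frac{p(s_r \mid \omega_1)}{p(s_r \mid \omega_2)} > \lambda$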
How can we compute these probabilities?
Estimating the Probabilities
Collect the values of the features for
training data in histograms that
approximate the probabilities
50-2,000 original images
~1,000 synthetic variations per original image
~10,000,000 examples
Example from Henry Schneiderman
Compute the values of all the features in the window
For each feature, compute the probabilities of coming
from the object or non-object class.
Aggregate into likelihood ratio
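The three steps above can be sketched with histograms over quantised feature values. This is a toy illustration under several assumptions: features are already quantised to small integer bins, add-one smoothing is used to avoid empty bins, and the training values are invented. It is not Schneiderman's actual implementation.

```python
import math

def histogram(values, n_bins, smoothing=1.0):
    """Approximate p(bin) from training values, with additive smoothing (an assumption)."""
    counts = [smoothing] * n_bins
    for v in values:
        counts[v] += 1
    total = sum(counts)
    return [c / total for c in counts]

def log_likelihood_ratio(window_features, object_hists, background_hists):
    """Naive Bayes: sum over features of log p(s_k | object) / p(s_k | non-object)."""
    return sum(
        math.log(obj[s] / bg[s])
        for s, obj, bg in zip(window_features, object_hists, background_hists)
    )

# Toy training data: two features, each quantised to bins 0..2.
object_hists = [histogram([2, 2, 1, 2], 3), histogram([0, 0, 1, 0], 3)]
background_hists = [histogram([0, 1, 0, 0], 3), histogram([2, 1, 2, 2], 3)]

score_object_like = log_likelihood_ratio([2, 0], object_hists, background_hists)
score_background_like = log_likelihood_ratio([0, 2], object_hists, background_hists)
```

A window is then reported as a detection when its score exceeds the chosen threshold.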
Example from Henry Schneiderman
From Windows to Images
Move a window to all possible positions and all
possible scales
At each (position,scale) evaluate the
classifier
Return detection if above threshold
Search in position
Search in scale
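A minimal sketch of the search over position and scale. Two simplifications are assumed here: the window is resized instead of the image, and the classifier is replaced by a stand-in score (mean brightness) so the example stays self-contained.

```python
def mean_brightness(image, top, left, size):
    """Stand-in classifier score (an assumption): mean intensity inside the window."""
    total = sum(image[r][c] for r in range(top, top + size) for c in range(left, left + size))
    return total / (size * size)

def detect(image, window_sizes, threshold, score=mean_brightness, step=1):
    """Evaluate the score at every (position, scale); keep windows above threshold."""
    height, width = len(image), len(image[0])
    detections = []
    for size in window_sizes:                               # search in scale
        for top in range(0, height - size + 1, step):       # search in position
            for left in range(0, width - size + 1, step):
                s = score(image, top, left, size)
                if s > threshold:
                    detections.append((top, left, size, s))
    return detections

# 6x6 dark image with one bright 2x2 patch at rows 2-3, columns 3-4.
image = [[0] * 6 for _ in range(6)]
for r in range(2, 4):
    for c in range(3, 5):
        image[r][c] = 1

detections = detect(image, window_sizes=[2, 3], threshold=0.9)
```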
Example from Henry Schneiderman
Feature Selection
Each feature is a set of variables (wavelet
coefficients) S = {C1,..,CN}
Find feature set which best classifies the dataset
Problem:
If N is large, the feature is very discriminative (S is equivalent to the entire window if N is the total
number of variables) but representing the
corresponding distribution is very expensive
If N is small, the feature is not discriminative but
classification is very fast
Solution: Classifier Cascade
Standard problem:
We can have either discriminative or efficient
features but not both!
Cannot do classification in one shot
PCA – Captures most of the variance (later)
Classifier Cascade
First apply a classifier with simple features. It is fast
and will eliminate the most obvious non-object
locations
Then apply a classifier with more complex
features. More expensive, but applied only to the
locations that survived the previous stage
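The cascade idea can be sketched as a list of stage tests applied in order, each one seeing only the survivors of the previous stage. The stages here are hypothetical placeholders operating on integers that stand in for windows; in a real detector each stage would be a classifier on image features.

```python
def cascade(candidates, stages):
    """Run each stage only on candidates that survived all earlier stages.

    Returns the final survivors and how many candidates each stage evaluated."""
    evaluations = []
    survivors = list(candidates)
    for stage in stages:
        evaluations.append(len(survivors))
        survivors = [c for c in survivors if stage(c)]
    return survivors, evaluations

# Hypothetical stages, ordered from cheapest to most expensive.
stages = [
    lambda w: w % 2 == 0,   # stage 1: very cheap test, rejects half the candidates
    lambda w: w % 3 == 0,   # stage 2: applied only to stage-1 survivors
    lambda w: w % 5 == 0,   # stage 3: applied only to stage-2 survivors
]

survivors, evaluations = cascade(range(1, 61), stages)
# The most expensive stage runs on 10 candidates instead of 60.
```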
Cascade Example
Cascade Stage 1: Apply classifier with very simple (and fast) features. Eliminates most of the image
Cascade Stage 2: Apply classifier with more complex features on what is left
Cascade Stage 3: Apply classifier with more complex features on what is left
Using Weak Features
Don’t try to design strong features from the
beginning, just use really stupid but really fast
features (and a lot of them)
Weak learner = Very fast (but very inaccurate)
classifier
Example: Multiply input window by a very simple box
operator and threshold output
(Example from Paul Viola, Distributed by Intel as part of the OpenCV library)
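A sketch of one such weak feature: the difference between the sums of two adjacent boxes, thresholded to give a +/-1 answer. The integral image used to make each box sum cost four lookups is the standard trick from the Viola-Jones detector and is added here as an assumption, since the notes do not mention it. The box layout and threshold are illustrative.

```python
def integral_image(image):
    """ii[r][c] = sum of image rows < r and columns < c (extra zero row and column)."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = image[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of pixels in a box, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_box_feature(ii, top, left, height, width):
    """Left box minus the equally sized box immediately to its right."""
    return box_sum(ii, top, left, height, width) - box_sum(ii, top, left + width, height, width)

def weak_classifier(window, threshold=32):
    """Threshold the box-operator output: +1 if above threshold, else -1."""
    ii = integral_image(window)
    return 1 if two_box_feature(ii, 0, 0, 4, 2) > threshold else -1

edge_window = [[9, 9, 1, 1] for _ in range(4)]   # bright left half, dark right half
flat_window = [[5, 5, 5, 5] for _ in range(4)]   # no contrast
feature_value = two_box_feature(integral_image(edge_window), 0, 0, 4, 2)
```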
Feature Selection
Operators defined over all possible shapes
and positions within the window
For a 24x24 window 45,396 combinations!!
How to select the “useful” features?
How to combine them into classifiers?
Input: Training examples {xi} with labels
(“face” or “non-face” = +/-1) {yi} + weights wi
(initially wi = 1)
Choose the feature (weak classifier ht) with
minimum error:
$\varepsilon_t = \sum_i w_i \, [h_t(x_i) \neq y_i]$
Update the weights such that
wi is increased if xi is misclassified
wi is decreased if xi is correctly classified
Compute a weight αt for classifier ht
αt large if εt is small
Repeat T times
Final classifier:
$H(x) = \operatorname{sgn}\left( \sum_t \alpha_t h_t(x) \right)$
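The boosting loop can be sketched as follows. The notes do not give the exact formulas for αt or the weight update, so this sketch assumes the standard discrete AdaBoost choices: αt = ½ ln((1 − εt)/εt), multiplicative update exp(−αt yi ht(xi)), and weights normalised to sum to 1 (instead of starting at 1). The 1-D threshold "stumps" and the toy data are also illustrative.

```python
import math

def make_stump(threshold, sign):
    """Weak classifier: predicts `sign` when x < threshold, otherwise -sign."""
    return lambda x: sign if x < threshold else -sign

def adaboost(xs, ys, weak_classifiers, rounds):
    n = len(xs)
    weights = [1.0 / n] * n            # normalised initial weights (assumption)
    ensemble = []
    for _ in range(rounds):
        def weighted_error(h):
            # eps = sum_i w_i [h(x_i) != y_i]
            return sum(w for w, x, y in zip(weights, xs, ys) if h(x) != y)
        h_t = min(weak_classifiers, key=weighted_error)
        eps_t = min(max(weighted_error(h_t), 1e-10), 1.0 - 1e-10)
        alpha_t = 0.5 * math.log((1.0 - eps_t) / eps_t)   # large when eps_t is small
        # Misclassified examples (y * h = -1) gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha_t * y * h_t(x)) for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((alpha_t, h_t))
    return ensemble

def strong_classify(ensemble, x):
    """H(x) = sgn(sum_t alpha_t h_t(x))."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Toy 1-D problem that no single stump can solve.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
stumps = [make_stump(t + 0.5, s) for t in range(7) for s in (1, -1)]
ensemble = adaboost(xs, ys, stumps, rounds=3)
predictions = [strong_classify(ensemble, x) for x in xs]
```

On this toy data each individual stump misclassifies at least two points, while the weighted vote of three stumps classifies all six correctly.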
PCA for Recognition
Assume centred features
Training features: $X_1, \ldots, X_m$
Compute principal directions: $V_1, \ldots, V_k$
Project training features onto principal directions:
$X'_i \approx \lambda_1 V_1 \cdot X_i + \ldots + \lambda_k V_k \cdot X_i$
Input: Feature vector $X$
Project $X$ onto principal component space:
$X' \approx \lambda_1 V_1 \cdot X + \ldots + \lambda_k V_k \cdot X$
Find object with feature vector $X'_{i_0}$ closest to $X'$:
$i_0 = \arg\min_i \lVert X' - X'_i \rVert$
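A minimal sketch of recognition in principal component space, restricted to a single principal direction (k = 1) so it fits in plain Python. The direction is found by power iteration on the scatter of the centred features, which is one of several ways to compute it and is an assumption here; the recognised object is the training example whose projection lies nearest to the projected input. The 2-D data are invented.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def first_principal_direction(features, iterations=100):
    """Power iteration on sum_i X_i X_i^T for centred features (k = 1 only)."""
    dim = len(features[0])
    v = [1.0] * dim
    for _ in range(iterations):
        new_v = [0.0] * dim
        for x in features:
            proj = dot(x, v)
            for d in range(dim):
                new_v[d] += proj * x[d]
        norm = math.sqrt(dot(new_v, new_v))
        v = [c / norm for c in new_v]
    return v

def recognise(x, training, direction):
    """Index of the training feature whose projection is closest to that of x."""
    coords = [dot(t, direction) for t in training]
    q = dot(x, direction)
    return min(range(len(training)), key=lambda i: abs(coords[i] - q))

# Centred toy features (their mean is the origin).
training = [(-2.0, -1.9), (-1.0, -1.1), (1.0, 1.1), (2.0, 1.9)]
direction = first_principal_direction(training)
match = recognise((0.9, 1.2), training, direction)
```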
Appearance-based matching
Nayar et al.’96 Columbia
Difficulties with PCA
Projection may suppress important
detail
smallest variance directions may not be
unimportant
Method does not take discriminative
task into account
typically, we wish to compute features that
allow good discrimination
not the same as largest variance
Linear Discriminant Analysis
We wish to choose linear functions of
the features that allow good
discrimination.
Assume class-conditional covariances are
the same
Want linear feature that maximises the
spread of class means for a fixed within-
class variance
Problems
Variation in appearance due to illumination and expression can be larger than the variation due to identity
Assumes “linear” distribution of features
Best choice for compression may not be the best choice for discrimination
LDA
Find projection direction V that separates the 2 classes best
Minimize:
$\frac{V (C_1 + C_2) V^T}{\left( V \cdot (\bar{X}_1 - \bar{X}_2) \right)^2} = \frac{\text{Scatter of classes after projection}}{\text{Distance between classes after projection}}$
Generalized eigenvalue problem
Similar application of LDA to faces: FisherFaces (Belhumeur, Yale/Columbia)
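For two classes, the ratio of within-class scatter to between-class distance is minimised by the well-known closed form V ∝ (C1 + C2)⁻¹ (mean1 − mean2). The sketch below applies it to invented 2-D data, using scatter matrices in place of covariances (the scale does not change the direction) and a hand-written 2x2 inverse. The data are chosen so that the largest-variance direction (x) is useless for discrimination while the LDA direction is not.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def scatter(points, m):
    """2x2 scatter matrix: sum of outer products of the centred points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class1, class2):
    """V = (C1 + C2)^-1 (mean1 - mean2), via the closed-form 2x2 inverse."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    a, b = s1[0][0] + s2[0][0], s1[0][1] + s2[0][1]
    c, d = s1[1][0] + s2[1][0], s1[1][1] + s2[1][1]
    det = a * d - b * c
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]

# Both classes spread widely along x but are separated only along y.
class1 = [(0.0, 1.0), (2.0, 1.2), (4.0, 0.8), (6.0, 1.0)]
class2 = [(0.0, 0.0), (2.0, 0.2), (4.0, -0.2), (6.0, 0.0)]

v = fisher_direction(class1, class2)
proj1 = [v[0] * x + v[1] * y for x, y in class1]
proj2 = [v[0] * x + v[1] * y for x, y in class2]
```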
Example
Example from Belhumeur et al.