Notes for Statistical Methods in Recognition | CMSC 828D, Study notes of Computer Science

Material Type: Notes; Professor: Chellappa; Class: ADV TOPC INFO PROC; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Uploaded on 02/13/2009

Statistical methods in recognition

♦ Basic steps in classifier design

  • Collect training data
  • Choose a classification model
    • Statistical
    • Linguistic
  • Estimate “parameters” of classification model from training images - Learning
  • Evaluate model on training data and refine
  • Collect test data set
  • Apply classifier to test data

Why is classification a problem?

♦ Because classes overlap in our (impoverished) representations

♦ Example: Classify a person as male or female based on weight

  • Male training set: {155, 122, 135, 160, 240, 220, 180, 145}
  • Female training set: {95, 132, 115, 124, 145, 110, 150}
  • Unknown sample has weight 125. Male or female?

Basic approaches to statistical classification

1. Build (parametric) probabilistic models of our training data, and compute the probability that an unknown sample belongs to each of our possible classes using these models.

2. Nearest neighbor classification: compare an unknown sample directly to each member of the training set, looking for the training element “most similar” to the unknown.

3. Train a neural network to recognize unknown samples by “teaching it” how to correctly classify the elements of the training set.
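As a rough illustration of the second approach, the following sketch applies a one-nearest-neighbor rule to the weight example from these notes. Using absolute difference in weight as the measure of similarity is an assumption made here for illustration, and the function name `nearest_neighbor` is hypothetical.

```python
# Training weights copied from the male/female example in these notes.
male = [155, 122, 135, 160, 240, 220, 180, 145]
female = [95, 132, 115, 124, 145, 110, 150]


def nearest_neighbor(sample, training_sets):
    """Return (label, value) of the training element closest to `sample`.

    Similarity is measured by absolute difference (an assumption).
    """
    best = None
    for label, values in training_sets.items():
        for value in values:
            distance = abs(sample - value)
            if best is None or distance < best[0]:
                best = (distance, label, value)
    return best[1], best[2]


label, neighbor = nearest_neighbor(125, {"male": male, "female": female})
print(label, neighbor)  # female 124
```

The closest male weight (122) is 3 away from the unknown sample, while the closest female weight (124) is 1 away, so this rule answers "female" for a weight of 125.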

A primer on probability

♦ Probability spaces - models of random phenomena

♦ Example: a box contains s balls labeled 1, ..., s

  • Experiment: Pick a ball, note its label and then replace it in the box. Repeat this experiment n times.
  • Let N_n(k) be the number of times that a ball labeled k was chosen in an experiment of length n
  • Example: s = 3, n = 20, with outcomes 1 1 3 2 1 2 2 3 2 3 3 2 1 2 3 3 1 3 2 2
  • N_20(1) = 5, N_20(2) = 8, N_20(3) = 7
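The counts in this example can be checked with a few lines of code. The sketch below also prints the relative frequencies, the counts divided by n, which are not stated in the notes but are the natural empirical estimates of the probability of each label.

```python
from collections import Counter

# The 20 outcomes listed in the example (s = 3, n = 20).
draws = [1, 1, 3, 2, 1, 2, 2, 3, 2, 3, 3, 2, 1, 2, 3, 3, 1, 3, 2, 2]
n = len(draws)

counts = Counter(draws)  # counts[k] is the number of times label k was drawn
relative_frequency = {k: counts[k] / n for k in sorted(counts)}

print(dict(sorted(counts.items())))  # {1: 5, 2: 8, 3: 7}
print(relative_frequency)            # {1: 0.25, 2: 0.4, 3: 0.35}
```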

Primer on probability

♦ Suppose we color balls 1, ..., r red and balls r+1, ..., s green

  • What is the probability of choosing a red ball?
  • Intuitively it is r/s = Σ p_k, where the sum is over all ω_k such that the k’th ball is red

♦ Let A be the subset of possible outcomes ω_k such that ball k is red.

  • A has r points
  • A is called an event
  • When we say that A has occurred we mean that an experiment has been run and the outcome is represented by a point in A.

♦ If A and B are events, then so are A ∩ B, A ∪ B and A^c

Primer on probability

♦ Assigning probabilities to events:

$$P(B) = \sum_{\omega_k \in B} p_k$$

♦ A probability measure on a set Ω of possible outcomes is a real-valued function having domain 2^Ω satisfying

  • P(Ω) = 1
  • 0 <= P(A) <= 1, for all A ⊂ Ω
  • If the A_n are mutually disjoint sets, then

$$P\Big(\bigcup_n A_n\Big) = \sum_n P(A_n)$$

Primer on probability

♦ Let A and B be two events such that P(A) > 0. Then the conditional probability that B occurs given A, written P(B|A), is defined to be

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$$

♦ Ball example: what is P(“1” | “red”)?

  • Let r = 5 and b = 15
  • P(1 and red) = .05
  • P(red) = .25
  • So, P(1 | red) = .05/.25 = .2
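The ball example can be reproduced by direct enumeration. This is a minimal sketch that assumes 20 equally likely balls in total, of which balls 1 to 5 are red; that total is inferred from the ratio .05/.25 in the example rather than stated outright. The helper names `prob` and `conditional` are hypothetical.

```python
from fractions import Fraction

s = 20                  # assumed total number of balls (5 red plus 15 others)
red = set(range(1, 6))  # balls 1..5 are red


def prob(event):
    """Probability of a set of ball labels when every ball is equally likely."""
    return Fraction(len(event), s)


def conditional(b, a):
    """P(B | A) = P(B ∩ A) / P(A), defined only when P(A) > 0."""
    if prob(a) == 0:
        raise ValueError("P(A) must be positive")
    return prob(b & a) / prob(a)


one = {1}
print(prob(one & red))        # 1/20
print(prob(red))              # 1/4
print(conditional(one, red))  # 1/5
```

Exact fractions are used so the result 1/5 matches .05/.25 without rounding.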

Primer on probability

♦ Recognition

  • A_1, ..., A_n are mutually disjoint events with union Ω.
    • think of the A_i as the possible identities of an object
  • B is an event with P(B) > 0
    • think of B as an observable event, like the area of a component in an image
  • P(B|A_k) and P(A_k) are known, k = 1, ..., n
    • P(B|A_k) is the probability that we would observe a component with area B if the identity of the object is A_k
    • P(A_k) is the prior probability that an object is in class k.
  • Question: What is P(A_i|B)?
    • What we will really be after: the probability that the identity of the object is A_i given that we make measurements B
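The question posed here is answered by Bayes' rule. A short derivation in the notation of this slide is sketched below; it uses only the definition of conditional probability and the fact that the events A_k are disjoint with union Ω (terms with P(A_k) = 0 contribute nothing to the sum).

```latex
\begin{align*}
P(A_i \mid B) &= \frac{P(A_i \cap B)}{P(B)}
               = \frac{P(B \mid A_i)\,P(A_i)}{P(B)}, \\
P(B) &= \sum_{k=1}^{n} P(B \cap A_k)
      = \sum_{k=1}^{n} P(B \mid A_k)\,P(A_k), \\
\text{so}\quad
P(A_i \mid B) &= \frac{P(B \mid A_i)\,P(A_i)}{\sum_{k=1}^{n} P(B \mid A_k)\,P(A_k)}.
\end{align*}
```

Every quantity on the right-hand side is one of the known values P(B|A_k) or P(A_k).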

Training - computing P(B|Ai)

♦ Our training data is used to compute the P(B|Ai), where B is the vector of features we plan to use to classify unknown images into the classes Ai

  • B might be (area, perimeter, moments)

♦ How might we represent P(B|Ai)?

  • as a table
    • quantize area, perimeter and average gray level suitably, and then use the training samples to fill in the three-dimensional histogram
  • analytically, by a standard probability density function such as the normal, uniform, ...

[Figure: three-dimensional histogram with axes A (area), P (perimeter) and G (average gray level)]
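The table representation can be sketched as follows. This is only an illustration under stated assumptions: the training samples, the feature ranges, the number of bins, and the helper names `quantize` and `histogram_table` are all hypothetical and do not come from the notes.

```python
from collections import Counter


def quantize(value, low, high, bins):
    """Map a value in [low, high] to a bin index in 0..bins-1."""
    if not low <= value <= high:
        raise ValueError("value outside the quantization range")
    index = int((value - low) / (high - low) * bins)
    return min(index, bins - 1)  # value == high falls in the last bin


def histogram_table(samples, ranges, bins):
    """Estimate P(B | A_i) as normalized counts over quantized feature tuples."""
    counts = Counter(
        tuple(quantize(v, lo, hi, bins) for v, (lo, hi) in zip(sample, ranges))
        for sample in samples
    )
    total = sum(counts.values())
    return {cell: count / total for cell, count in counts.items()}


# Hypothetical (area, perimeter, average gray level) samples for one class.
samples = [(120, 44, 90), (130, 46, 95), (125, 45, 200), (300, 70, 100)]
ranges = [(0, 400), (0, 100), (0, 255)]  # assumed range of each feature
table = histogram_table(samples, ranges, bins=4)
print(table[(1, 1, 1)])  # 0.5
```

Cells that never occur in the training data are simply absent from the dictionary, which corresponds to an estimated probability of zero.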

Primer on probability - training

♦ When we have many random variables it is usually impractical to create a table of the values of P(B|Ai) from our training set.

  • Example
    • 5 measurements
    • quantize each to 50 possible values
    • Then there are 50^5 possible 5-tuples we might observe in any element of the training set, and we would need to estimate this many probabilities to represent the conditional probability
      • too few training samples
      • too much storage required for the table
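The arithmetic behind this example is quick to check. The storage figure below assumes one 4-byte number per table cell, which is an assumption made here for illustration and not a figure from the notes.

```python
measurements = 5
levels = 50

cells = levels ** measurements  # number of distinct 5-tuples
print(cells)  # 312500000

bytes_per_cell = 4  # assumption: one 32-bit value per cell
print(cells * bytes_per_cell / 1e9)  # 1.25 (gigabytes, per class)
```

With over 300 million cells per class, any realistic training set leaves almost every cell empty.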

Primer on probability

♦ The density function

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

is called the Gaussian function and the error function

  • μ is called the location parameter
  • σ is called the scale parameter

♦ Generalization to multivariate density functions

  • mean vector
  • covariance matrix
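As an illustration of the analytic (parametric) representation, the sketch below fits a univariate Gaussian to each weight training set from the earlier male/female example and compares the two densities at the unknown weight 125. Estimating μ and σ with the sample mean and sample standard deviation is a choice made here, and the comparison ignores prior probabilities.

```python
import math
from statistics import NormalDist

# Training weights copied from the male/female example in these notes.
male = [155, 122, 135, 160, 240, 220, 180, 145]
female = [95, 132, 115, 124, 145, 110, 150]


def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with location mu and scale sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


# from_samples estimates mu and sigma from the data (sample standard deviation).
male_model = NormalDist.from_samples(male)
female_model = NormalDist.from_samples(female)

x = 125
p_male = gaussian_pdf(x, male_model.mean, male_model.stdev)
p_female = gaussian_pdf(x, female_model.mean, female_model.stdev)

print(round(male_model.mean, 3), round(female_model.mean, 3))  # 169.625 124.429
print(p_female > p_male)  # True
```

Under these fitted models the female density is higher at 125, in agreement with the nearest-neighbor intuition for this sample.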

Prior probabilities and their role in classification

♦ Prior probabilities of each object class

  • probabilities of the events: object is from class i (P(Ai))
  • Example
    • two classes - A and B; two measurement outcomes: 0 and 1
    • prob(0|A) = .5, prob(1|A) = .5; prob(0|B) =. prob(1|B)=.
  • Might guess that if we measure 0 we should decide that the class is A, but if we measure 1 we should decide B

Prior probabilities

♦ So, how do we balance the effects of the prior probabilities and the class conditional probabilities?

♦ We want a rule that will make the fewest errors

  • Errors in A proportional to P(A)P(x|A)
  • Errors in B proportional to P(B)P(x|B)
  • To minimize the number of errors choose A if P(A)P(x|A) > P(B)P(x|B); choose B otherwise

♦ The rule generalizes to many classes. Choose the Ci such that P(Ci)P(x|Ci) is greatest.

♦ Of course, this is just Bayes’ rule again
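The decision rule can be sketched in a few lines. The class-conditional values for class A follow the two-class example in these notes; the values for class B and both sets of priors are hypothetical, chosen only to show how a strong prior can override the measurement.

```python
# P(x | class). The values for A are from the notes; those for B are hypothetical.
likelihood = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.2, 1: 0.8}}


def decide(x, priors):
    """Choose the class C that maximizes P(C) * P(x | C)."""
    return max(priors, key=lambda c: priors[c] * likelihood[c][x])


equal = {"A": 0.5, "B": 0.5}   # hypothetical priors
skewed = {"A": 0.9, "B": 0.1}  # hypothetical priors

print(decide(1, equal))   # B
print(decide(1, skewed))  # A
```

With equal priors a measurement of 1 favors B (0.25 against 0.40), but when A is nine times more common the products become 0.45 against 0.08 and the rule picks A.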

Bayes error

♦ The formula for P(Ci|x) is

$$P(C_i \mid x) = \frac{P(C_i)\,P(x \mid C_i)}{P(x)}$$

♦ where

$$P(x) = \sum_j P(C_j)\,P(x \mid C_j)$$

is a normalization factor that is the same for all classes.

♦ To evaluate the performance of our decision rule we can calculate the probability of error: the probability that the sample is assigned to the wrong class.
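For a small discrete problem, both the posterior and the probability of error can be computed directly. In this sketch the priors and the class-conditional values for class B are hypothetical. The error is accumulated as the probability mass left over after choosing, for each measurement x, the class with the largest P(C)P(x|C).

```python
priors = {"A": 0.5, "B": 0.5}  # hypothetical priors
# P(x | class). The values for B are hypothetical.
likelihood = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.2, 1: 0.8}}


def joint(x):
    """P(C) * P(x | C) for every class C."""
    return {c: priors[c] * likelihood[c][x] for c in priors}


def posterior(x):
    """P(C | x): the joint values divided by the normalization factor P(x)."""
    j = joint(x)
    p_x = sum(j.values())
    return {c: j[c] / p_x for c in j}


def error_probability(outcomes=(0, 1)):
    """Total probability of assigning a sample to the wrong class."""
    return sum(sum(joint(x).values()) - max(joint(x).values()) for x in outcomes)


post = posterior(1)
print({c: round(p, 3) for c, p in post.items()})  # {'A': 0.385, 'B': 0.615}
print(round(error_probability(), 4))              # 0.35
```

Here the rule errs on the 0.10 of samples that are class B with x = 0 and the 0.25 that are class A with x = 1, giving an error probability of 0.35.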