CSE152, Spr 12 Intro Computer Vision
Recognition III
Introduction to Computer Vision
CSE 152
Lecture 19
Linear Subspaces & Linear Projection
- A d-pixel image x ∈ R^d can be projected to a low-dimensional feature space y ∈ R^k by
y = W x
where W is a k by d matrix.
- Recognition is performed in R^k using, for example, nearest neighbor.
- How do we choose a good W?
Example: Projecting from R^3 to R^2
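A minimal sketch of projection followed by nearest-neighbor recognition, using a toy projection from R^3 to R^2. The function names, the toy gallery, and its labels are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

def project(W, x):
    """Project x in R^d to y = W x in R^k (W is k by d)."""
    return W @ x

def nearest_neighbor(W, gallery, labels, x):
    """Return the label of the gallery image closest to x in feature space.

    gallery: (n, d) array with one image per row (assumed layout).
    labels:  length-n sequence of class labels.
    """
    Y = gallery @ W.T                      # (n, k) projected gallery images
    y = project(W, x)                      # (k,) projected query image
    dists = np.linalg.norm(Y - y, axis=1)  # Euclidean distance in R^k
    return labels[int(np.argmin(dists))]

# Toy W that keeps the first two coordinates of R^3.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
gallery = np.array([[0.0, 0.0, 5.0],
                    [4.0, 4.0, 0.0]])
labels = ["a", "b"]
x = np.array([3.5, 4.2, 9.0])

y = project(W, x)
match = nearest_neighbor(W, gallery, labels, x)
```

With this W the third coordinate is discarded entirely, so the query is matched on its first two coordinates only; a good W should discard directions that do not help recognition.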
Distance to Linear Subspace
- A d-pixel image x ∈ R^d can be projected to a low-dimensional feature space y ∈ R^k by
y = W x
- From y ∈ R^k, the reconstruction of the point in R^d is W^T y = W^T W x (assuming the rows of W are orthonormal)
- The error of the reconstruction, or the distance from x to the subspace spanned by the rows of W, is: ||x - W^T W x||
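The reconstruction error can be sketched as follows, assuming W has orthonormal rows so that W^T W is the orthogonal projector onto the subspace. The function name and the toy values are illustrative.

```python
import numpy as np

def distance_to_subspace(W, x):
    """Return ||x - W^T W x|| for a k-by-d matrix W with orthonormal rows."""
    x_hat = W.T @ (W @ x)          # reconstruction of x in R^d
    return float(np.linalg.norm(x - x_hat))

# Subspace: the x1-x2 plane in R^3.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([1.0, 2.0, 3.0])
dist = distance_to_subspace(W, x)  # only the x3 component is lost
```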
Distance to Affine Subspace
(i.e., Distance to Face Space)
- Represented by mean vector μ and basis images W
- A d-pixel image x ∈ R^d can be projected to a low-dimensional feature space y ∈ R^k by
y = W(x - μ)
- From y ∈ R^k, the reconstruction of the point in R^d is W^T y + μ = W^T W(x - μ) + μ
- The error of the reconstruction, or the distance from x to the affine subspace, is: ||x - W^T W(x - μ) - μ|| = ||(I - W^T W)(x - μ)||
[Figure: axes x_1, x_2, x_3 with labels x, y, and μ]
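As a quick numerical check, the two forms of the affine distance can be compared directly. This is a sketch assuming orthonormal rows in W; the function names and toy values are illustrative.

```python
import numpy as np

def reconstruct_affine(W, mu, x):
    """Reconstruct x from its features: W^T y + mu with y = W (x - mu)."""
    y = W @ (x - mu)
    return W.T @ y + mu

def distance_to_affine_subspace(W, mu, x):
    """Return ||(I - W^T W)(x - mu)|| without forming the d-by-d projector."""
    r = x - mu
    return float(np.linalg.norm(r - W.T @ (W @ r)))

W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
mu = np.array([0.0, 0.0, 1.0])
x = np.array([2.0, 3.0, 4.0])

direct = float(np.linalg.norm(x - reconstruct_affine(W, mu, x)))
via_projector = distance_to_affine_subspace(W, mu, x)
```

Computing W^T (W r) instead of building I - W^T W keeps the cost at O(kd) per image, which matters when d is the number of pixels.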
Application 1:
Face detection using “distance to face space”
- Scan a window ω across the image, and classify the window as face/not face as follows:
- Project window to subspace, and reconstruct as described earlier.
- Compute distance between ω and reconstruction.
- Local minima of the distance over all image locations that fall below some threshold are taken as locations of faces.
- Repeat at different scales.
- Possibly normalize each window's intensity so that ||ω|| = 1.
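The scanning procedure above can be sketched at a single scale as follows. This is an illustrative sketch, not the lecture's implementation: the function names, the 8-neighbor definition of a local minimum, and the toy "face space" are all assumptions, and the multi-scale loop is omitted.

```python
import numpy as np

def distance_map(image, W, mu, h, w, normalize=False):
    """Distance to face space for every h-by-w window of a 2-D image.

    W:  (k, h*w) basis with orthonormal rows (assumed).
    mu: (h*w,) mean face.
    """
    n_rows, n_cols = image.shape
    out = np.empty((n_rows - h + 1, n_cols - w + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            win = image[r:r + h, c:c + w].astype(float).ravel()
            if normalize:
                nrm = np.linalg.norm(win)
                if nrm > 0:
                    win = win / nrm            # enforce ||window|| = 1
            resid = win - mu
            out[r, c] = np.linalg.norm(resid - W.T @ (W @ resid))
    return out

def detect(dist, threshold):
    """Locations below threshold that are minimal among their 8 neighbors."""
    hits = []
    rows, cols = dist.shape
    for r in range(rows):
        for c in range(cols):
            patch = dist[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if dist[r, c] < threshold and dist[r, c] <= patch.min():
                hits.append((r, c))
    return hits

# Toy example: a 2x2 "mean face" planted in a 4x4 image.
mu = np.array([1.0, 2.0, 3.0, 4.0])
W = np.full((1, 4), 0.5)                       # one unit-norm basis vector
image = np.zeros((4, 4))
image[1:3, 1:3] = mu.reshape(2, 2)

dist = distance_map(image, W, mu, h=2, w=2)
faces = detect(dist, threshold=0.5)
```

To handle different scales, one option is to run the same scan on resized copies of the image while keeping the window size fixed.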
An important footnote:
We don't really implement PCA by constructing a covariance matrix!
Why?
1. How big is Σ?
• d by d, where d is the number of pixels in an image!!
2. You only need the first k eigenvectors
Singular Value Decomposition
- Any m by n matrix A may be factored such that
A = U Σ V^T
[m x n] = [m x m][m x n][n x n]
- U: m by m, orthogonal matrix
  - Columns of U are the eigenvectors of A A^T
- V: n by n, orthogonal matrix
  - Columns of V are the eigenvectors of A^T A
- Σ: m by n, diagonal with non-negative entries. The diagonal entries (σ_1, σ_2, …, σ_s), with s = min(m, n), are called the singular values. The SVD algorithm produces sorted singular values: σ_1 ≥ σ_2 ≥ … ≥ σ_s
Important property
- The singular values are the square roots of the s largest eigenvalues of both A A^T and A^T A, and the columns of U are the corresponding eigenvectors of A A^T!!
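These properties can be checked numerically with NumPy's `np.linalg.svd`. The sketch below uses a random 5 by 3 matrix; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Full SVD: U is m x m, s holds the min(m, n) singular values, Vt is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal matrix Sigma and check A = U Sigma V^T.
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)
recon_ok = np.allclose(A, U @ Sigma @ Vt)

# Singular values come back sorted in non-increasing order.
sorted_ok = bool(np.all(s[:-1] >= s[1:]))

# Squared singular values match the eigenvalues of A^T A.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
eig_ok = np.allclose(s ** 2, eig_AtA)

# Columns of U are eigenvectors of A A^T: (A A^T) u_i = sigma_i^2 u_i.
eigvec_ok = np.allclose((A @ A.T) @ U[:, :n], U[:, :n] * s ** 2)
```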
Performing PCA with SVD
• Singular values of A are the square roots of the eigenvalues of both A A^T and A^T A, and the columns of U are the corresponding eigenvectors of A A^T
• And
∑_{i=1}^{n} a_i a_i^T = [a_1 a_2 … a_n][a_1 a_2 … a_n]^T = A A^T
• Covariance matrix is:
Σ = (1/n) ∑_{i=1}^{n} (x_i - μ)(x_i - μ)^T
• So, ignoring the 1/n, subtract the mean image μ from each input image, create a d x n data matrix, and perform thin SVD on the data matrix:
D = [x_1 - μ | x_2 - μ | … | x_n - μ]
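The recipe above can be sketched with NumPy's thin SVD (`full_matrices=False`), which never forms the d by d covariance matrix. The function name, the return values, and the random toy data are illustrative assumptions.

```python
import numpy as np

def pca_basis(X, k):
    """PCA via thin SVD of the centered data matrix.

    X: (d, n) array with one image per column.
    Returns the mean image mu (d,), the basis W (k, d) with orthonormal
    rows, and the variance captured along each of the k directions.
    """
    n = X.shape[1]
    mu = X.mean(axis=1)
    D = X - mu[:, None]                               # d x n, centered
    U, s, _ = np.linalg.svd(D, full_matrices=False)   # U is d x min(d, n)
    W = U[:, :k].T                                    # top-k eigenvectors of D D^T
    variances = s[:k] ** 2 / n                        # eigenvalues of the covariance
    return mu, W, variances

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))   # d = 50 pixels, n = 8 images
mu, W, variances = pca_basis(X, k=3)
```

With d = 50 and n = 8 the thin SVD works with a 50 x 8 matrix instead of a 50 x 50 covariance; for real images, where d is in the thousands or more, the saving is much larger.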
PCA & Fisher’s Linear Discriminant
• Between-class scatter
• Within-class scatter
• Total scatter
• Where
- c is the number of classes
- μ_i is the mean of class χ_i
- |χ_i| is the number of samples in χ_i
[Figure: classes χ_1 and χ_2]
If the data points x_i are projected by y_i = W x_i and the scatter of the x_i is S, then the scatter of the projected points y_i is W S W^T
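A sketch of the three scatter matrices, assuming the standard definitions: S_B = ∑_i |χ_i| (μ_i - μ)(μ_i - μ)^T, S_W = ∑_i ∑_{x ∈ χ_i} (x - μ_i)(x - μ_i)^T, and S_T = ∑_x (x - μ)(x - μ)^T, where μ is the mean of all samples. The symbol names S_B, S_W, S_T, the function name, and the toy data are assumptions for illustration. W is taken to be k by d, as in y = W x, so the projected scatter is written W S W^T.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Between-class, within-class, and total scatter.

    X: (n, d) array with one sample per row (assumed layout).
    labels: length-n array of class labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mu = X.mean(axis=0)                    # overall mean
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)             # class mean
        diff = (mu_c - mu)[:, None]
        S_B += len(Xc) * (diff @ diff.T)
        R = Xc - mu_c
        S_W += R.T @ R
    R = X - mu
    S_T = R.T @ R
    return S_B, S_W, S_T

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 4))
labels = np.array([0] * 5 + [1] * 4 + [2] * 3)
S_B, S_W, S_T = scatter_matrices(X, labels)

# Scatter of the projected points y_i = W x_i, for an arbitrary 2 x 4 matrix W.
W = rng.standard_normal((2, 4))
Y = X @ W.T
_, _, S_T_proj = scatter_matrices(Y, labels)
```

Two identities are worth checking on such data: S_T equals S_B + S_W, and the scatter of the projected points equals W S_T W^T.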
PCA & Fisher’s Linear Discriminant
• PCA (Eigenfaces)
Maximizes projected total scatter
• Fisher’s Linear Discriminant
Maximizes ratio of projected between-class to projected within-class scatter
[Figure: classes χ_1 and χ_2 with the PCA and FLD projections]
Computing the Fisher Projection Matrix
• The w_i are orthonormal
• There are at most c-1 non-zero generalized eigenvalues, so m ≤ c-1
• Can be computed with eig in Matlab
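A NumPy counterpart to the Matlab `eig` call can be sketched as below. It assumes the standard Fisher formulation, in which the projection directions w_i solve the generalized eigenproblem S_B w = λ S_W w, and it assumes S_W is non-singular so the problem reduces to an ordinary eigenproblem for S_W^{-1} S_B. The function name and the two-class toy data are illustrative.

```python
import numpy as np

def fisher_directions(S_B, S_W, m):
    """Top-m generalized eigenpairs of S_B w = lambda S_W w.

    Assumes S_W is non-singular. Returns the eigenvalues (descending)
    and the matching eigenvectors as the rows of an (m, d) matrix.
    """
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    evals, evecs = evals.real, evecs.real   # drop round-off imaginary parts
    order = np.argsort(evals)[::-1][:m]
    return evals[order], evecs[:, order].T

# Two classes (c = 2) in R^3, separated along the first axis.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 3)) + np.array([3.0, 0.0, 0.0])
X2 = rng.standard_normal((20, 3)) - np.array([3.0, 0.0, 0.0])
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu = np.vstack([X1, X2]).mean(axis=0)

S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
S_B = 20 * np.outer(mu1 - mu, mu1 - mu) + 20 * np.outer(mu2 - mu, mu2 - mu)

evals, W = fisher_directions(S_B, S_W, m=3)
```

With c = 2 classes only one eigenvalue should be non-zero up to round-off, matching the bound m ≤ c-1. In face recognition S_W is typically singular because there are far fewer images than pixels, so a PCA step is often applied first; that step is not shown here.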
Recognition
Recognition in Cluttered Scenes
Interest Points + Feature Descriptors + Relations
Example
Training examples
Test image
Matching using Local Image features
Simple approach
• Detect corners in the image (e.g. Harris corner detector).
• Represent the neighborhood of each corner by a feature vector produced by Gabor filters, K-jets, SIFT features, etc.
• Modeling: Given a training image of an object w/o clutter, detect corners, compute feature descriptors, and store these.
• Recognition time: Given a test image with possible clutter, detect corners and compute features. Find models with the same feature descriptors (hashing) and vote.
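The voting step can be sketched as follows. This sketch assumes the corner detection and descriptor computation have already been done and that descriptors are plain NumPy vectors; it also swaps the hashing lookup for a brute-force nearest-neighbor search, which is simpler but slower. The function name, the `max_dist` parameter, and the toy model database are hypothetical.

```python
import numpy as np
from collections import Counter

def vote_for_model(model_db, test_desc, max_dist):
    """Let each test descriptor vote for the model holding its closest match.

    model_db:  dict mapping model name -> (n_i, D) array of stored descriptors.
    test_desc: (n, D) array of descriptors from the test image.
    max_dist:  descriptors with no stored match closer than this cast no vote,
               which lets clutter features be ignored.
    """
    votes = Counter()
    for f in np.asarray(test_desc, dtype=float):
        best_name, best_d = None, max_dist
        for name, desc in model_db.items():
            d = np.linalg.norm(np.asarray(desc, dtype=float) - f, axis=1).min()
            if d < best_d:
                best_name, best_d = name, d
        if best_name is not None:
            votes[best_name] += 1
    return votes

# Toy 2-D "descriptors" for two stored models.
model_db = {
    "cup": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "car": np.array([[10.0, 10.0], [11.0, 11.0]]),
}
test_desc = np.array([[0.1, 0.0], [1.0, 1.1], [10.0, 10.2], [50.0, 50.0]])

votes = vote_for_model(model_db, test_desc, max_dist=1.0)
winner = votes.most_common(1)[0][0]
```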
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
Employ spatial relations
Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE
Even without shading, shape reveals a lot
Motion
Introduction to Computer Vision
CSE 152
Lecture 19-b
Motion
Some problems of motion
- Correspondence: Where have elements of the image moved between image frames?
- Reconstruction: Given correspondence, what is the 3-D geometry of the scene?
- Ego motion: How has the camera moved?
- Segmentation: Which regions of the image correspond to different moving objects?
- Tracking: Where have objects moved in the image? Related to correspondence and segmentation.
Variations:
- Small motion (video)
- Wide-baseline (multi-view)
Structure-from-Motion (SFM)
Goal: Take as input two or more images or video, w/o any information on camera position/motion, and estimate the camera position and the 3-D structure of the scene.
Two Approaches
1. Discrete motion (wide baseline)
   1. Orthographic (affine) vs. Perspective
   2. Two view vs. Multi-view
   3. Calibrated vs. Uncalibrated
2. Continuous (Infinitesimal) motion
Discrete Motion: Some Counting
Consider M images of N points. How many unknowns?
- Camera locations: Affix the coordinate system to the location of the first camera: (M-1)*6 unknowns
- 3-D structure: 3*N unknowns
- Can only recover structure and motion up to scale, which removes one unknown. Why?
Total number of unknowns: (M-1)*6 + 3N - 1
Total number of measurements: 2MN
Solution is possible when (M-1)*6 + 3N - 1 ≤ 2MN
M = 2: N ≥ 5
M = 3: N ≥ 4
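The counting argument can be checked with a few lines of code that search for the smallest N satisfying the inequality for a given M. The function names and the `max_points` search limit are illustrative.

```python
def unknowns(M, N):
    """6 pose parameters per extra camera, 3 per point, minus 1 for scale."""
    return 6 * (M - 1) + 3 * N - 1

def measurements(M, N):
    """Each of the N points gives 2 image coordinates in each of the M views."""
    return 2 * M * N

def min_points(M, max_points=1000):
    """Smallest N with unknowns <= measurements, or None if none is found."""
    if M < 2:
        raise ValueError("structure from motion needs at least two views")
    for N in range(1, max_points + 1):
        if unknowns(M, N) <= measurements(M, N):
            return N
    return None

two_view = min_points(2)
three_view = min_points(3)
```

This only counts equations against unknowns; it says nothing about degenerate point or camera configurations.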