Principal Angles & Multivariate Gaussian: Subspace Relationships & Probabilistic Modeling, Slides of Advanced Computer Architecture

These slides cover principal angles, which characterize the relationship between two subspaces of a multidimensional space. They also introduce the multivariate Gaussian distribution, a probabilistic model for multivariate data, and show how to estimate its mean and covariance matrix by maximum likelihood.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

ashalata

Lecture 8

Overview

• Multivariate Gaussian
• Mahalanobis distance
• Probabilistic PCA
• Factor analysis

Principal angles (cont’d)

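• Related to the notion of distance between equidimensional subspaces
• If p = q, then

$$\mathrm{dist}(F, G) = \sqrt{1 - \cos(\theta_p)^2}$$

• If the columns of $Q_F \in \mathbb{R}^{m \times p}$ and $Q_G \in \mathbb{R}^{m \times q}$ define orthonormal bases for F and G respectively (from QR decomposition), then

$$\max_{u \in F,\ \|u\|_2 = 1}\ \max_{v \in G,\ \|v\|_2 = 1} u^\top v \;=\; \max_{y \in \mathbb{R}^p,\ \|y\|_2 = 1}\ \max_{z \in \mathbb{R}^q,\ \|z\|_2 = 1} y^\top (Q_F^\top Q_G)\, z$$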

• If A = […] and B = […], then the cosines of the principal angles between ran(A) and ran(B) are 1.000 and 0.
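
The maximization over unit vectors y and z is attained by the leading singular vectors of Q_F^T Q_G, and the full set of singular values gives the cosines of all principal angles. The following is a minimal sketch in NumPy under that standard result; the function name `principal_angle_cosines` and the example matrices are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def principal_angle_cosines(A, B):
    """Cosines of the principal angles between ran(A) and ran(B).

    Assumes A and B have full column rank (hypothetical helper).
    """
    QF, _ = np.linalg.qr(A)  # orthonormal basis for F = ran(A)
    QG, _ = np.linalg.qr(B)  # orthonormal basis for G = ran(B)
    # Singular values of QF^T QG, returned in descending order.
    s = np.linalg.svd(QF.T @ QG, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

# Illustrative subspaces of R^3: span{e1, e2} and span{e1, e2 + e3}.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

cosines = principal_angle_cosines(A, B)
# For p = q, the subspace distance uses the largest angle (smallest cosine).
dist = np.sqrt(1.0 - cosines[-1] ** 2)
```

The two subspaces share the direction e1, so the first cosine is 1; the second angle is 45 degrees.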

Multivariate Gaussian distribution

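• Assume $X = \{x_1, \ldots, x_n\}$ can be modeled with a Gaussian distribution

$$p(x \mid \mu, C) = |2\pi C|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)^\top C^{-1} (x - \mu) \right\}$$

where μ is the mean and C is the covariance matrix
• Assume independent observations; find μ and C that maximize the log likelihood

$$p(X \mid \mu, C) = \prod_{i=1}^{n} p(x_i \mid \mu, C)$$

$$L = \log \prod_{i=1}^{n} p(x_i \mid \mu, C) = -\frac{n}{2} \log |2\pi C| - \frac{1}{2} \sum_{i} (x_i - \mu)^\top C^{-1} (x_i - \mu)$$

• Maximum likelihood estimate:

$$\frac{\partial L}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n} \sum_{i} x_i \quad \text{(sample mean)}$$

$$\frac{\partial L}{\partial C} = 0 \;\Rightarrow\; \hat{C} = \frac{1}{n} \sum_{i} (x_i - \hat{\mu})(x_i - \hat{\mu})^\top \quad \text{(sample covariance)}$$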
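
As a rough illustration of the maximum likelihood estimates, the sketch below computes the sample mean and the 1/n sample covariance and evaluates the log likelihood L. The function names `gaussian_mle` and `log_likelihood` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for rows of X (n x m): sample mean and 1/n covariance."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    D = X - mu_hat
    C_hat = D.T @ D / n  # note 1/n, not the unbiased 1/(n-1)
    return mu_hat, C_hat

def log_likelihood(X, mu, C):
    """L = -(n/2) log|2 pi C| - (1/2) sum_i (x_i - mu)^T C^{-1} (x_i - mu)."""
    n = X.shape[0]
    D = X - mu
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C)
    quad = np.sum((D @ np.linalg.inv(C)) * D)
    return -0.5 * n * logdet - 0.5 * quad

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.7]])
X = rng.standard_normal((500, 3)) @ mix

mu_hat, C_hat = gaussian_mle(X)
L_hat = log_likelihood(X, mu_hat, C_hat)
```

Perturbing either estimate should not increase L, which is a quick sanity check on the derivation.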

Geometric interpretation

• The equidensity contours of a non-singular Gaussian are ellipsoids (i.e., linear transformations of hyperspheres)
• The directions of the principal axes of the ellipsoids are the eigenvectors of the covariance matrix C, and their lengths are proportional to the square roots of the corresponding eigenvalues
• Let $C = U \Sigma U^\top = (U \Sigma^{1/2})(U \Sigma^{1/2})^\top$ (i.e., the eigendecomposition), where the columns of U form an orthonormal basis and Σ is a diagonal matrix

$$X \sim N(\mu, C) \iff X \sim \mu + U \Sigma^{1/2} N(0, I) \iff X \sim \mu + U N(0, \Sigma)$$

• The distribution N(μ, C) is equivalent to N(0, I) scaled by $\Sigma^{1/2}$, rotated by U, and translated by μ
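
The scale, rotate, translate reading can be checked numerically. The sketch below draws standard normal samples and maps them through U Σ^{1/2}, then shifts by μ; the particular μ and C are made-up values for illustration.

```python
import numpy as np

# Illustrative mean and (positive definite) covariance.
mu = np.array([1.0, -2.0])
C = np.array([[4.0, 1.5], [1.5, 1.0]])

# Eigendecomposition C = U diag(evals) U^T (eigh is for symmetric matrices).
evals, U = np.linalg.eigh(C)
T = U @ np.diag(np.sqrt(evals))  # U Sigma^{1/2}: scale, then rotate

rng = np.random.default_rng(0)
Z = rng.standard_normal((100_000, 2))  # rows are draws from N(0, I)
X = mu + Z @ T.T                       # rows are draws from N(mu, C)

C_sample = np.cov(X, rowvar=False)
```

The factorization T T^T = C holds exactly up to rounding, while the sample covariance only approaches C as the number of draws grows.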

Mahalanobis distance

The quantity

$$d_M^2 = (x - \mu)^\top C^{-1} (x - \mu) = \left(C^{-1/2}(x - \mu)\right)^\top \left(C^{-1/2}(x - \mu)\right)$$

is called the (squared) Mahalanobis distance from x to μ

• Also known as the generalized squared inter-point distance
• The distance of a point x to the center of mass divided by the width of the ellipsoid in the direction of x
• Amounts to a linear transformation of the coordinate system
• Keeps its quadratic form and remains non-negative
• If C = I, the Mahalanobis distance reduces to the Euclidean distance
• If C is diagonal, the resulting distance is the normalized Euclidean distance

$$d(x, y) = \sqrt{\sum_{i=1}^{m} \frac{(x_i - y_i)^2}{\sigma_i^2}}$$

where $\sigma_i$ is the standard deviation of $x_i$
• Can be approximated with the eigenvectors of C
• Used for learning distance metrics
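
A small sketch of the two special cases, assuming NumPy; `mahalanobis_sq` is a hypothetical helper and the numbers are chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def mahalanobis_sq(x, mu, C):
    """Squared Mahalanobis distance (x - mu)^T C^{-1} (x - mu)."""
    diff = x - mu
    # Solve C y = diff instead of forming C^{-1} explicitly.
    return float(diff @ np.linalg.solve(C, diff))

x = np.array([2.0, 1.0])
mu = np.array([0.0, 0.0])

# C = I: reduces to the squared Euclidean distance, 2^2 + 1^2.
d2_identity = mahalanobis_sq(x, mu, np.eye(2))

# Diagonal C: each coordinate is divided by its standard deviation.
sigma = np.array([2.0, 0.5])
d2_diag = mahalanobis_sq(x, mu, np.diag(sigma ** 2))

# General C: same value via whitening with C^{-1/2} from the eigendecomposition.
C = np.array([[4.0, 1.5], [1.5, 1.0]])
evals, U = np.linalg.eigh(C)
C_inv_sqrt = U @ np.diag(evals ** -0.5) @ U.T
w = C_inv_sqrt @ (x - mu)
d2_whitened = float(w @ w)
```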

Factor analysis

• A generative dimensionality reduction algorithm
• Let $x \in \mathbb{R}^m$ and $z \in \mathbb{R}^d$; x is modeled by z, dubbed factors (d < m)

$$x = \Lambda z + \epsilon$$

• Λ is the factor loading matrix
• z is assumed to be N(0, I) distributed (zero-mean, unit-variance normals)
• The factors z model the correlation between the elements of x
• ε is a random variable that accounts for noise and is assumed to be distributed as N(0, Ψ), where Ψ is a diagonal matrix (whereas PCA uses an isotropic error model with $\psi_i = \sigma^2$)
• ε accounts for independent noise in each element of x
• The diagonality of Ψ is a key assumption: it constrains the error covariance Ψ for estimation
• The observed variables $x_i$ are conditionally independent given the factors z
• x is $N(0, \Lambda\Lambda^\top + \Psi)$ distributed (whereas PCA models x with $N(0, \Lambda\Lambda^\top + \sigma^2 I)$)
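
To make the generative reading concrete, the sketch below samples from x = Λz + ε and compares the empirical covariance with ΛΛ^T + Ψ. The loading matrix and noise variances are arbitrary illustrative values, not estimates from any data set.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 4, 2, 100_000

# Illustrative factor loadings (m x d) and diagonal noise covariance.
Lam = np.array([[1.0, 0.0], [0.8, 0.3], [0.0, 1.0], [0.2, 0.7]])
Psi = np.diag([0.5, 0.2, 0.3, 0.4])

Z = rng.standard_normal((n, d))                           # factors, N(0, I)
E = rng.standard_normal((n, m)) * np.sqrt(np.diag(Psi))   # noise, N(0, Psi)
X = Z @ Lam.T + E                                         # rows are x = Lam z + eps

C_model = Lam @ Lam.T + Psi  # covariance implied by the model
C_sample = X.T @ X / n       # empirical covariance (the model has zero mean)
```

All off-diagonal structure in `C_model` comes from ΛΛ^T, which is the sense in which the factors explain the correlations between elements of x.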

Properties of factor analysis

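• Factor analysis: $x = \Lambda z + \epsilon$
• Latent variables z explain the correlations between the elements of x
• $\epsilon_i$ represents the variability unique to a particular $x_i$
• Differs from PCA, which treats covariance and variance identically
• Want to infer Λ and Ψ from x
• Suppose Λ and Ψ are known; by linear projection, $E[z \mid x] = \beta x$ where $\beta = \Lambda^\top (\Psi + \Lambda\Lambda^\top)^{-1}$, since the joint Gaussian of data x and factors z is

$$p\left(\begin{bmatrix} x \\ z \end{bmatrix}\right) = N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Lambda\Lambda^\top + \Psi & \Lambda \\ \Lambda^\top & I \end{bmatrix}\right)$$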
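
The projection β can be computed directly from its definition. As a cross-check, the sketch below also evaluates the equivalent form obtained from the matrix inversion lemma, β = (I + Λ^T Ψ^{-1} Λ)^{-1} Λ^T Ψ^{-1}, which only inverts a d x d matrix. That alternative form is a standard identity and not something stated on the slides; Λ, Ψ, and x are illustrative values.

```python
import numpy as np

# Illustrative parameters (m = 4 observed dimensions, d = 2 factors).
Lam = np.array([[1.0, 0.0], [0.8, 0.3], [0.0, 1.0], [0.2, 0.7]])
Psi = np.diag([0.5, 0.2, 0.3, 0.4])
d = Lam.shape[1]

# Direct form: beta = Lam^T (Psi + Lam Lam^T)^{-1}, an m x m inverse.
beta = Lam.T @ np.linalg.inv(Psi + Lam @ Lam.T)

# Equivalent form via the matrix inversion lemma, a d x d inverse.
Psi_inv = np.diag(1.0 / np.diag(Psi))
M = np.linalg.inv(np.eye(d) + Lam.T @ Psi_inv @ Lam)
beta_alt = M @ Lam.T @ Psi_inv

# Posterior moments of the factors for one observation x.
x = np.array([1.0, 0.5, -0.3, 0.2])
post_mean = beta @ x              # E[z | x]
post_cov = np.eye(d) - beta @ Lam # Var(z | x), independent of x
```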

EM algorithm for factor analysis

• Expectation-Maximization: a useful technique for dealing with missing data
• Start with some initial guess of the missing data and evaluate the expected values
• Optimize the unknown parameters by taking the derivative of the likelihood of the observed and missing data w.r.t. the parameters
• Repeat until the data likelihood does not change
• E-step: given Λ and Ψ, for each data point $x_i$, compute

$$E[z \mid x_i] = \beta x_i$$

$$E[zz^\top \mid x_i] = \mathrm{Var}(z \mid x_i) + E[z \mid x_i]\, E[z \mid x_i]^\top = I - \beta\Lambda + \beta x_i x_i^\top \beta^\top$$

• M-step:

$$\Lambda^{\text{new}} = \left(\sum_{i=1}^{n} x_i\, E[z \mid x_i]^\top\right) \left(\sum_{i=1}^{n} E[zz^\top \mid x_i]\right)^{-1}$$

$$\Psi^{\text{new}} = \frac{1}{n}\, \mathrm{diag}\left\{ \sum_{i=1}^{n} x_i x_i^\top - \Lambda^{\text{new}} E[z \mid x_i]\, x_i^\top \right\}$$

where the diag operator sets all off-diagonal elements to zero
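
The E-step and M-step above can be sketched in a few lines of NumPy. This is a minimal illustration under some assumptions: the data are zero-mean, Ψ is stored as a vector of its diagonal entries, and the names `fa_em_step` and `log_likelihood` as well as the synthetic data are hypothetical.

```python
import numpy as np

def log_likelihood(X, Lam, Psi):
    """Log likelihood of zero-mean rows of X under N(0, Lam Lam^T + diag(Psi))."""
    n = X.shape[0]
    C = Lam @ Lam.T + np.diag(Psi)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C)
    return -0.5 * n * logdet - 0.5 * np.sum((X @ np.linalg.inv(C)) * X)

def fa_em_step(X, Lam, Psi):
    """One EM iteration. X: n x m, Lam: m x d, Psi: length-m diagonal."""
    n = X.shape[0]
    d = Lam.shape[1]
    # E-step: beta = Lam^T (Psi + Lam Lam^T)^{-1}
    beta = Lam.T @ np.linalg.inv(Lam @ Lam.T + np.diag(Psi))
    Ez = X @ beta.T                                     # row i is E[z | x_i]
    sum_Ezz = n * (np.eye(d) - beta @ Lam) + Ez.T @ Ez  # sum_i E[z z^T | x_i]
    # M-step
    Lam_new = (X.T @ Ez) @ np.linalg.inv(sum_Ezz)       # (sum x_i E[z|x_i]^T)(...)^{-1}
    Psi_new = np.diag(X.T @ X - Lam_new @ (Ez.T @ X)) / n  # keep the diagonal only
    return Lam_new, Psi_new

# Synthetic data from a 2-factor model (illustrative only), then centered.
rng = np.random.default_rng(0)
n, m, d = 500, 5, 2
true_Lam = rng.standard_normal((m, d))
X = rng.standard_normal((n, d)) @ true_Lam.T + 0.5 * rng.standard_normal((n, m))
X = X - X.mean(axis=0)

Lam, Psi = rng.standard_normal((m, d)), np.ones(m)
history = [log_likelihood(X, Lam, Psi)]
for _ in range(50):
    Lam, Psi = fa_em_step(X, Lam, Psi)
    history.append(log_likelihood(X, Lam, Psi))
```

A fixed iteration count is used here for simplicity; stopping when the change in `history` falls below a tolerance would match the "repeat until the data likelihood does not change" rule more closely.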

FA and PCA

• Factor analysis provides a proper probabilistic model
• PCA is rotationally invariant; FA is not
• Given a set of data points, would Λ correspond to an orthonormal basis of a PCA subspace? No, in most cases
• However, Λ corresponds to an orthonormal basis if FA has an isotropic error model, i.e., $\psi_i = \sigma^2$

Probabilistic principal component analysis (cont’d)

Maximize the log likelihood with the EM algorithm:

$$\Lambda = U (\Sigma - \sigma^2 I)^{1/2} R$$

• $U_{m \times d}$ contains the first d eigenvectors computed from the covariance matrix S
• $\Sigma_{d \times d}$ is a diagonal matrix holding the first d eigenvalues, $\lambda_i$
• $R_{d \times d}$ is an arbitrary orthogonal rotation matrix (note that z has an isotropic Gaussian distribution)
• The noise variance $\sigma^2$ is the residual variance per dimension

$$\sigma^2 = \frac{1}{m - d} \sum_{i=d+1}^{m} \lambda_i$$
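
The closed-form expressions for Λ and σ² can be evaluated directly from an eigendecomposition of S. The sketch below takes R = I for simplicity; the function name `ppca_closed_form` and the test covariance with eigenvalues 5, 3, 1, 0.5 are assumptions for illustration.

```python
import numpy as np

def ppca_closed_form(S, d):
    """Return (Lam, sigma2) from covariance S, with the rotation R set to I."""
    m = S.shape[0]
    evals, evecs = np.linalg.eigh(S)     # ascending order
    order = np.argsort(evals)[::-1]      # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[d:].sum() / (m - d)   # average of the discarded eigenvalues
    U = evecs[:, :d]                     # first d eigenvectors
    Lam = U @ np.diag(np.sqrt(evals[:d] - sigma2))
    return Lam, sigma2

# Illustrative covariance with known eigenvalues in a random orthonormal basis.
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((4, 4)))
S = Q @ np.diag([5.0, 3.0, 1.0, 0.5]) @ Q.T
S = (S + S.T) / 2.0                      # remove rounding asymmetry

Lam, sigma2 = ppca_closed_form(S, d=2)
C_model = Lam @ Lam.T + sigma2 * np.eye(4)
model_evals = np.sort(np.linalg.eigvalsh(C_model))[::-1]
```

The model covariance keeps the top d eigenvalues of S and replaces the remaining ones by their average σ², here (1 + 0.5) / 2.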

Big picture

from “A unifying review of linear Gaussian models” by Zoubin Ghahramani and Sam Roweis