
Principal Angles & Multivariate Gaussian: Subspace Relationships & Probabilistic Modeling, Slides of Advanced Computer Architecture

Aligarh Muslim University Advanced Computer Architecture

Covers the concepts of principal angles, which define the relationship between two subspaces in a multidimensional space. It also introduces the multivariate Gaussian distribution, a probabilistic model used to describe the distribution of multivariate data, and shows how to find the mean and covariance matrix of a Gaussian distribution by maximum likelihood estimation.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

ashalata 🇮🇳


Lecture 8


Overview

- Multivariate Gaussian
- Mahalanobis distance
- Probabilistic PCA
- Factor analysis

Principal angles (cont’d)

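- Related to the notion of distance between equidimensional subspaces: if $p = q$, then $\operatorname{dist}(F, G) = \sqrt{1 - \cos(\theta_p)^2} = \sin\theta_p$
- If the columns of $Q_F \in \mathbb{R}^{m \times p}$ and $Q_G \in \mathbb{R}^{m \times q}$ define orthonormal bases for $F$ and $G$ respectively (e.g., from a QR decomposition), then

$$\max_{u \in F,\ \|u\|_2 = 1}\ \max_{v \in G,\ \|v\|_2 = 1} u^\top v = \max_{y \in \mathbb{R}^p,\ \|y\|_2 = 1}\ \max_{z \in \mathbb{R}^q,\ \|z\|_2 = 1} y^\top (Q_F^\top Q_G) z$$

  so the cosines of the principal angles are the singular values of $Q_F^\top Q_G$.
- Example: if $A = \ldots$ and $B = \ldots$, then the cosines of the principal angles between $\operatorname{ran}(A)$ and $\operatorname{ran}(B)$ are 1.000 and 0.…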
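The QR-then-SVD recipe above can be sketched in NumPy as follows. This is a minimal illustration, assuming $A$ and $B$ have full column rank; the function names `principal_angle_cosines` and `subspace_distance` are hypothetical, not taken from the slides.

```python
import numpy as np

def principal_angle_cosines(A, B):
    """Cosines of the principal angles between ran(A) and ran(B).

    Assumes A (m x p) and B (m x q) have full column rank.
    """
    # Orthonormal bases Q_F and Q_G for the two column spaces (reduced QR).
    QF, _ = np.linalg.qr(A)
    QG, _ = np.linalg.qr(B)
    # Singular values of Q_F^T Q_G are the cosines, in decreasing order.
    s = np.linalg.svd(QF.T @ QG, compute_uv=False)
    # Clip round-off that can push a cosine slightly above 1.
    return np.clip(s, 0.0, 1.0)

def subspace_distance(A, B):
    """dist(F, G) = sqrt(1 - cos(theta_p)^2), defined here only for p = q."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("subspaces must have equal dimension")
    cos_p = principal_angle_cosines(A, B)[-1]  # smallest cosine = largest angle
    return np.sqrt(1.0 - cos_p**2)
```

For instance, span$(e_1, e_2)$ and span$(e_1, e_3)$ in $\mathbb{R}^3$ share one direction and are orthogonal in the other, so their cosines are 1 and 0 and their distance is 1.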

Multivariate Gaussian distribution

Assume $X = \{x_1, \ldots, x_n\}$ can be modeled with a Gaussian distribution

$$p(x \mid \mu, C) = |2\pi C|^{-1/2} \exp\left\{-\frac{1}{2}(x - \mu)^\top C^{-1}(x - \mu)\right\}$$

where $\mu$ is the mean and $C$ is the covariance matrix. Assuming independent observations, find $\mu$ and $C$ that maximize the log likelihood:

$$p(X \mid \mu, C) = \prod_{i=1}^n p(x_i \mid \mu, C)$$

$$L = \log \prod_{i=1}^n p(x_i \mid \mu, C) = -\frac{n}{2}\log|2\pi C| - \frac{1}{2}\sum_i (x_i - \mu)^\top C^{-1}(x_i - \mu)$$
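The log likelihood $L$ above can be evaluated directly. A minimal NumPy sketch, assuming one observation per row of `X` and a positive definite `C` (the function name is hypothetical); it uses `slogdet` and a linear solve rather than forming $|2\pi C|$ and $C^{-1}$ explicitly, which is numerically safer.

```python
import numpy as np

def gaussian_log_likelihood(X, mu, C):
    """L = -n/2 log|2 pi C| - 1/2 sum_i (x_i - mu)^T C^{-1} (x_i - mu).

    X has shape (n, m), one observation per row; C must be positive definite.
    """
    n = X.shape[0]
    D = X - mu                                        # centered observations
    sign, logdet = np.linalg.slogdet(2 * np.pi * C)   # log|2 pi C| without overflow
    if sign <= 0:
        raise ValueError("C must be positive definite")
    # Row i of solve(C, D.T).T is C^{-1}(x_i - mu); dot it with (x_i - mu).
    quad = np.sum(D * np.linalg.solve(C, D.T).T, axis=1)
    return -0.5 * n * logdet - 0.5 * quad.sum()
```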

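Maximum likelihood estimate:

$$\frac{\partial L}{\partial \mu} = 0 \Rightarrow \hat{\mu} = \frac{1}{n}\sum_i x_i \quad \text{(sample mean)}$$

$$\frac{\partial L}{\partial C} = 0 \Rightarrow \hat{C} = \frac{1}{n}\sum_i (x_i - \hat{\mu})(x_i - \hat{\mu})^\top \quad \text{(sample covariance)}$$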
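The two closed-form estimates translate into a few lines of NumPy. This sketch (hypothetical function name) divides by $n$, as the maximum likelihood derivation gives, rather than by $n - 1$; that matches `np.cov(X, rowvar=False, bias=True)`, not NumPy's default unbiased estimate.

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood mean and covariance for the rows of X, shape (n, m)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)      # (1/n) sum_i x_i
    D = X - mu_hat
    C_hat = D.T @ D / n          # (1/n) sum_i (x_i - mu_hat)(x_i - mu_hat)^T
    return mu_hat, C_hat
```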

Geometric interpretation

- The equidensity contours of a non-singular Gaussian are ellipsoids (i.e., linear transformations of hyperspheres)
- The directions of the principal axes of the ellipsoids are the eigenvectors of the covariance matrix $C$, and the semi-axis lengths are proportional to the square roots of the corresponding eigenvalues
- Let $C = U\Sigma U^\top = (U\Sigma^{1/2})(U\Sigma^{1/2})^\top$ (i.e., the eigendecomposition), where the columns of $U$ form an orthonormal basis and $\Sigma$ is a diagonal matrix of eigenvalues

$$X \sim \mathcal{N}(\mu, C) \iff X \sim \mu + U\Sigma^{1/2}\mathcal{N}(0, I) \iff X \sim \mu + U\mathcal{N}(0, \Sigma)$$

- The distribution $\mathcal{N}(\mu, C)$ is equivalent to $\mathcal{N}(0, I)$ scaled by $\Sigma^{1/2}$, rotated by $U$ and translated by $\mu$
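The scale-rotate-translate reading of $\mathcal{N}(\mu, C)$ doubles as a sampling recipe. A minimal NumPy sketch, assuming a symmetric positive semidefinite `C`; the function name is hypothetical, and in practice a Cholesky factor is a common alternative to the eigendecomposition used here.

```python
import numpy as np

def sample_gaussian(mu, C, n, rng):
    """Draw n samples from N(mu, C) as mu + U Sigma^{1/2} z with z ~ N(0, I)."""
    lam, U = np.linalg.eigh(C)              # C = U diag(lam) U^T, U orthonormal
    lam = np.clip(lam, 0.0, None)           # guard against tiny negative round-off
    Z = rng.standard_normal((n, len(mu)))   # each row is z ~ N(0, I)
    # Row form of mu + U Sigma^{1/2} z: scale columns by sqrt(lam), rotate by U^T.
    return mu + (Z * np.sqrt(lam)) @ U.T
```

With many samples, the sample mean and covariance of the output should approach $\mu$ and $C$.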

Mahalanobis distance

The quantity

$$d_M^2 = (x - \mu)^\top C^{-1}(x - \mu) = \left(C^{-1/2}(x - \mu)\right)^\top\left(C^{-1/2}(x - \mu)\right)$$

is the squared Mahalanobis distance from $x$ to $\mu$.
- Also known as the generalized squared inter-point distance
- The distance of a point $x$ to the center of mass divided by the width of the ellipsoid in the direction of $x$
- Equivalent to the Euclidean distance after the linear change of coordinates $x \mapsto C^{-1/2}x$
- Keeps its quadratic form and remains non-negative
- If $C = I$, the Mahalanobis distance reduces to the Euclidean distance
- If $C$ is diagonal, the result is the normalized Euclidean distance $d(x, y) = \sqrt{\sum_{i=1}^m \frac{(x_i - y_i)^2}{\sigma_i^2}}$, where $\sigma_i$ is the standard deviation of $x_i$
- Can be approximated with eigenvectors of $C$
- Used for learning distance metrics
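The whitening view of $d_M$ can be sketched in NumPy as below. One assumption differs from the slide: instead of $C^{-1/2}$ this uses the Cholesky factor $C = LL^\top$, since $\|L^{-1}(x - \mu)\|^2 = (x - \mu)^\top C^{-1}(x - \mu)$ as well; the function name is hypothetical.

```python
import numpy as np

def mahalanobis(x, mu, C):
    """d_M = sqrt((x - mu)^T C^{-1} (x - mu)) via a Cholesky whitening step."""
    L = np.linalg.cholesky(C)           # C = L L^T, L lower triangular
    # y = L^{-1}(x - mu) satisfies y^T y = (x - mu)^T C^{-1} (x - mu).
    y = np.linalg.solve(L, x - mu)
    return np.sqrt(y @ y)
```

With `C = np.eye(m)` this returns the ordinary Euclidean distance, and with a diagonal `C` it returns the normalized Euclidean distance from the list above.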

Factor analysis

A generative dimensionality reduction algorithm. Let $x \in \mathbb{R}^m$ and $z \in \mathbb{R}^d$ with $d < m$; $x$ is modeled by $z$, dubbed the factors:

$$x = \Lambda z + \varepsilon$$

- $\Lambda$ is the factor loading matrix
- $z$ is assumed to be $\mathcal{N}(0, I)$ distributed (zero-mean, unit-variance normals)
- The factors $z$ model correlations between the elements of $x$
- $\varepsilon$ is a random variable that accounts for noise, assumed to be distributed as $\mathcal{N}(0, \Psi)$ where $\Psi$ is a diagonal matrix (whereas probabilistic PCA uses an isotropic error model with $\psi_i = \sigma^2$)
- $\varepsilon$ accounts for independent noise in each element of $x$
- The diagonality of $\Psi$ is a key assumption: it constrains the error covariance $\Psi$ for estimation
- The observed variables $x_i$ are conditionally independent given the factors $z$
- $x$ is $\mathcal{N}(0, \Lambda\Lambda^\top + \Psi)$ distributed (whereas probabilistic PCA models it with $\mathcal{N}(0, \Lambda\Lambda^\top + \sigma^2 I)$)
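The generative model in the list above is easy to simulate, which makes the claim $x \sim \mathcal{N}(0, \Lambda\Lambda^\top + \Psi)$ checkable numerically. A minimal NumPy sketch; the function name and the idea of passing the diagonal of $\Psi$ as a vector `psi` are illustrative assumptions.

```python
import numpy as np

def sample_factor_analysis(Lam, psi, n, rng):
    """Draw n samples of x = Lam z + eps, z ~ N(0, I_d), eps ~ N(0, diag(psi))."""
    m, d = Lam.shape
    Z = rng.standard_normal((n, d))                  # factors, one row per sample
    E = rng.standard_normal((n, m)) * np.sqrt(psi)   # independent noise per element of x
    return Z @ Lam.T + E                             # rows are (Lam z + eps)^T
```

The sample covariance of the output should approach `Lam @ Lam.T + np.diag(psi)`; all off-diagonal structure comes from $\Lambda$ alone, because the noise is diagonal.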

Properties of factor analysis

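Factor analysis: $x = \Lambda z + \varepsilon$
- Latent variables $z$ explain the correlations between the elements of $x$
- $\varepsilon_i$ represents variability unique to a particular $x_i$
- Differs from PCA, which treats covariance and variance identically
- Goal: infer $\Lambda$ and $\Psi$ from $x$
- Suppose $\Lambda$ and $\Psi$ are known. By linear projection, $E[z \mid x] = \beta x$ where $\beta = \Lambda^\top(\Psi + \Lambda\Lambda^\top)^{-1}$, since the joint Gaussian of data $x$ and factors $z$ is

$$p\left(\begin{bmatrix} x \\ z \end{bmatrix}\right) = \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \Lambda\Lambda^\top + \Psi & \Lambda \\ \Lambda^\top & I \end{bmatrix}\right)$$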
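Conditioning the joint Gaussian above gives both $\beta$ and the posterior covariance $\operatorname{Var}(z \mid x) = I - \beta\Lambda$. A minimal NumPy sketch with a hypothetical function name; it solves against $\Lambda\Lambda^\top + \Psi$ rather than inverting it.

```python
import numpy as np

def fa_posterior(Lam, psi):
    """Return beta and Var(z|x) for factor analysis with Psi = diag(psi).

    E[z|x] = beta x with beta = Lam^T (Psi + Lam Lam^T)^{-1}; Var(z|x) = I - beta Lam.
    """
    d = Lam.shape[1]
    Sxx = Lam @ Lam.T + np.diag(psi)       # Cov(x) = Lam Lam^T + Psi
    beta = np.linalg.solve(Sxx, Lam).T     # Lam^T Sxx^{-1}, using Sxx symmetric
    V = np.eye(d) - beta @ Lam             # posterior covariance of z given x
    return beta, V
```

By the Woodbury identity the same quantities can be written as $V = (I + \Lambda^\top\Psi^{-1}\Lambda)^{-1}$ and $\beta = V\Lambda^\top\Psi^{-1}$, which only requires inverting a $d \times d$ matrix; that form may be preferable when $m \gg d$.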

EM algorithm for factor analysis

Expectation-Maximization: a useful technique for dealing with missing data
- Start with an initial guess and evaluate the expected values of the missing data
- Optimize the parameters by taking the derivative of the likelihood of the observed and missing data with respect to the parameters
- Repeat until the data likelihood does not change

E-step: given $\Lambda$ and $\Psi$, for each data point $x_i$ compute

$$E[z \mid x_i] = \beta x_i$$

$$E[zz^\top \mid x_i] = \operatorname{Var}(z \mid x_i) + E[z \mid x_i]\,E[z \mid x_i]^\top = I - \beta\Lambda + \beta x_i x_i^\top \beta^\top$$

M-step:

$$\Lambda^{\text{new}} = \left(\sum_{i=1}^n x_i\, E[z \mid x_i]^\top\right)\left(\sum_{i=1}^n E[zz^\top \mid x_i]\right)^{-1}$$

$$\Psi^{\text{new}} = \frac{1}{n}\operatorname{diag}\left\{\sum_{i=1}^n \left(x_i x_i^\top - \Lambda^{\text{new}} E[z \mid x_i]\, x_i^\top\right)\right\}$$

where the diag operator sets all off-diagonal elements to zero.
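The E- and M-step updates above can be vectorized over all data points. This is a minimal NumPy sketch, assuming zero-mean data in the rows of `X`; the function name, the random initialization, the fixed iteration count, and the small floor on $\Psi$ are illustrative choices, not part of the slides. It records the log likelihood of $\mathcal{N}(0, \Lambda\Lambda^\top + \Psi)$ each iteration, which EM should never decrease.

```python
import numpy as np

def fa_em(X, d, n_iter=100, rng=None):
    """EM for factor analysis on zero-mean data X of shape (n, m).

    Returns (Lam, psi, loglik_history); initialization is an illustrative choice.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, m = X.shape
    S = X.T @ X / n                              # (1/n) sum_i x_i x_i^T
    Lam = 0.1 * rng.standard_normal((m, d))      # small random loadings
    psi = np.diag(S).copy()                      # start noise at per-variable variance
    history = []
    for _ in range(n_iter):
        Sxx = Lam @ Lam.T + np.diag(psi)
        # Log likelihood under N(0, Sxx): -n/2 (log|2 pi Sxx| + tr(Sxx^{-1} S)).
        _, logdet = np.linalg.slogdet(2 * np.pi * Sxx)
        history.append(-0.5 * n * (logdet + np.trace(np.linalg.solve(Sxx, S))))
        # E-step: beta = Lam^T Sxx^{-1}; row i of Ez is E[z|x_i]^T.
        beta = np.linalg.solve(Sxx, Lam).T
        Ez = X @ beta.T
        # sum_i E[zz^T|x_i] = n (I - beta Lam) + sum_i E[z|x_i] E[z|x_i]^T
        Ezz = n * (np.eye(d) - beta @ Lam) + Ez.T @ Ez
        # M-step: Lam = (sum_i x_i E[z|x_i]^T)(sum_i E[zz^T|x_i])^{-1}.
        Lam = np.linalg.solve(Ezz, (X.T @ Ez).T).T
        # Psi = (1/n) diag{sum_i x_i x_i^T - Lam E[z|x_i] x_i^T}.
        psi = np.diag(X.T @ X - Lam @ Ez.T @ X) / n
        psi = np.maximum(psi, 1e-6)              # floor keeps Psi positive definite
    return Lam, psi, history
```

Because $\Lambda$ is only identified up to a rotation, the fitted loadings need not match any particular ground truth; comparing $\Lambda\Lambda^\top + \Psi$ with the sample covariance is a more meaningful check.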

FA and PCA

- Factor analysis provides a proper probabilistic model
- PCA is rotationally invariant; FA is not
- Given a set of data points, would $\Lambda$ correspond to an orthonormal basis of the PCA subspace? In most cases, no
- However, if FA uses an isotropic error model, i.e., $\psi_i = \sigma^2$ (probabilistic PCA), the columns of $\Lambda$ span the PCA subspace

Probabilistic principal component analysis (cont’d)

Maximizing the log likelihood (e.g., with the EM algorithm) gives

$$\Lambda = U(\Sigma - \sigma^2 I)^{1/2} R$$

- $U_{m \times d}$ holds the first $d$ eigenvectors of the sample covariance matrix $S$
- $\Sigma_{d \times d}$ is a diagonal matrix of the corresponding first $d$ eigenvalues $\lambda_i$
- $R_{d \times d}$ is an arbitrary orthogonal rotation matrix (note that $z$ has a spherical Gaussian distribution, so rotating it changes nothing)
- The noise variance $\sigma^2$ is the residual variance per discarded dimension:

$$\sigma^2 = \frac{1}{m - d}\sum_{i=d+1}^{m} \lambda_i$$
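The closed-form solution above can be computed from an eigendecomposition of $S$. A minimal NumPy sketch, assuming $d < m$ and choosing $R = I$ (any orthogonal $R$ would fit equally well); the function name is hypothetical.

```python
import numpy as np

def ppca_ml(S, d):
    """Closed-form ML probabilistic PCA from sample covariance S (m x m), with R = I.

    Lam = U_d (Sigma_d - sigma^2 I)^{1/2}; sigma^2 = mean of the discarded eigenvalues.
    """
    m = S.shape[0]
    lam, U = np.linalg.eigh(S)           # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]       # reorder to descending
    sigma2 = lam[d:].sum() / (m - d)     # residual variance per discarded dimension
    # Scale each of the first d eigenvectors by sqrt(lambda_i - sigma^2).
    Lam = U[:, :d] * np.sqrt(np.maximum(lam[:d] - sigma2, 0.0))
    return Lam, sigma2
```

A quick consistency check: the model covariance $\Lambda\Lambda^\top + \sigma^2 I$ reproduces the top $d$ eigenvalues of $S$ exactly and replaces the remaining ones by their average $\sigma^2$.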

Big picture

Figure from “A unifying review of linear Gaussian models” by Sam Roweis and Zoubin Ghahramani