Principal components analysis - Lectures Notes - 11, Study notes of Machine Learning

Stanford University Machine Learning

Artificial Intelligence. Lectures Notes of Machine Learning. Prof. Andrew Ng - Stanford University - Contents: Principal components analysis

Typology: Study notes

2010/2011

Uploaded on 10/30/2011

ilyastrab 🇺🇸


CS229 Lecture notes

Andrew Ng

Part XI

Principal components analysis

In our discussion of factor analysis, we gave a way to model data $x \in \mathbb{R}^n$ as "approximately" lying in some $k$-dimensional subspace, where $k \ll n$. Specifically, we imagined that each point $x^{(i)}$ was created by first generating some $z^{(i)}$ lying in the $k$-dimensional affine space $\{\Lambda z + \mu;\ z \in \mathbb{R}^k\}$, and then adding $\Psi$-covariance noise. Factor analysis is based on a probabilistic model, and parameter estimation used the iterative EM algorithm.

In this set of notes, we will develop a method, Principal Components Analysis (PCA), that also tries to identify the subspace in which the data approximately lies. However, PCA will do so more directly, and will require only an eigenvector calculation (easily done with the eig function in Matlab), and does not need to resort to EM.

Suppose we are given a dataset $\{x^{(i)};\ i = 1, \ldots, m\}$ of attributes of $m$ different types of automobiles, such as their maximum speed, turn radius, and so on. Let $x^{(i)} \in \mathbb{R}^n$ for each $i$ ($n \ll m$). But unknown to us, two different attributes, some $x_i$ and $x_j$, respectively give a car's maximum speed measured in miles per hour, and the maximum speed measured in kilometers per hour. These two attributes are therefore almost linearly dependent, up to only small differences introduced by rounding off to the nearest mph or kph. Thus, the data really lies approximately on an $(n-1)$-dimensional subspace. How can we automatically detect, and perhaps remove, this redundancy?

For a less contrived example, consider a dataset resulting from a survey of pilots for radio-controlled helicopters, where $x_1^{(i)}$ is a measure of the piloting skill of pilot $i$, and $x_2^{(i)}$ captures how much he/she enjoys flying. Because RC helicopters are very difficult to fly, only the most committed students, ones that truly enjoy flying, become good pilots. So, the two attributes $x_1$ and $x_2$ are strongly correlated. Indeed, we might posit that the data actually lies along some diagonal axis (the $u_1$ direction) capturing the intrinsic piloting "karma" of a person, with only a small amount of noise lying off this axis. (See figure.) How can we automatically compute this $u_1$ direction?

[Figure: scatter plot of $x_1$ (skill) against $x_2$ (enjoyment), with the directions $u_1$ and $u_2$ marked.]

We will shortly develop the PCA algorithm. But prior to running PCA per se, typically we first pre-process the data to normalize its mean and variance, as follows:

1. Let $\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$.
2. Replace each $x^{(i)}$ with $x^{(i)} - \mu$.
3. Let $\sigma_j^2 = \frac{1}{m} \sum_{i} \left(x_j^{(i)}\right)^2$.
4. Replace each $x_j^{(i)}$ with $x_j^{(i)} / \sigma_j$.
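The four preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration, not part of the original notes: the function name `normalize`, the row-per-example layout of `X`, and the small car dataset are all assumptions made for the example, and it assumes no attribute is constant (otherwise $\sigma_j = 0$ and step 4 divides by zero).

```python
import numpy as np

def normalize(X):
    """Zero out the mean and rescale each coordinate to unit variance.

    X is an (m, n) array whose i-th row is the point x^(i).
    Assumes no column of X is constant (sigma_j would be 0).
    """
    mu = X.mean(axis=0)                               # step 1: mean of the data
    X_centered = X - mu                               # step 2: subtract the mean
    sigma = np.sqrt((X_centered ** 2).mean(axis=0))   # step 3: per-coordinate std
    X_norm = X_centered / sigma                       # step 4: rescale
    return X_norm, mu, sigma

# Hypothetical data: column 0 is maximum speed in mph, column 1 is number of seats.
X = np.array([[120.0, 2.0],
              [150.0, 4.0],
              [ 90.0, 2.0],
              [200.0, 4.0]])

X_norm, mu, sigma = normalize(X)
```

After this call, every column of `X_norm` has zero mean and unit variance, so speed and seat count contribute on a comparable scale.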

Steps (1-2) zero out the mean of the data, and may be omitted for data known to have zero mean (for instance, time series corresponding to speech or other acoustic signals). Steps (3-4) rescale each coordinate to have unit variance, which ensures that different attributes are all treated on the same "scale." For instance, if $x_1$ were cars' maximum speed in mph (taking values in the high tens or low hundreds) and $x_2$ were the number of seats (taking values around 2-4), then this renormalization rescales the different attributes to make them more comparable. Steps (3-4) may be omitted if we had a priori knowledge that the different attributes are all on the same scale. One


We see that the projected data still has a fairly large variance, and the points tend to be far from zero. In contrast, suppose we had instead picked the following direction:

[Figure: the data points and their projections onto the alternative direction.]

Here, the projections have a significantly smaller variance, and are much closer to the origin.

We would like to automatically select the direction $u$ corresponding to the first of the two figures shown above. To formalize this, note that given a unit vector $u$ and a point $x$, the length of the projection of $x$ onto $u$ is given by $x^T u$. I.e., if $x^{(i)}$ is a point in our dataset (one of the crosses in the plot), then its projection onto $u$ (the corresponding circle in the figure) is distance $x^T u$ from the origin. Hence, to maximize the variance of the projections, we would like to choose a unit-length $u$ so as to maximize:

$$
\frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)T} u\right)^2
= \frac{1}{m} \sum_{i=1}^{m} u^T x^{(i)} x^{(i)T} u
= u^T \left( \frac{1}{m} \sum_{i=1}^{m} x^{(i)} x^{(i)T} \right) u.
$$
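The identity above can be checked numerically. The sketch below is only an illustration under assumed, synthetic data: it generates hypothetical zero-mean 2-D points concentrated along the diagonal, then compares the variance of the projections onto two unit vectors with the quadratic form $u^T \Sigma u$. The names `projected_variance`, `u_along`, and `u_across` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: m points lying close to the diagonal of the plane.
m = 500
z = rng.normal(size=m)
X = np.column_stack([z + 0.1 * rng.normal(size=m),
                     z + 0.1 * rng.normal(size=m)])
X = X - X.mean(axis=0)            # zero out the mean, as the derivation assumes

# Empirical covariance matrix (1/m) * sum_i x^(i) x^(i)^T.
Sigma = X.T @ X / m

def projected_variance(X, u):
    """Average squared length of the projections x^(i)^T u."""
    return np.mean((X @ u) ** 2)

u_along = np.array([1.0, 1.0]) / np.sqrt(2.0)    # unit vector along the diagonal
u_across = np.array([1.0, -1.0]) / np.sqrt(2.0)  # unit vector across the diagonal

var_along = projected_variance(X, u_along)
var_across = projected_variance(X, u_across)
```

For either direction, `projected_variance(X, u)` agrees with `u @ Sigma @ u` up to floating-point error, and the direction along the diagonal gives the much larger variance.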

We easily recognize that maximizing this subject to $\|u\|_2 = 1$ gives the principal eigenvector of $\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} x^{(i)T}$, which is just the empirical covariance matrix of the data (assuming it has zero mean). [1]

To summarize, we have found that if we wish to find a 1-dimensional subspace with which to approximate the data, we should choose $u$ to be the principal eigenvector of $\Sigma$. More generally, if we wish to project our data into a $k$-dimensional subspace ($k < n$), we should choose $u_1, \ldots, u_k$ to be the top $k$ eigenvectors of $\Sigma$. The $u_i$'s now form a new, orthogonal basis for the data. [2]

Then, to represent $x^{(i)}$ in this basis, we need only compute the corresponding vector

$$
y^{(i)} =
\begin{bmatrix}
u_1^T x^{(i)} \\
u_2^T x^{(i)} \\
\vdots \\
u_k^T x^{(i)}
\end{bmatrix}
\in \mathbb{R}^k.
$$

Thus, whereas $x^{(i)} \in \mathbb{R}^n$, the vector $y^{(i)}$ now gives a lower, $k$-dimensional, approximation/representation for $x^{(i)}$. PCA is therefore also referred to as a dimensionality reduction algorithm. The vectors $u_1, \ldots, u_k$ are called the first $k$ principal components of the data.
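As a rough sketch of the whole procedure, the following NumPy code forms $\Sigma$, takes its top $k$ eigenvectors, and computes the reduced representation $y^{(i)}$ for every point. The notes mention Matlab's eig; here `numpy.linalg.eigh` is swapped in because $\Sigma$ is symmetric. The function name `pca` and the synthetic data are assumptions for illustration, and the input is assumed to be already zero-mean.

```python
import numpy as np

def pca(X, k):
    """Project zero-mean data onto its top-k principal components.

    X is an (m, n) array whose i-th row is x^(i), assumed zero-mean.
    Returns U, an (n, k) matrix with columns u_1, ..., u_k, and
    Y, an (m, k) matrix whose i-th row is y^(i).
    """
    m = X.shape[0]
    Sigma = X.T @ X / m                       # empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    U = eigvecs[:, order]
    Y = X @ U                                 # row i holds u_j^T x^(i), j = 1..k
    return U, Y

# Hypothetical data: 2-D points concentrated along the diagonal.
rng = np.random.default_rng(0)
m = 500
z = rng.normal(size=m)
X = np.column_stack([z + 0.1 * rng.normal(size=m),
                     z + 0.1 * rng.normal(size=m)])
X = X - X.mean(axis=0)

U, Y = pca(X, k=1)
```

Here `U[:, 0]` comes out close to the diagonal direction, up to sign, since an eigenvector is only determined up to a scalar factor. The columns of `U` are orthonormal because `eigh` returns an orthonormal set of eigenvectors.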

Remark. Although we have shown it formally only for the case of k = 1, using well-known properties of eigenvectors it is straightforward to show that

[1] If you haven't seen this before, try using the method of Lagrange multipliers to maximize $u^T \Sigma u$ subject to $u^T u = 1$. You should be able to show that $\Sigma u = \lambda u$, for some $\lambda$, which implies $u$ is an eigenvector of $\Sigma$, with eigenvalue $\lambda$.

[2] Because $\Sigma$ is symmetric, the $u_i$'s will be (or always can be chosen to be) orthogonal to each other.
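One way to carry out the Lagrange-multiplier argument suggested in the first footnote is sketched below. The Lagrangian $\mathcal{L}$ and the multiplier name $\lambda$ are the standard choices rather than notation fixed by the notes, and the gradient step uses the symmetry of $\Sigma$.

```latex
\begin{align*}
\mathcal{L}(u, \lambda) &= u^T \Sigma u - \lambda \left( u^T u - 1 \right), \\
\nabla_u \mathcal{L}(u, \lambda) &= 2 \Sigma u - 2 \lambda u = 0
  \quad \Longrightarrow \quad \Sigma u = \lambda u, \\
u^T \Sigma u &= \lambda \, u^T u = \lambda .
\end{align*}
```

The last line shows that at any stationary point the objective equals the corresponding eigenvalue, so the maximum is attained at an eigenvector with the largest eigenvalue, i.e. the principal eigenvector.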
