Numerical Linear Algebra for Data Exploration: Tensor Decomposition | CBS 598, Study notes of Algorithms and Programming

Material Type: Notes; Class: Topic: Survey of Bioscience Business Sectors; Subject: Computational Biosciences; University: Arizona State University - Tempe; Term: Fall 2007;

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration - Tensor Decomposition
Instructor: Jieping Ye

1 Introduction

  • Principal component analysis (PCA) reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set, that retains most of the sample’s variance.
  • It is useful for the compression and classification of data.
  • The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total variance each retains.

2 Computation of PCs

  • Let $A$ be an $n \times m$ data matrix in which each row is a data vector and each column represents a variable. $A$ is centered: the estimated mean (the average of the rows) is subtracted from each row, so every column of $A$ has zero mean.
  • Let $w$ be the column vector in $\mathbb{R}^m$ of projection weights that results in the largest variance when the data $A$ are projected along $w$. We require $w^T w = 1$.
  • The projection of a vector $u$ onto $w$ is $w^T u = \sum_{j=1}^{m} u_j w_j$. Thus, the projection of the data along $w$ is $Aw$.

  • The variance of the data after the projection by $w$ is

$$(Aw)^T (Aw) = w^T A^T A w = w^T C w$$

where $C = A^T A$ is the covariance matrix of the data up to a constant factor $1/(n-1)$, which does not change the maximizer (note that $A$ is centered).

  • Compute the optimal projection $w$, which maximizes the variance subject to the constraint $w^T w = 1$.
  • Define the following Lagrange function:

$$f = w^T C w - \lambda (w^T w - 1)$$

where $\lambda$ is the Lagrange multiplier.

  • Taking the derivative of $f$ with respect to $w$ and setting it to zero, we get

$$\frac{\partial f}{\partial w} = 2Cw - 2\lambda w = 0.$$

This leads to the eigenvalue problem $Cw = \lambda w$, where $C = A^T A$.
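The stationarity condition can be checked numerically. The sketch below is only an illustration: Python with NumPy is an assumption (the notes name no language), and the random data, the column scales, and all variable names are made up for the example. It takes the top eigenvector of $C = A^T A$, confirms that it satisfies $Cw = \lambda w$, and checks that no random unit vector gives a larger value of $w^T C w$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable

# Hypothetical data: n = 200 samples (rows), m = 5 variables (columns),
# with different spreads per column so the directions are not equivalent.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])
A = X - X.mean(axis=0)                  # center: every column now has zero mean

C = A.T @ A                             # the m x m matrix C from the notes

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
lam = eigvals[-1]                       # largest eigenvalue
w = eigvecs[:, -1]                      # matching unit-length eigenvector

# Stationarity condition of the Lagrange function: C w = lambda w.
residual = np.linalg.norm(C @ w - lam * w)

# Objective value at w; it equals lambda because w^T w = 1.
best = w @ C @ w

# Evaluate w^T C w for many random unit vectors; none should exceed lambda.
trials = rng.normal(size=(1000, 5))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
random_best = np.max(np.einsum("ij,jk,ik->i", trials, C, trials))

print("residual:", residual)
print("objective at w:", best, "lambda:", lam)
print("best random direction:", random_best)
```

A random search is of course not a proof; it only illustrates that the eigenvector belonging to the largest eigenvalue is the constrained maximizer.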

  • Show that the solution is given by the singular values and singular vectors of A.
  • We aim to maximize the variance, which is given by $w^T C w = w^T \lambda w = \lambda w^T w = \lambda$. Substituting the SVD $A = U \Sigma V^T$ gives $C = A^T A = V \Sigma^2 V^T$, so the eigenvectors of $C$ are the right singular vectors of $A$ and its eigenvalues are the squared singular values. Choose $\lambda = \sigma_1^2$, the largest of them. More precisely: the first principal component of $A$ is exactly the first right singular vector $v_1$ of $A$.
  • Once the first principal component is found, we continue in the same fashion to look for the next one, which is orthogonal to all the principal components already found.
    • The solutions are the right singular vectors $v_k$ of $A$, and the variance in each direction is given by the corresponding squared singular value $\sigma_k^2$.
  • It is natural to first form the covariance matrix $C = A^T A$ of the centered data matrix $A$ for computing the PCs. However, this is not a good idea. Why?
    • The condition number of $A^T A$ is the square of that of $A$, so it is much larger whenever $A$ is not well conditioned.
    • For a sparse $A$, $C = A^T A$ may not be sparse any more.
  • How to compute the PCA:
    • Center the data by subtracting the mean of the rows from each row.
    • Compute the SVD of the centered data matrix $A$ as $A = U \Sigma V^T$.
    • The principal components are the columns of $V$; the coordinates of the data in the basis defined by the principal components are $U \Sigma$.
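The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca_svd`, its return layout, and the synthetic data are hypothetical. The sketch also compares the result with the eigenvalues of $A^T A$ and prints both condition numbers, to illustrate why forming $A^T A$ explicitly is discouraged.

```python
import numpy as np

def pca_svd(X):
    """PCA of X (rows = samples, columns = variables) via the SVD.

    Returns (components, scores, singular_values). The name and the
    return layout are illustrative choices, not taken from the notes.
    """
    A = X - X.mean(axis=0)              # step 1: subtract the mean row
    # Step 2: thin SVD, A = U @ diag(s) @ Vt, with s in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    components = Vt.T                   # step 3: the columns of V are the PCs
    scores = U * s                      # coordinates U @ diag(s) in the PC basis
    return components, scores, s

rng = np.random.default_rng(1)
# Hypothetical data: 100 samples, 4 variables with different spreads.
X = rng.normal(size=(100, 4)) * np.array([10.0, 5.0, 1.0, 0.5])
V, scores, s = pca_svd(X)
A = X - X.mean(axis=0)

# The scores are the centered data expressed in the PC basis: U Sigma = A V.
scores_match = np.allclose(scores, A @ V)

# The eigenvalues of A^T A are the squared singular values of A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]     # reorder to descending
eig_match = np.allclose(eigvals, s**2)

# Forming A^T A squares the 2-norm condition number.
cond_A = np.linalg.cond(A)
cond_C = np.linalg.cond(A.T @ A)

print("scores match A V:", scores_match)
print("eigenvalues match sigma^2:", eig_match)
print("cond(A)^2:", cond_A**2, "cond(A^T A):", cond_C)
```

Each singular vector is determined only up to sign, so the columns of `V` may be flipped relative to another implementation; the variances `s**2` are unaffected.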

3 How to choose the number of PCs

  • The variance in the direction of the $k$-th principal component is given by the corresponding squared singular value, $\sigma_k^2$.
  • Singular values can be used to estimate how many principal components to keep.
  • Rule of thumb: keep enough to explain 85% of the variation, that is, choose the smallest $k$ for which $\frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{n} \sigma_j^2} \geq 0.85$.
  • Typically, the squared singular values drop rapidly. Thus, the first few principal components are enough to capture most of the variation in the data. This leads to data compression.
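The rule of thumb can be turned into a small helper. The following is a sketch only: the function name `choose_num_pcs`, the `threshold` parameter, and the example singular values are assumptions made for illustration. It computes the cumulative fraction of variance explained and returns the smallest number of components that reaches the threshold.

```python
import numpy as np

def choose_num_pcs(singular_values, threshold=0.85):
    """Smallest k whose first k squared singular values reach `threshold`
    of the total. Assumes singular_values is sorted in descending order.
    Returns (k, explained), where explained[i] is the fraction of the
    variance captured by the first i + 1 components.
    """
    var = np.asarray(singular_values, dtype=float) ** 2
    explained = np.cumsum(var) / var.sum()
    # First index where the cumulative fraction is >= threshold, as a count.
    k = int(np.searchsorted(explained, threshold)) + 1
    # Guard against rounding when threshold is 1.0 or very close to it.
    return min(k, len(var)), explained

# Made-up singular values: the squares are 100, 25, 4, 1 (total 130).
s = np.array([10.0, 5.0, 2.0, 1.0])
k, explained = choose_num_pcs(s)
print("components to keep:", k)
print("cumulative explained fraction:", explained)
```

With these example values the first component explains 100/130 (about 77%) of the variance and the first two explain 125/130 (about 96%), so the helper returns `k = 2`.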

4 Summary

  • PCA is SVD done on centered data.
  • PCA looks for a direction such that the data projected onto it have maximal variance.
  • When found, PCA continues by seeking the next direction, which is orthogonal to all the previously found directions, and which explains as much of the remaining variance in the data as possible.
  • PCA is useful for data exploration, data visualization, data compression, outlier detection, and more.