Multidimensional Scaling - Matrix Computation - Lecture Slides, Slides of Advanced Computer Architecture


Typology: Slides

2012/2013

Uploaded on 04/27/2013

ashalata
Lecture 23

Overview

• Multidimensional scaling
• Spectral methods for dimensionality reduction
• Spectral graph theory
• Spectral clustering
• Random walk

Multidimensional scaling (MDS)

Compute the low-dimensional representation $\phi \in \mathbb{R}^q$ of high-dimensional data $x \in \mathbb{R}^m$ that most faithfully preserves pairwise distances (or similarities, which are inversely related to distances). The Euclidean distance between two points is

$$d_{ij} = \|x_i - x_j\|_2, \qquad d_{ij}^2 = (x_i - x_j)^\top (x_i - x_j)$$

The solution is obtained by minimizing

$$E_{\mathrm{MDS}} = \sum_i \sum_j \left(x_i \cdot x_j - \phi_i \cdot \phi_j\right)^2$$

and the minimum error is obtained from the spectral decomposition of the $n \times n$ Gram matrix of inner products $G = X^\top X$, $G_{ij} = x_i \cdot x_j$, where the columns of $X$ are the $n$ data points. Denoting the top $q$ eigenvectors of the Gram matrix by $\{u_\alpha\}_{\alpha=1}^{q}$ and their respective eigenvalues by $\{\lambda_\alpha\}_{\alpha=1}^{q}$, the outputs of MDS are given by

$$\phi_{i\alpha} = \sqrt{\lambda_\alpha}\, u_{\alpha i}$$
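A minimal NumPy sketch of classical MDS as described above. It assumes the data points are the columns of an m × n array `X` (so that G = XᵀX is n × n) and centers them first, as the derivation on the next slide assumes. The function name `classical_mds` is hypothetical, not something defined in the slides.

```python
import numpy as np

def classical_mds(X, q):
    """Classical (metric) MDS via the Gram matrix (illustrative sketch).

    X : (m, n) array whose columns x_1..x_n are the data points, so G = X^T X.
    q : target dimension.
    Returns Phi : (n, q) array whose row i is the embedding phi_i.
    """
    # Centre the configuration so its centroid is at the origin.
    Xc = X - X.mean(axis=1, keepdims=True)
    G = Xc.T @ Xc                          # n x n Gram matrix, G_ij = x_i . x_j
    lam, U = np.linalg.eigh(G)             # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:q]        # indices of the top-q eigenvalues
    lam_q = np.clip(lam[idx], 0.0, None)   # guard against tiny negative round-off
    # phi_{i,alpha} = sqrt(lambda_alpha) * u_{alpha,i}
    return U[:, idx] * np.sqrt(lam_q)
```

When the centred data has rank at most q, Φ Φᵀ reproduces G exactly, so all pairwise distances are preserved; otherwise the top-q eigenpairs give the best rank-q approximation of G.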

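MDS: derivation

Assume the centroid of the configuration of $n$ points is at the origin:

$$\sum_{i=1}^{n} x_{ij} = 0, \quad j = 1, \ldots, m$$

To find the Gram matrix $G$, start from $d_{ij}^2 = (x_i - x_j)^\top (x_i - x_j) = x_i^\top x_i + x_j^\top x_j - 2\, x_i^\top x_j$. Because the centroid is at the origin, the cross terms vanish when averaging over $i$ or $j$, and hence

$$\frac{1}{n}\sum_{i=1}^{n} d_{ij}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^\top x_i + x_j^\top x_j$$

$$\frac{1}{n}\sum_{j=1}^{n} d_{ij}^2 = x_i^\top x_i + \frac{1}{n}\sum_{j=1}^{n} x_j^\top x_j$$

$$\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2 = \frac{2}{n}\sum_{i=1}^{n} x_i^\top x_i$$

Combining these,

$$G_{ij} = x_i^\top x_j = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_{i=1}^{n} d_{ij}^2 - \frac{1}{n}\sum_{j=1}^{n} d_{ij}^2 + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2\right) = a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}$$

where $a_{ij} = -\tfrac{1}{2} d_{ij}^2$ and

$$a_{i\cdot} = \frac{1}{n}\sum_{j=1}^{n} a_{ij}, \qquad a_{\cdot j} = \frac{1}{n}\sum_{i=1}^{n} a_{ij}, \qquad a_{\cdot\cdot} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}$$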
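The formula G_ij = a_ij − a_i. − a_.j + a_.. ("double centering") can be checked numerically. This is a small sketch; the helper name `gram_from_distances` is hypothetical, and the input is assumed to be the n × n matrix of squared distances d_ij².

```python
import numpy as np

def gram_from_distances(Dsq):
    """Recover G_ij = x_i . x_j from squared distances of a centred configuration.

    Implements G_ij = a_ij - a_i. - a_.j + a_.. with a_ij = -d_ij^2 / 2.
    """
    A = -0.5 * Dsq
    a_row = A.mean(axis=1, keepdims=True)   # a_i. = (1/n) sum_j a_ij
    a_col = A.mean(axis=0, keepdims=True)   # a_.j = (1/n) sum_i a_ij
    a_all = A.mean()                        # a_.. = (1/n^2) sum_i sum_j a_ij
    return A - a_row - a_col + a_all
```

Only distances enter the computation, which is why MDS (and Isomap) can work from a distance matrix alone.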

Isometric mapping (Isomap)

Compute the low-dimensional representation of a high-dimensional data set that most faithfully preserves the pairwise geodesic distances [Tenenbaum et al. Science 00]. Geodesic distances are approximated by distances measured along the submanifold from which the data points are sampled. Isomap can be understood as a variant of MDS in which estimates of geodesic distances along the submanifold are substituted for Euclidean distances. Main steps:

1. Construct the adjacency graph: find neighbors using K nearest neighbors or an ε-distance threshold.
2. Estimate geodesic distances: compute pairwise shortest-path distances in the graph using dynamic programming.
3. Metric MDS: uncover the embedding from the top q eigenvectors of the Gram matrix.
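The three steps above can be sketched in NumPy as follows. Assumptions: points are rows of an n × m array, the neighborhood graph uses the ε-ball rule (the K-nearest-neighbor rule would work similarly), and shortest paths are computed with Floyd–Warshall, one dynamic-programming choice. The function name `isomap` is hypothetical.

```python
import numpy as np

def isomap(X, q, eps):
    """Isomap sketch: eps-ball graph -> Floyd-Warshall -> classical MDS.

    X : (n, m) array with one point per row.
    """
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Step 1: adjacency graph; non-neighbours get infinite edge length.
    G = np.where(dist < eps, dist, np.inf)
    np.fill_diagonal(G, 0.0)
    # Step 2: all-pairs shortest paths (Floyd-Warshall dynamic program).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase eps")
    # Step 3: classical MDS on the geodesic distance estimates.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J           # double-centred Gram matrix
    lam, U = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1][:q]
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))
```

For points along a semicircle with ε chosen so that only consecutive points are linked, the 1-D output is (up to sign and shift) the arc-length coordinate, whereas plain MDS on Euclidean distances would not unroll the arc.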

Spectral graph theory

Analyze graph structure and properties using linear algebra, i.e., the study of the eigenvalues and eigenvectors of matrices associated with graphs. Closely related to random walks on graphs. Applications: spectral clustering, shape matching, mesh compression, PageRank, etc. Given a graph $G = (V, E)$ and its weighted adjacency matrix $W$, we compute the diagonal degree matrix $D$ with

$$D_{ii} = \sum_j W_{ij}$$

The graph Laplacian is $D - W$, and the normalized graph Laplacian is

$$L = D^{-1/2}(D - W)D^{-1/2} = I - D^{-1/2} W D^{-1/2}$$
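A short sketch that builds D, the graph Laplacian D − W, and the normalized Laplacian from a symmetric weight matrix. It assumes every vertex has positive degree so that D^{-1/2} exists; the helper name `graph_laplacians` is hypothetical.

```python
import numpy as np

def graph_laplacians(W):
    """Return (D, D - W, normalized Laplacian) for a symmetric weight matrix W."""
    d = W.sum(axis=1)                      # D_ii = sum_j W_ij
    D = np.diag(d)
    L_unnorm = D - W
    d_isqrt = 1.0 / np.sqrt(d)             # assumes no isolated vertices (d_i > 0)
    # I - D^{-1/2} W D^{-1/2}, computed with broadcasting instead of diag products
    L_norm = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    return D, L_unnorm, L_norm
```

Two standard properties are easy to confirm: (D − W)·1 = 0, and the eigenvalues of the normalized Laplacian lie in [0, 2] with smallest eigenvalue 0.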

Laplacian eigenmap

Algorithm: given $n$ points $x_1, \ldots, x_n$ in $\mathbb{R}^m$ [Belkin and Niyogi NIPS 02]:

1. Constructing the graph: nodes $i$ and $j$ are connected by an edge if $\|x_i - x_j\|^2 < \epsilon$, or based on $K$ nearest neighbors.
2. Choosing the weights: compute the weighted graph
$$W_{ij} = e^{-\|x_i - x_j\|^2 / t}$$
where $t$ is the kernel width (i.e., the heat kernel).
3. Computing the Laplacian eigenmap: assume the graph is connected; otherwise apply this step to each connected component. Compute the eigenvalues and eigenvectors of the generalized eigenvalue problem
$$L y = \lambda D y$$
where $D$ is the diagonal degree matrix and $L = D - W$ is the (unnormalized) graph Laplacian. Let $y_0, y_1, \ldots, y_{k-1}$ be the eigenvectors, ordered by ascending eigenvalue. The constant eigenvector $y_0$ (eigenvalue 0) is discarded, and the image of $x_i$ under the embedding into the lower-dimensional space $\mathbb{R}^q$ is $(y_1(i), \ldots, y_q(i))$.
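A NumPy-only sketch of the three steps, under these assumptions: points are rows of an n × m array, the graph is a K-nearest-neighbor graph made symmetric (i ~ j if either is among the other's neighbors), and the generalized problem L y = λ D y is solved through the equivalent symmetric problem D^{-1/2} L D^{-1/2} v = λ v with y = D^{-1/2} v. The function name and its defaults are hypothetical.

```python
import numpy as np

def laplacian_eigenmap(X, q=2, k=5, t=1.0):
    """Laplacian eigenmap sketch. X: (n, m) points as rows.

    Returns (eigenvalues, Y, embedding), where the columns of Y are y_0, y_1, ...
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # ||x_i - x_j||^2
    # Step 1: symmetrised k-nearest-neighbour graph (index 0 of each row is the point itself)
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True
    A = A | A.T
    # Step 2: heat-kernel weights on the edges
    W = np.where(A, np.exp(-sq / t), 0.0)
    d = W.sum(axis=1)
    L = np.diag(d) - W
    # Step 3: L y = lam D y  <=>  D^{-1/2} L D^{-1/2} v = lam v,  y = D^{-1/2} v
    d_isqrt = 1.0 / np.sqrt(d)
    vals, V = np.linalg.eigh(d_isqrt[:, None] * L * d_isqrt[None, :])
    Y = d_isqrt[:, None] * V               # ascending eigenvalues
    return vals, Y, Y[:, 1:q + 1]          # drop the trivial constant y_0
```

The symmetric reformulation avoids a dedicated generalized eigensolver; a library routine for the symmetric-definite generalized problem would be an equally valid choice.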

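Corresponding continuum model

Let $\mathcal{M}$ be a Riemannian manifold (isometrically) embedded in $\mathbb{R}^m$. For a differentiable map $f: \mathcal{M} \to \mathbb{R}$,

$$|f(x') - f(x)| \le \|\nabla f(x)\|\, d_{\mathcal{M}}(x, x') + o\big(d_{\mathcal{M}}(x, x')\big)$$

The geodesic distance on $\mathcal{M}$ and the ambient Euclidean distance are locally similar:

$$d_{\mathcal{M}}(x, x') = \|x - x'\| + o(\|x - x'\|)$$

so a function with a small gradient maps nearby points to nearby values. Choose $f$ to preserve local distances by minimizing

$$\int_{\mathcal{M}} \|\nabla f(x)\|^2 \, dx \quad \text{subject to} \quad \|f\|_{L^2(\mathcal{M})} = 1, \quad \langle f, \mathbf{1} \rangle_{L^2(\mathcal{M})} = 0$$

where $dx$ is the uniform measure on $\mathcal{M}$. Minimizing $\int_{\mathcal{M}} \|\nabla f(x)\|^2\,dx$ corresponds to minimizing

$$f^\top L f = \frac{1}{2}\sum_{i,j} (f_i - f_j)^2 W_{ij}, \qquad L = D - W$$

on a graph, i.e., to finding eigenfunctions of the Laplace–Beltrami operator $\mathcal{L}$.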
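A quick numerical check of the graph identity used above, ½ Σ_ij (f_i − f_j)² W_ij = fᵀ(D − W)f, assuming W is symmetric. The helper name `dirichlet_energy` is hypothetical.

```python
import numpy as np

def dirichlet_energy(f, W):
    """(1/2) * sum_ij (f_i - f_j)^2 W_ij: the graph analogue of the integral of ||grad f||^2."""
    diff = f[:, None] - f[None, :]
    return 0.5 * np.sum(W * diff ** 2)
```

A constant f has zero energy, which is why the constraint ⟨f, 1⟩ = 0 is needed to rule out the trivial minimizer.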

Spectral clustering

See Tommi Jaakkola's lecture notes on spectral clustering.
Unified view of existing algorithms [Weiss ICCV 99]:
• Feature grouping [Scott and Longuet-Higgins BMVC 90]
• Multibody factorization [Costeira and Kanade ICCV 95]
• Image segmentation [Shi and Malik CVPR 97]
• Grouping [Perona and Freeman ECCV 98]
Analysis of spectral clustering: [Ng et al. NIPS 01] [Kannan et al. JACM 04]
Image segmentation: [Shi and Malik CVPR 97] [Meila and Shi NIPS 01]
See also semi-supervised learning with spectral graph
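As a minimal illustration (not the method of any single paper listed above), a two-way spectral split in the spirit of [Shi and Malik CVPR 97] takes the second-smallest eigenvector of (D − W)y = λDy and thresholds it. Thresholding at zero is an assumption made here for simplicity; other splitting points are possible. The function name is hypothetical.

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way split from the second-smallest generalised eigenvector of (D - W) y = lam D y."""
    d = W.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L_norm = np.eye(len(d)) - d_isqrt[:, None] * W * d_isqrt[None, :]
    vals, V = np.linalg.eigh(L_norm)       # ascending eigenvalues
    y = d_isqrt * V[:, 1]                  # map back to the generalised problem: y = D^{-1/2} v
    return y > 0                           # threshold at zero (one simple choice)
```

On two triangles joined by a single weak edge, the split separates the two triangles.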

Normalized graph Laplacian and random walk

Given an undirected weighted graph $G = (V, E, W)$, the random walk on the graph is given by the transition matrix

$$P = D^{-1} W \tag{1}$$

where $D$ is the diagonal matrix with

$$D_{ii} = \sum_j W_{ij}$$

Normalized graph Laplacian:

$$L = D^{-1/2}(D - W)D^{-1/2} = I - D^{-1/2} W D^{-1/2} \tag{2}$$

The random walk matrix has the same eigenvalues as $I - L$, since the two matrices are similar:

$$D^{-1} W = D^{-1/2}\left(D^{-1/2} W D^{-1/2}\right) D^{1/2} = D^{-1/2}(I - L) D^{1/2} \tag{3}$$
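A small numerical check of (3), assuming a random symmetric weight matrix with positive entries: P = D⁻¹W is row-stochastic, and its eigenvalues coincide with those of I − L.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((5, 5))
W = (A + A.T) / 2                          # symmetric weights (undirected graph)
d = W.sum(axis=1)
P = W / d[:, None]                         # P = D^{-1} W, each row sums to 1
d_isqrt = 1.0 / np.sqrt(d)
L = np.eye(5) - d_isqrt[:, None] * W * d_isqrt[None, :]   # normalized Laplacian (2)

# P is similar to the symmetric matrix I - L, so its eigenvalues are real.
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_IL = np.sort(np.linalg.eigvalsh(np.eye(5) - L))
```

Because I − L is symmetric, the similarity in (3) also explains why the eigenvalues of the (generally non-symmetric) matrix P are real, with largest eigenvalue 1.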

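PageRank algorithm (cont'd)

In matrix form,

$$\pi = \begin{bmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_n \end{bmatrix}, \qquad \pi = P\pi, \qquad \pi^\top e = 1, \qquad e = [1\ 1\ \cdots\ 1]^\top$$

where $P_{ij}$ is the probability of moving from page $j$ to page $i$, so $P$ is column-stochastic. This can be viewed as a random walk or Markov chain:

$$\pi^{(1)} = P\pi^{(0)}, \qquad \pi^{(2)} = P\pi^{(1)} = P^2 \pi^{(0)}, \qquad \ldots$$

The $t$-step transition matrix is $P^{(t)} = P^{(t-1)}P = P^{(t-2)}P^2 = \cdots = P^t$. Find the stationary distribution of $P$ (the limit as $t \to \infty$) by solving the homogeneous linear system $(I - P)\pi = 0$.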
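One way to solve (I − P)π = 0 together with eᵀπ = 1 is to stack the normalization row onto the homogeneous system and solve by least squares. This is a sketch assuming a small, dense, column-stochastic P with a unique stationary distribution; the function name and the 3-page example graph are hypothetical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve (I - P) pi = 0 with e^T pi = 1 for a column-stochastic P."""
    n = P.shape[0]
    A = np.vstack([np.eye(n) - P, np.ones((1, n))])   # append the row e^T
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical 3-page web: 0 -> 1, 1 -> {0, 2}, 2 -> 0. Column j holds page j's out-links.
P = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
pi = stationary_distribution(P)
```

For this graph the balance equations give π = (0.4, 0.4, 0.2), and repeated multiplication Pᵗπ⁽⁰⁾ converges to the same vector because the chain is irreducible and aperiodic.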

PageRank algorithm (cont'd)

The dominant eigenvector is the PageRank vector. Random surfer:

$$M = \frac{1 - c}{N} e e^\top + cP$$

where $c$ is a damping factor that accounts for whether a surfer follows a link or jumps to a random page (empirically set to 0.85 by Page and Brin). The PR values are the entries of the dominant (i.e., first) eigenvector of the modified transition matrix $M$:

$$\pi = M\pi = \frac{1 - c}{N} e e^\top \pi + cP\pi = \frac{1 - c}{N} e + cP\pi$$

where the last step uses $e^\top \pi = 1$. The world's largest matrix computation! Solved by power iteration. See "An eigenvector based ranking approach for hypertext" [Page and Brin SIGIR 98].
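A minimal power-iteration sketch of the fixed point π = ((1 − c)/N)e + cPπ, assuming a column-stochastic P with no dangling pages (every column sums to 1). It never forms the dense matrix M, which is what makes the method practical at web scale. The function name, tolerance, and iteration cap are hypothetical choices.

```python
import numpy as np

def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for pi = ((1 - c)/N) e + c P pi, with P column-stochastic."""
    N = P.shape[0]
    pi = np.full(N, 1.0 / N)               # start from the uniform distribution
    for _ in range(max_iter):
        new = (1.0 - c) / N + c * (P @ pi)  # uses e^T pi = 1, so M never needs to be built
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```

Each step preserves eᵀπ = 1 because P is column-stochastic, and the error shrinks roughly by a factor c per iteration.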

Maximum variance unfolding

Find a low-dimensional representation that most faithfully preserves the distances and angles between nearby input data points [Weinberger and Saul CVPR 04].

1. First find the k nearest neighbors of each input data point. Denote $\eta_{ij} = 1$ if $x_i$ and $x_j$ are neighbors.
2. The constraints to preserve distances and angles between k nearest neighbors are $\|\phi_i - \phi_j\|^2 = \|x_i - x_j\|^2$ for all $(i, j)$ with $\eta_{ij} = 1$, where $x \in \mathbb{R}^m$ and $\phi \in \mathbb{R}^q$. To eliminate a translational degree of freedom, require
$$\sum_i \phi_i = 0, \quad \phi_i \in \mathbb{R}^q$$
3. Unfold the input data points by maximizing the variance of the outputs
$$\operatorname{var}(\phi) = \sum_i \|\phi_i\|^2$$

The optimization problem is formulated as a semidefinite programming problem.

Maximum variance unfolding (cont'd)

Solving

$$\max \sum_{ij} \|\phi_i - \phi_j\|^2 \quad \text{subject to} \quad \sum_i \phi_i = 0, \quad \|\phi_i - \phi_j\|^2 = \|x_i - x_j\|^2 \ \text{for all } (i, j) \text{ with } \eta_{ij} = 1$$

The above optimization problem is not convex, as it involves maximizing a quadratic function subject to quadratic equality constraints. Reformulate it as a convex problem: let $K_{ij} = \phi_i \cdot \phi_j$ denote the Gram matrix of the outputs; the semidefinite program is

$$\max_K \ \operatorname{tr}(K) \quad \text{subject to} \quad K \succeq 0, \quad \sum_i \sum_j K_{ij} = 0, \quad K_{ii} - 2K_{ij} + K_{jj} = \|x_i - x_j\|^2 \ \text{for all } (i, j) \text{ with } \eta_{ij} = 1$$
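Each piece of the semidefinite program is a rewrite of the original problem in terms of K. For centred outputs, Σ_ij K_ij = ‖Σ_i φ_i‖² = 0, K_ii − 2K_ij + K_jj = ‖φ_i − φ_j‖², and Σ_ij ‖φ_i − φ_j‖² = 2n·tr(K), so maximizing tr(K) is the same objective up to a constant. The sketch below checks these identities on random, hypothetical outputs φ; it does not solve the SDP itself, which would need a dedicated solver.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 8, 2
Phi = rng.normal(size=(n, q))
Phi -= Phi.mean(axis=0)                    # enforce sum_i phi_i = 0
K = Phi @ Phi.T                            # K_ij = phi_i . phi_j (PSD by construction)

sq = np.sum((Phi[:, None, :] - Phi[None, :, :]) ** 2, axis=-1)   # ||phi_i - phi_j||^2
dK = np.diag(K)[:, None] - 2 * K + np.diag(K)[None, :]          # K_ii - 2 K_ij + K_jj
```

Once an SDP solver returns K, the embedding is read off from its top q eigenvectors exactly as in MDS, φ_iα = √λ_α u_αi.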