



















These lecture slides cover matrix computation. Key points discussed: multidimensional scaling, spectral methods, dimensionality reduction, spectral graph theory, spectral clustering, random walks, principal component analysis, Euclidean distance, the Gram matrix, and non-zero singular values.




















Multidimensional scaling
Spectral methods for dimensionality reduction
Spectral graph theory
Spectral clustering
Random walk
Compute the low-dimensional representation $\phi \in \mathbb{R}^q$ of high-dimensional data $x \in \mathbb{R}^m$ that most faithfully preserves pairwise distances (or similarities, which are inversely proportional to distances).
Squared Euclidean distance between two points:
$$d_{ij}^2 = \|x_i - x_j\|_2^2 = (x_i - x_j)^\top (x_i - x_j)$$
The solution is obtained by minimizing
$$E_{\mathrm{MDS}} = \sum_i \sum_j (x_i \cdot x_j - \phi_i \cdot \phi_j)^2$$
and the minimum error is obtained from the spectral decomposition of the $n \times n$ Gram matrix of inner products, $G = X^\top X$, $G_{ij} = x_i \cdot x_j$.
Denoting the top $q$ eigenvectors of the Gram matrix by $\{u_\alpha\}_{\alpha=1}^{q}$ and their respective eigenvalues by $\{\lambda_\alpha\}_{\alpha=1}^{q}$, the outputs of MDS are given by
$$\phi_{i\alpha} = \sqrt{\lambda_\alpha}\, u_{\alpha i}$$
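Assume the centroid of the configuration of $n$ points is at the origin:
$$\sum_{i=1}^{n} x_{ij} = 0, \quad j = 1, \ldots, m$$
To find the Gram matrix $G$, start from
$$d_{ij}^2 = (x_i - x_j)^\top (x_i - x_j) = x_i^\top x_i + x_j^\top x_j - 2 x_i^\top x_j$$
and hence, using the centering assumption,
$$\frac{1}{n} \sum_{i=1}^{n} d_{ij}^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^\top x_i + x_j^\top x_j$$
$$\frac{1}{n} \sum_{j=1}^{n} d_{ij}^2 = x_i^\top x_i + \frac{1}{n} \sum_{j=1}^{n} x_j^\top x_j$$
$$\frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}^2 = \frac{2}{n} \sum_{i=1}^{n} x_i^\top x_i$$
Solving for the inner product gives
$$G_{ij} = x_i^\top x_j = -\frac{1}{2} \left( d_{ij}^2 - \frac{1}{n} \sum_{i=1}^{n} d_{ij}^2 - \frac{1}{n} \sum_{j=1}^{n} d_{ij}^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}^2 \right) = a_{ij} - a_{i\cdot} - a_{\cdot j} + a_{\cdot\cdot}$$
where $a_{ij} = -\frac{1}{2} d_{ij}^2$ and
$$a_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} a_{ij}, \quad a_{\cdot j} = \frac{1}{n} \sum_{i=1}^{n} a_{ij}, \quad a_{\cdot\cdot} = \frac{1}{n^2} \sum_{i} \sum_{j} a_{ij}$$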
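The double-centering formula and the eigen-decomposition step can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `classical_mds` and the toy data are assumptions, and points are stored as rows here (the slides store them as columns of $X$).

```python
import numpy as np

def classical_mds(D2, q):
    """Embed n points in R^q from an n x n matrix of squared distances D2."""
    A = -0.5 * D2                            # a_ij = -d_ij^2 / 2
    row = A.mean(axis=1, keepdims=True)      # a_i.
    col = A.mean(axis=0, keepdims=True)      # a_.j
    G = A - row - col + A.mean()             # Gram matrix by double centering
    evals, evecs = np.linalg.eigh(G)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:q]        # indices of the q largest
    lam = np.clip(evals[top], 0.0, None)     # guard against tiny negatives
    return evecs[:, top] * np.sqrt(lam)      # phi_{i,alpha} = sqrt(lam) * u

# Toy data (assumption): 6 random points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

Phi = classical_mds(D2, q=2)
D2_hat = ((Phi[:, None, :] - Phi[None, :, :]) ** 2).sum(axis=-1)
```

Because the toy points already live in two dimensions, a two-dimensional embedding reproduces the pairwise distances up to rounding; the embedding itself is only determined up to rotation and reflection.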
Compute the low-dimensional representation of a high-dimensional data set that most faithfully preserves the pairwise geodesic distances [Tenenbaum et al. Science 00].
Geodesic distances are approximated as distances measured along the submanifold from which the data points are sampled.
This method (Isomap) can be understood as a variant of MDS in which estimates of geodesic distances along the submanifold are substituted for Euclidean distances.
Main steps:
1. Construct the adjacency graph: find neighbors using K nearest neighbors or a distance threshold.
2. Estimate geodesic distances: compute pairwise shortest-path distances using dynamic programming.
3. Metric MDS: recover the embedding from the top d eigenvectors of the Gram matrix.
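Steps 1 and 2 can be sketched as follows, assuming a symmetrized K-nearest-neighbor graph and the Floyd-Warshall recursion as the dynamic program (the slides do not say which shortest-path algorithm is meant). The function name `geodesic_distances` and the half-circle data are illustrative assumptions.

```python
import numpy as np

def geodesic_distances(X, k):
    """Approximate geodesic distances with a k-NN graph and Floyd-Warshall."""
    n = X.shape[0]
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    Dg = np.full((n, n), np.inf)             # inf marks "no edge"
    np.fill_diagonal(Dg, 0.0)
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]     # skip the point itself
        Dg[i, nbrs] = E[i, nbrs]
        Dg[nbrs, i] = E[i, nbrs]             # keep the graph symmetric
    for via in range(n):                     # allow paths through node `via`
        Dg = np.minimum(Dg, Dg[:, [via]] + Dg[[via], :])
    return Dg

# Toy data (assumption): 20 points on a half circle of radius 1.
theta = np.linspace(0.0, np.pi, 20)
X = np.column_stack([np.cos(theta), np.sin(theta)])
Dg = geodesic_distances(X, k=2)
straight = np.linalg.norm(X[0] - X[-1])      # Euclidean distance, endpoints
```

On this data the graph distance between the two endpoints is close to the arc length $\pi$, while the straight-line distance is 2. Squaring `Dg` and passing it to metric MDS would complete step 3.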
Analyze graph structure and properties using linear algebra, i.e., the study of eigenvalues and eigenvectors of matrices associated with graphs.
Related to random walks.
Applications: spectral clustering, shape matching, mesh compression, PageRank, etc.
Given a graph $G = (V, E)$ and its weighted adjacency matrix $W$, we compute the diagonal degree matrix $D$
$$D_{ii} = \sum_j W_{ij}$$
The graph Laplacian is $D - W$, and the normalized graph Laplacian is
$$L = D^{-1/2} (D - W) D^{-1/2} = I - D^{-1/2} W D^{-1/2}$$
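As a small numerical illustration of these definitions, the following sketch builds $D$, the graph Laplacian, and the normalized Laplacian for a four-node graph. The adjacency matrix is a made-up example, not one from the slides.

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes (symmetric weights).
W = np.array([[0.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

deg = W.sum(axis=1)                          # D_ii = sum_j W_ij
D = np.diag(deg)
L_unnorm = D - W                             # graph Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))     # requires every degree > 0
L_norm = D_inv_sqrt @ L_unnorm @ D_inv_sqrt  # normalized graph Laplacian

eigs = np.linalg.eigvalsh(L_norm)            # real, ascending
```

For a connected graph the smallest eigenvalue of the normalized Laplacian is 0, and all eigenvalues lie in the interval [0, 2].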
Algorithm: given $n$ points in $\mathbb{R}^m$ [Belkin and Niyogi NIPS 02]
1. Construct the graph: nodes $i$ and $j$ are connected by an edge if $\|x_i - x_j\|^2 < \epsilon$, or based on K nearest neighbors.
2. Choose the weights: compute the weighted graph
$$W_{ij} = e^{-\|x_i - x_j\|^2 / t}$$
where $t$ is the kernel width (i.e., the heat kernel).
3. Compute the Laplacian eigenmap: assume $G$ is connected; otherwise apply this step to each connected component. Compute eigenvalues and eigenvectors of the generalized eigenvalue problem
$$L y = \lambda D y$$
where $D$ is the diagonal degree matrix and $L = D - W$ is the graph Laplacian matrix. Let $y_0, y_1, \ldots, y_{n-1}$ be the eigenvectors, ordered by ascending eigenvalue. The eigenvector $y_0$ is constant with eigenvalue 0 and is left out, so the image of $x_i$ under the embedding into the lower-dimensional space $\mathbb{R}^q$ is given by $(y_1(i), \ldots, y_q(i))$.
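A minimal NumPy sketch of steps 2 and 3 follows. Two simplifications are assumptions of this sketch, not part of the algorithm as stated: the graph is fully connected (no $\epsilon$ or K-nearest-neighbor sparsification), and the generalized problem $Ly = \lambda Dy$ is solved through the equivalent symmetric problem for $D^{-1/2} L D^{-1/2}$ with $y = D^{-1/2} z$. The function names are hypothetical.

```python
import numpy as np

def heat_kernel_weights(X, t):
    """W_ij = exp(-||x_i - x_j||^2 / t), with zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / t)
    np.fill_diagonal(W, 0.0)
    return W

def laplacian_eigenmap(W, q):
    """Return the q smallest nontrivial solutions of L y = lambda D y."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric form: D^{-1/2} (D - W) D^{-1/2} z = lambda z
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, Z = np.linalg.eigh(L_sym)         # ascending eigenvalues
    Y = d_inv_sqrt[:, None] * Z              # map back: y = D^{-1/2} z
    return evals[1:q + 1], Y[:, 1:q + 1]     # drop the constant y_0

# Toy data (assumption): two well-separated groups on a line.
X = np.array([[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]])
W = heat_kernel_weights(X, t=1.0)
lam, Y = laplacian_eigenmap(W, q=1)
```

On this toy data the first nontrivial eigenvector takes one sign on the first three points and the opposite sign on the last three, which is the behavior spectral clustering relies on.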
Let $\mathcal{M}$ be a Riemannian manifold (isometrically) embedded in $\mathbb{R}^m$. For a differentiable map $f : \mathcal{M} \to \mathbb{R}$,
$$|f(x') - f(x)| \sim \|\nabla f(x)\| \cdot d_{\mathcal{M}}(x, x') + o(d_{\mathcal{M}}(x, x'))$$
The geodesic distance on $\mathcal{M}$ and the ambient Euclidean distance are locally similar:
$$d_{\mathcal{M}}(x, x') = \|x - x'\| + o(\|x - x'\|)$$
Choose $f$ to preserve distances by minimizing
$$\int_{\mathcal{M}} \|\nabla f(x)\|^2 \, dx \quad \text{subject to} \quad \|f\|_{L^2(\mathcal{M})} = 1, \quad \langle f, 1 \rangle_{L^2(\mathcal{M})} = 0$$
where $dx$ is the uniform measure on $\mathcal{M}$.
Minimizing $\int_{\mathcal{M}} \|\nabla f(x)\|^2$ corresponds to minimizing
$$f^\top L f = \frac{1}{2} \sum_{ij} (f_i - f_j)^2 W_{ij}$$
on a graph, i.e., finding eigenfunctions of the Laplace-Beltrami operator $\mathcal{L}$.
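The graph quantity can be checked directly. A short derivation, assuming $W$ is symmetric, $L = D - W$ is the graph Laplacian, and $D_{ii} = \sum_j W_{ij}$:

```latex
\begin{aligned}
f^\top L f
  &= \sum_i D_{ii} f_i^2 - \sum_{i,j} W_{ij} f_i f_j \\
  &= \frac{1}{2} \Big( \sum_{i,j} W_{ij} f_i^2
       - 2 \sum_{i,j} W_{ij} f_i f_j
       + \sum_{i,j} W_{ij} f_j^2 \Big) \\
  &= \frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2 .
\end{aligned}
```

The second line uses $\sum_i D_{ii} f_i^2 = \sum_{i,j} W_{ij} f_i^2 = \sum_{i,j} W_{ij} f_j^2$, which holds because $W$ is symmetric. The quadratic form is small exactly when strongly connected nodes receive similar values of $f$.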
See Tommi Jaakkola's lecture notes on spectral clustering.
Unified view of existing algorithms [Weiss ICCV 99]:
- Feature grouping [Scott and Longuet-Higgins BMVC 90]
- Multibody factorization [Costeira and Kanade ICCV 95]
- Image segmentation [Shi and Malik CVPR 97]
- Grouping [Perona and Freeman ECCV 98]
Analysis of spectral clustering: [Ng et al. NIPS 01] [Kannan et al. JACM 04]
Image segmentation: [Shi and Malik CVPR 97] [Meila and Shi NIPS 01]
See also semi-supervised learning with spectral graph
Given an undirected weighted graph $G = (V, E, W)$, the random walk on the graph is given by the transition matrix
$$P = D^{-1} W \qquad (1)$$
where $D$ is a diagonal matrix
$$D_{ii} = \sum_j W_{ij}$$
Normalized graph Laplacian:
$$L = D^{-1/2} (D - W) D^{-1/2} = I - D^{-1/2} W D^{-1/2} \qquad (2)$$
The random walk matrix has the same eigenvalues as $I - L$:
$$D^{-1} W = D^{-1/2} (D^{-1/2} W D^{-1/2}) D^{1/2} = D^{-1/2} (I - L) D^{1/2} \qquad (3)$$
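Equation (3) says that $D^{-1}W$ and $I - L$ are similar matrices, so their spectra coincide. The following sketch checks this numerically on a made-up three-node weighted graph (the weights are an assumption for illustration).

```python
import numpy as np

# Hypothetical symmetric weights on a 3-node graph.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

deg = W.sum(axis=1)
P = W / deg[:, None]                         # P = D^{-1} W, rows sum to 1
d_inv_sqrt = 1.0 / np.sqrt(deg)
S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2} = I - L

eig_P = np.sort(np.linalg.eigvals(P).real)   # P is not symmetric in general
eig_S = np.sort(np.linalg.eigvalsh(S))       # S is symmetric
```

Working with the symmetric matrix `S` is usually preferable in practice, since symmetric eigen-solvers return real eigenvalues and orthogonal eigenvectors.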
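In matrix form, with $\pi = [\pi_1, \pi_2, \ldots, \pi_n]^\top$,
$$\pi = P\pi, \quad \pi^\top e = 1, \quad \text{where } e = [1, 1, \ldots, 1]^\top$$
Here distributions are column vectors, so $P$ is taken to be column-stochastic (the transpose of the row-stochastic $D^{-1} W$).
This can be viewed as a random walk or Markov chain:
$$\pi^{(1)} = P \pi^{(0)}$$
$$\pi^{(2)} = P \pi^{(1)} = P^2 \pi^{(0)}$$
$$\ldots$$
The transition matrix after $t$ steps converges:
$$P^{(t)} = P^{(t-1)} P = P^{(t-2)} P^2 = \ldots$$
Find the stationary distribution of $P$ as $t \to \infty$ by solving the homogeneous linear system
$$(I - P)\pi = 0$$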
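The iteration $\pi^{(t)} = P\pi^{(t-1)}$ can be run directly. The sketch below uses a made-up three-node weighted graph and the column-stochastic convention, so $P = W D^{-1}$ here. For a connected, non-bipartite undirected graph the walk converges to $\pi_i = D_{ii} / \sum_j D_{jj}$, which gives something concrete to compare against.

```python
import numpy as np

# Hypothetical symmetric weights on a 3-node graph (a triangle).
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

deg = W.sum(axis=0)
P = W / deg[None, :]                 # column-stochastic: P[i, j] = Pr(j -> i)

pi = np.full(3, 1.0 / 3.0)           # start from the uniform distribution
for _ in range(200):                 # pi^(t) = P pi^(t-1)
    pi = P @ pi

expected = deg / deg.sum()           # known limit for an undirected graph
```

The fixed number of iterations is an arbitrary choice for this small example; a convergence test on successive iterates would be the usual stopping rule.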
The dominant eigenvector is the PageRank vector.
Random surfer:
$$M = \frac{1 - c}{N} e e^\top + cP$$
where $c$ is a damping factor that accounts for whether a surfer follows a link or not (empirically set to 0.85 by Page and Brin).
The PR values are the entries of the dominant (i.e., first) eigenvector of the modified transition matrix $M$:
$$\pi = M\pi = \frac{1 - c}{N} e e^\top \pi + cP\pi = \frac{1 - c}{N} e + cP\pi$$
The world's largest matrix computation! Solved by power iteration.
See "An eigenvector based ranking approach for hypertext" [Page and Brin SIGIR 98]
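The power iteration for $\pi = \frac{1-c}{N} e + cP\pi$ can be sketched as below. The four-page link structure, the function name `pagerank`, and the stopping tolerance are assumptions made for illustration; the sketch also assumes every page has at least one outgoing link, so that $P$ is column-stochastic.

```python
import numpy as np

def pagerank(P, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for pi = (1 - c)/N * e + c * P @ pi."""
    N = P.shape[0]
    pi = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        new = (1.0 - c) / N + c * (P @ pi)   # uses e^T pi = 1
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Hypothetical web: links[j] lists the pages that page j links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
N = len(links)
P = np.zeros((N, N))
for j, outs in links.items():
    for i in outs:
        P[i, j] = 1.0 / len(outs)            # column j spreads rank evenly

pr = pagerank(P)
M = (1.0 - 0.85) / N * np.ones((N, N)) + 0.85 * P
```

The iteration never forms the dense matrix $M$; it only needs products with the sparse link matrix $P$, which is what makes the computation feasible at web scale. `M` is built here only to verify the fixed point on the toy example.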
Find a low-dimensional representation that most faithfully preserves the distances and angles between nearby input data points [Weinberger and Saul CVPR 04]
1. Find the k nearest neighbors of each input data point. Denote $\eta_{ij} = 1$ if $x_i$ and $x_j$ are neighbors.
2. The constraints to preserve distances and angles between k nearest neighbors are
$$\|\phi_i - \phi_j\|^2 = \|x_i - x_j\|^2 \quad \text{for all } \eta_{ij} = 1, \quad x \in \mathbb{R}^m, \ \phi \in \mathbb{R}^q$$
To eliminate the translational degree of freedom,
$$\sum_i \phi_i = 0, \quad \phi_i \in \mathbb{R}^q$$
3. Unfold the input data points by maximizing the variance of the outputs
$$\mathrm{var}(\phi) = \sum_i \|\phi_i\|^2$$
The optimization problem is formulated as a semidefinite programming problem.
The problem to solve is
$$\max \sum_{ij} \|\phi_i - \phi_j\|^2 \quad \text{subject to} \quad \sum_i \phi_i = 0, \quad \|\phi_i - \phi_j\|^2 = D_{ij} \ \text{for all } (i, j) \text{ with } \eta_{ij} = 1$$
The above optimization problem is not convex, as it involves maximizing a quadratic function with quadratic equality constraints.
Reformulate the problem as a convex one. Letting $K_{ij} = \phi_i \cdot \phi_j$ denote the Gram matrix of the outputs, the semidefinite program is
$$\max \ \mathrm{tr}(K) \quad \text{subject to} \quad K \succeq 0, \quad \sum_{ij} K_{ij} = 0, \quad K_{ii} - 2K_{ij} + K_{jj} = \|x_i - x_j\|^2 \ \text{for all } (i, j) \text{ with } \eta_{ij} = 1$$
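The correspondence between the two formulations follows from writing every quantity in terms of $K_{ij} = \phi_i \cdot \phi_j$. A short worked derivation, with $n$ denoting the number of data points:

```latex
\begin{aligned}
\|\phi_i - \phi_j\|^2
  &= \phi_i \cdot \phi_i - 2\, \phi_i \cdot \phi_j + \phi_j \cdot \phi_j
   = K_{ii} - 2K_{ij} + K_{jj}, \\
\Big\| \sum_i \phi_i \Big\|^2
  &= \sum_{i,j} \phi_i \cdot \phi_j = \sum_{i,j} K_{ij},
  \quad\text{so}\quad
  \sum_i \phi_i = 0 \iff \sum_{i,j} K_{ij} = 0, \\
\sum_{i,j} \|\phi_i - \phi_j\|^2
  &= \sum_{i,j} \big( K_{ii} - 2K_{ij} + K_{jj} \big)
   = 2n\, \mathrm{tr}(K) - 2 \sum_{i,j} K_{ij}
   = 2n\, \mathrm{tr}(K).
\end{aligned}
```

The objective and the constraints are therefore linear in $K$, and the only remaining requirement, that $K$ be a valid Gram matrix, is the convex constraint $K \succeq 0$. The outputs $\phi_i$ can then be recovered from the top eigenvectors of the optimal $K$, as in MDS.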