Multi-dimensional Scaling (MDS) Technique: Visualizing High-dimensional Data - Prof. S. Zh, Study notes of Statistics

Multi-dimensional scaling (MDS) is a technique used to project high-dimensional data into lower dimensions while preserving the spatial distances between data points. This method is useful for understanding the structures and properties of data, as well as for verifying distance measures on unknown datasets. MDS can be applied to various domains, such as image databases, art authentication, and color mapping. These notes give examples and explanations of MDS, including its objective, applications, and algorithms.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

koofers-user-2ks

Lecture 7:

Multi-dimensional scaling (MDS)

MDS is a technique motivated by two problems in understanding data in high-dimensional spaces. Its objective is to project an ensemble of data points into 1-, 2-, or 3-dimensional spaces so that the spatial distances between these data points are preserved.

Thus, MDS is used for two purposes:

1). Visualize the structures and properties of data, so that we may select proper models for them.

2). Verify some distance (metric) measure on an unknown dataset. With a good distance measure, the data clusters should correspond to meaningful sets, e.g., in image database retrieval or art authentication.

Lecture note for Stat 231: Pattern Recognition and Machine Learning

Example I: distance visualization

Reconstructed 2D Map

One computes the (x,y) coordinates for the 10 cities that best preserve the distance matrix.

Example II: color mapping

Another example is to map various colors in a 2D matrix so that some perceptual distances are preserved. I am sorry that we cannot print out color, but the pdf file will be in color. One can calculate a perceptual color distance by psychology experiments, then obtain a distance matrix, like the city matrix, and then map the colors in 2D.

Example V: Art Authentication

S. Lyu, D. Rockmore, and H. Farid, PNAS, 2004

Basic idea of MDS

Given: a set of data points in d-space, $\{x_1, x_2, \dots, x_n\}$, and a dissimilarity / distance measure (metric) between two points $x_i$, $x_j$: $\delta_{ij}$.

Objective: find points in 1-, 2-, or 3-space, $\{y_1, y_2, \dots, y_n\}$, usually with Euclidean distances $d_{ij}$ between two points $y_i$ and $y_j$.

A criterion (Kruskal 1964) is to minimize

$$\text{Stress} = \frac{\sum_{i,j} (d_{ij} - \delta_{ij})^2}{\sum_{i,j} \delta_{ij}^2}$$
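As a rough illustration of minimizing such a criterion, the sketch below runs plain gradient descent on a stress of the form: sum over pairs of (d_ij - δ_ij)^2, divided by the sum over pairs of δ_ij^2. The five 3-D points, the random starting layout, the step size, the iteration count, and the names `stress` and `descend` are all assumptions made for this example and are not taken from the lecture. Gradient descent is only one possible optimizer.

```python
import math
import random


def stress(y, delta):
    """Sum of squared distance mismatches, normalized by the sum of delta^2."""
    n = len(y)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += (math.dist(y[i], y[j]) - delta[i][j]) ** 2
            den += delta[i][j] ** 2
    return num / den


def descend(y, delta, steps=500, lr=0.1):
    """Plain gradient descent on stress(y, delta); returns a new layout."""
    n, dim = len(y), len(y[0])
    norm = sum(delta[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
    y = [list(p) for p in y]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for k in range(n):
            for j in range(n):
                d = math.dist(y[k], y[j])
                if j == k or d < 1e-12:
                    continue  # skip self-pairs and coincident points
                # d(stress)/d(y_k) contribution from the pair (k, j)
                coef = 2.0 * (d - delta[k][j]) / (d * norm)
                for a in range(dim):
                    grad[k][a] += coef * (y[k][a] - y[j][a])
        # update all points at once, after the full gradient is known
        y = [[y[k][a] - lr * grad[k][a] for a in range(dim)] for k in range(n)]
    return y


# Hypothetical data: five points in 3-space, for illustration only.
x = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0.5, 0.5, 1)]
delta = [[math.dist(p, q) for q in x] for p in x]

rng = random.Random(0)
y0 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in x]  # random 2D start
y1 = descend(y0, delta)
print("stress before:", stress(y0, delta))
print("stress after: ", stress(y1, delta))
```

Any layout found this way is determined only up to rotation, reflection, and translation, since these operations leave all pairwise distances unchanged.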

Senator map by MDS

MDS for non-metric data

In some applications, the quantitative distance or dissimilarity is less important than the rank order. Thus an MDS mapping criterion will be a monotonic constraint that the projected points preserve the rank order of the original data points.
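The monotonic constraint can be checked directly by comparing every pair of pairwise distances before and after projection. The following is a minimal sketch under stated assumptions: the function name `rank_order_violations`, the four 3-D points, and the two candidate layouts are hypothetical, and ties are simply not counted as violations.

```python
import math
from itertools import combinations


def rank_order_violations(x, y):
    """Count pairs of point-pairs whose distance order is reversed in y.

    x holds the original points and y the projected points, in the same
    order. Ties in either space are not counted as violations.
    """
    pairs = list(combinations(range(len(x)), 2))
    delta = [math.dist(x[i], x[j]) for i, j in pairs]  # original distances
    d = [math.dist(y[i], y[j]) for i, j in pairs]      # projected distances
    violations = 0
    for a, b in combinations(range(len(pairs)), 2):
        if (delta[a] - delta[b]) * (d[a] - d[b]) < 0:
            violations += 1
    return violations


# Hypothetical data, for illustration only.
x = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 4)]
scaled = [tuple(2 * c for c in p) for p in x]  # uniform scaling keeps the order
flattened = [p[:2] for p in x]                 # dropping z collapses one point

print(rank_order_violations(x, scaled))
print(rank_order_violations(x, flattened))
```

A count of zero means the layout satisfies the monotonic constraint even if the distances themselves differ, which is exactly the situation non-metric MDS allows.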

Suppose we re-order the m = n(n-1)/2 distances in the original data

For any m numbers that preserve the monotonic constraints,

We define a criterion for the projected points as,

Kolmogorov Capacity Dimension

Let $N(\epsilon)$ be the size of the minimum $\epsilon$-cover of the dataset $D$. We define the Kolmogorov capacity dimension (or box-counting dimension) by

In other words, the number (volume) has an exponential rate

Or we have a linear relation in a log-log plot
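A minimal sketch of the log-log estimate, assuming the standard box-counting form in which the dimension is the slope of log N(ε) against log(1/ε). Counting occupied cells of a regular grid is used here as a simple stand-in for the minimum ε-cover. The function names, the two test sets, and the chosen scales are hypothetical.

```python
import math


def box_count(points, eps):
    """Number of grid boxes of side eps that contain at least one point."""
    return len({tuple(math.floor(c / eps) for c in p) for p in points})


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def capacity_dimension(points, eps_values):
    """Slope of log N(eps) versus log(1/eps) over the given scales."""
    xs = [math.log(1.0 / e) for e in eps_values]
    ys = [math.log(box_count(points, e)) for e in eps_values]
    return fit_slope(xs, ys)


# Hypothetical test sets embedded in the plane, for illustration only.
line = [(i / 4096, i / 4096) for i in range(4096)]                  # a 1-D curve
square = [(i / 64, j / 64) for i in range(64) for j in range(64)]   # a 2-D patch
scales = [2.0 ** -k for k in range(1, 5)]

print(capacity_dimension(line, scales))    # close to 1
print(capacity_dimension(square, scales))  # close to 2
```

On real data the slope should be read only over the range of scales where the log-log plot is roughly straight, since very small ε eventually isolates individual points.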

Lecture note for Stat 161: Introduction to Pattern Recognition and Machine Learning

Information Dimension

The capacity dimension assumes a uniform probability for each ball. If this is not uniform, we have a modified version called the information dimension,

Where

It is easy to check that

Theorem:

Correlation dimension

Given N data points,

The correlation dimension is

Intuitively, the higher the dimension of the manifold, the more neighbors a point will have.
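The neighbor-counting intuition can be sketched numerically. The example below assumes the usual correlation-sum form: C(ε) is the fraction of point pairs closer than ε, and the dimension is estimated as the slope of log C(ε) against log ε. The function names, the evenly spaced test points, and the chosen radii are hypothetical.

```python
import math
from itertools import combinations


def correlation_sum(points, eps):
    """Fraction of distinct point pairs whose distance is below eps."""
    n = len(points)
    close = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) < eps)
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, eps_values):
    """Least-squares slope of log C(eps) versus log eps."""
    xs = [math.log(e) for e in eps_values]
    ys = [math.log(correlation_sum(points, e)) for e in eps_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


# Hypothetical data: 256 evenly spaced points on a segment (a 1-D set).
line = [(i / 256,) for i in range(256)]
radii = [1 / 32, 1 / 16, 1 / 8]

print(correlation_dimension(line, radii))  # roughly 1 for a 1-D set
```

The estimate is slightly biased by points near the ends of the segment, which have fewer neighbors than interior points, so it is only approximately equal to 1 here.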