




Multi-dimensional scaling (MDS) is a technique for projecting high-dimensional data into lower dimensions while preserving the spatial distances between data points. It is useful for understanding the structure and properties of data, and for verifying distance measures on unknown datasets. MDS can be applied in domains such as image database retrieval, art authentication, and color mapping. These notes give examples and explanations of MDS, including its objective, applications, and algorithms.





MDS is a technique motivated by two problems in understanding data in high-dimensional spaces. Its objective is to project an ensemble of data points into 1-, 2-, or 3-dimensional spaces so that the spatial distances between these data points are preserved.
Thus, MDS is used for two purposes:
1). Visualize the structures and properties of data, so that we may select proper models for them.
Lecture note for Stat 231: Pattern Recognition and Machine Learning
2). Verify a distance (metric) measure on an unknown dataset. With a good distance measure, the data clusters should correspond to meaningful sets, e.g., in image database retrieval or art authentication.
One computes the (x,y) coordinates for the 10 cities that best preserve the distance matrix.
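As a rough illustration of recovering planar coordinates from a distance matrix, the sketch below uses classical (Torgerson) MDS, which double-centers the squared distances and takes the top eigenvectors. This is a swapped-in closed-form technique, not necessarily the method used for the lecture's city example, and the 10 "cities" here are hypothetical random locations rather than the lecture's data.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=2):
    """Embed n points in k dimensions from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)    # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)

# Hypothetical "cities": 10 random planar locations (not the lecture's data).
rng = np.random.default_rng(0)
cities = rng.uniform(0.0, 100.0, size=(10, 2))
D = pairwise_distances(cities)

Y = classical_mds(D, k=2)
max_err = np.abs(pairwise_distances(Y) - D).max()
print("largest distance error:", max_err)
```

The recovered coordinates match the original layout only up to rotation, reflection, and translation, since those operations leave all pairwise distances unchanged.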
Another example is to map various colors onto a 2D plane so that perceptual distances are preserved. I am sorry that we cannot print out color, but the pdf file will be in color. One can measure a perceptual color distance by psychology experiments, obtain a distance matrix like the city matrix, and then map the colors in 2D.
S. Lyu, D. Rockmore, and H. Farid, PNAS, 2004
Given: a set of data points in d-space $\{x_1, x_2, \dots, x_n\}$ and a dissimilarity (distance) measure between two points $x_i, x_j$: $\delta_{ij}$.
Objective: find points in 1-, 2-, or 3-space $\{y_1, y_2, \dots, y_n\}$, usually with Euclidean distances $d_{ij}$ between two points $y_i$ and $y_j$, such that each $d_{ij}$ is as close as possible to $\delta_{ij}$.
A criterion (Kruskal 1964) is to minimize
$$ J(y_1, \dots, y_n) = \frac{\sum_{i<j} (d_{ij} - \delta_{ij})^2}{\sum_{i<j} \delta_{ij}^2} $$
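A minimal sketch of minimizing such a criterion by gradient descent follows. It assumes the normalized squared-error form, the sum over pairs of (d_ij - δ_ij)^2 divided by the sum of δ_ij^2, which is one common way to write the stress. The function names, the random 5-dimensional test data, and the simple accept/reject step-size rule are illustrative choices, not taken from the lecture.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, delta):
    """Normalized squared error between embedded and target distances."""
    iu = np.triu_indices(len(Y), k=1)
    d = pairwise_distances(Y)
    return ((d[iu] - delta[iu]) ** 2).sum() / (delta[iu] ** 2).sum()

def stress_gradient(Y, delta):
    """Gradient of stress with respect to every embedded point y_k."""
    iu = np.triu_indices(len(Y), k=1)
    d = pairwise_distances(Y)
    np.fill_diagonal(d, 1.0)            # avoid 0/0 on the diagonal
    ratio = (d - delta) / d             # assumes no two embedded points coincide
    np.fill_diagonal(ratio, 0.0)        # a point exerts no force on itself
    norm = (delta[iu] ** 2).sum()
    return (2.0 / norm) * (ratio.sum(axis=1)[:, None] * Y - ratio @ Y)

def minimize_stress(delta, k=2, n_iter=500, step=1.0, seed=0):
    """Gradient descent that only accepts steps which lower the stress."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(delta.shape[0], k))
    J = stress(Y, delta)
    for _ in range(n_iter):
        Y_new = Y - step * stress_gradient(Y, delta)
        J_new = stress(Y_new, delta)
        if J_new < J:
            Y, J = Y_new, J_new
            step *= 1.2                 # cautiously grow the step
        else:
            step *= 0.5                 # reject and shrink the step
    return Y, J

# Hypothetical data: 8 points in 5-space, embedded into the plane.
X = np.random.default_rng(1).normal(size=(8, 5))
delta = pairwise_distances(X)
Y0 = np.random.default_rng(0).normal(size=(8, 2))   # same start as seed=0 below
J_init = stress(Y0, delta)
Y, J_final = minimize_stress(delta, k=2, seed=0)
print("stress before:", J_init, "after:", J_final)
```

Gradient descent can stall in a local minimum, so in practice one might restart from several random initializations and keep the best result.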
In some applications, the quantitative distance or dissimilarity is less important than the rank order. An MDS mapping criterion can then be a monotonic constraint: the projected points should preserve the rank order of the distances between the original data points.
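Suppose we re-order the $m = n(n-1)/2$ distances in the original data as
$$ \delta_{i_1 j_1} \le \delta_{i_2 j_2} \le \dots \le \delta_{i_m j_m} $$
For any $m$ numbers $\hat{d}_{ij}$ that preserve the monotonic constraints,
$$ \hat{d}_{i_1 j_1} \le \hat{d}_{i_2 j_2} \le \dots \le \hat{d}_{i_m j_m} $$
we define a criterion for the projected points as
$$ J_{mon} = \min_{\hat{d}} \frac{\sum_{i<j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i<j} d_{ij}^2} $$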
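One way to evaluate a rank-order criterion of this kind is sketched below. It assumes the criterion is the smallest normalized squared error between the embedded distances and any sequence of numbers that is non-decreasing in the rank order of the original dissimilarities; that inner minimization is an isotonic regression, solved here with the pool-adjacent-violators algorithm. The function names and test data are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pava(y):
    """Least-squares non-decreasing fit to the sequence y."""
    blocks = []                                   # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def monotone_stress(Y, delta):
    """Rank-order stress of embedding Y against dissimilarities delta."""
    iu = np.triu_indices(len(Y), k=1)
    order = np.argsort(delta[iu], kind="stable")  # rank order of the originals
    d = pairwise_distances(Y)[iu][order]          # embedded distances, same order
    d_hat = pava(d)                               # closest monotone sequence
    return ((d - d_hat) ** 2).sum() / (d ** 2).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
delta = pairwise_distances(X) ** 2   # monotone transform of the true distances

J_same_order = monotone_stress(X, delta)                     # ranks agree
J_random = monotone_stress(rng.normal(size=(6, 2)), delta)   # unrelated layout
print(J_same_order, J_random)
```

Because only the ordering of the dissimilarities enters, squaring them (or applying any other increasing transform) leaves the criterion unchanged, which is the point of the rank-order formulation.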
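Let $N(\epsilon)$ be the size of a minimum $\epsilon$-cover of the dataset $D$, i.e., the smallest number of balls of radius $\epsilon$ needed to cover $D$. We define the Kolmogorov capacity dimension (or box-counting dimension) by
$$ d_c = -\lim_{\epsilon \to 0} \frac{\log N(\epsilon)}{\log \epsilon} $$
In other words, the number of balls grows as a power of $1/\epsilon$ with exponent $d_c$,
$$ N(\epsilon) \propto \epsilon^{-d_c} $$
or, equivalently, we have a linear relation in a log-log plot,
$$ \log N(\epsilon) \approx -d_c \log \epsilon + \text{const} $$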
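The log-log relation suggests a simple numerical estimate: count boxes at several scales and fit a line. The sketch below approximates the minimum cover by the number of occupied cells of an axis-aligned grid with side ε, which is an assumption made for convenience, and takes the slope of log N(ε) against log(1/ε). The test data, a flat square sheet embedded in 3-space, is hypothetical.

```python
import numpy as np

def box_count(X, eps):
    """Number of grid cells of side eps that contain at least one point."""
    cells = np.floor(X / eps).astype(np.int64)
    return len(np.unique(cells, axis=0))

def capacity_dimension(X, eps_values):
    """Slope of log N(eps) versus log(1/eps), fitted by least squares."""
    eps_values = np.asarray(eps_values, dtype=float)
    counts = np.array([box_count(X, e) for e in eps_values])
    slope, _ = np.polyfit(np.log(1.0 / eps_values), np.log(counts), 1)
    return slope

# Hypothetical data: a 2D unit square lying in 3-space (z = 0).
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.random((20000, 2)), np.zeros(20000)])

dim_c = capacity_dimension(sheet, [1 / 4, 1 / 8, 1 / 16, 1 / 32])
print("estimated capacity dimension:", dim_c)
```

The estimate reflects the intrinsic dimension of the sheet rather than the 3 ambient coordinates. With finite data the scales must stay coarse enough that every box intersecting the set actually contains a sample.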
Lecture note for Stat 161: Introduction to Pattern Recognition and Machine Learning
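The capacity dimension assumes a uniform probability for each ball. If the probability is not uniform, we have a modified version called the information dimension,
$$ d_I = -\lim_{\epsilon \to 0} \frac{H(\epsilon)}{\log \epsilon} $$
where $H(\epsilon) = -\sum_{i=1}^{N(\epsilon)} p_i \log p_i$ and $p_i$ is the probability that a data point falls in the $i$-th ball. It is easy to check that $d_I = d_c$ when $p_i = 1/N(\epsilon)$ for all $i$.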
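A hedged numerical sketch of the information dimension: it assumes the usual entropy-based definition, in which log N(ε) is replaced by the entropy H(ε) = -Σ p_i log p_i of the box occupancy probabilities, and it estimates p_i by the fraction of samples in each grid cell. The grid cells stand in for the covering balls, and the test data is hypothetical.

```python
import numpy as np

def box_entropy(X, eps):
    """Entropy (in nats) of the empirical distribution over grid cells of side eps."""
    cells = np.floor(X / eps).astype(np.int64)
    _, counts = np.unique(cells, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def information_dimension(X, eps_values):
    """Slope of H(eps) versus log(1/eps), fitted by least squares."""
    eps_values = np.asarray(eps_values, dtype=float)
    H = np.array([box_entropy(X, e) for e in eps_values])
    slope, _ = np.polyfit(np.log(1.0 / eps_values), H, 1)
    return slope

# Hypothetical data: uniform samples on a 2D unit square lying in 3-space.
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.random((20000, 2)), np.zeros(20000)])

dim_i = information_dimension(sheet, [1 / 4, 1 / 8, 1 / 16, 1 / 32])
print("estimated information dimension:", dim_i)
```

For uniformly spread samples the occupancy probabilities are nearly equal, so H(ε) is close to log N(ε) and the estimate is close to the capacity dimension; for strongly non-uniform data the two estimates can differ.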
Theorem:
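Given N data points $x_1, \dots, x_N$, define the correlation integral as the fraction of pairs that are closer than $\epsilon$,
$$ C(\epsilon) = \lim_{N \to \infty} \frac{2}{N(N-1)} \sum_{i<j} \mathbf{1}(\|x_i - x_j\| < \epsilon) $$
The correlation dimension is
$$ d_{corr} = \lim_{\epsilon \to 0} \frac{\log C(\epsilon)}{\log \epsilon} $$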
Intuitively, the higher the dimension of the manifold, the faster the number of neighbors of a point grows as the neighborhood radius increases.
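The neighbor-counting intuition can be checked numerically. The sketch below assumes the usual Grassberger-Procaccia form of the correlation dimension: C(ε) is the fraction of point pairs closer than ε, and the dimension is the slope of log C(ε) against log ε at small ε. The function names, the radii, and the test data are hypothetical.

```python
import numpy as np

def correlation_integral(X, eps):
    """Fraction of distinct point pairs whose distance is below eps."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(len(X), k=1)
    return (dist[iu] < eps).mean()

def correlation_dimension(X, eps_values):
    """Slope of log C(eps) versus log eps, fitted by least squares."""
    eps_values = np.asarray(eps_values, dtype=float)
    C = np.array([correlation_integral(X, e) for e in eps_values])
    slope, _ = np.polyfit(np.log(eps_values), np.log(C), 1)
    return slope

# Hypothetical data: uniform samples on a 2D unit square lying in 3-space.
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.random((1000, 2)), np.zeros(1000)])

dim_corr = correlation_dimension(sheet, [0.02, 0.04, 0.06, 0.08, 0.10])
print("estimated correlation dimension:", dim_corr)
```

The estimate tends to come out slightly below 2 here because points near the boundary of the square have fewer neighbors than interior points; smaller radii reduce this bias but need more samples.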