

Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Artificial Intelligence. Lectures Notes of Machine Learning. Prof. Andrew Ng - Stanford University - Contents: The k-means clustering algorithm
Typology: Study notes
1 / 3
This page cannot be seen from the preview
Don't miss anything!


In the clustering problem, we are given a training set {x(1),... , x(m)}, and want to group the data into a few cohesive “clusters.” Here, x(i)^ ∈ Rn as usual; but no labels y(i)^ are given. So, this is an unsupervised learning problem. The k-means clustering algorithm is as follows:
For every i, set c(i)^ := arg min j ||x(i)^ − μj ||^2.
For each j, set
μj :=
∑m i=1 1 {c
(i) (^) = j}x(i) ∑m i=1 1 {c (i) (^) = j}.
}
In the algorithm above, k (a parameter of the algorithm) is the number of clusters we want to find; and the cluster centroids μj represent our current guesses for the positions of the centers of the clusters. To initialize the cluster centroids (in step 1 of the algorithm above), we could choose k training examples randomly, and set the cluster centroids to be equal to the values of these k examples. (Other initialization methods are also possible.) The inner-loop of the algorithm repeatedly carries out two steps: (i) “Assigning” each training example x(i)^ to the closest cluster centroid μj , and (ii) Moving each cluster centroid μj to the mean of the points assigned to it. Figure 1 shows an illustration of running k-means.
(a) (b) (c)
(d) (e) (f)
Figure 1: K-means algorithm. Training examples are shown as dots, and cluster centroids are shown as crosses. (a) Original dataset. (b) Random ini- tial cluster centroids (in this instance, not chosen to be equal to two training examples). (c-f) Illustration of running two iterations of k-means. In each iteration, we assign each training example to the closest cluster centroid (shown by “painting” the training examples the same color as the cluster centroid to which is assigned); then we move each cluster centroid to the mean of the points assigned to it. (Best viewed in color.) Images courtesy Michael Jordan.
Is the k-means algorithm guaranteed to converge? Yes it is, in a certain sense. In particular, let us define the distortion function to be:
J(c, μ) =
∑^ m
i=
||x(i)^ − μc(i) ||^2
Thus, J measures the sum of squared distances between each training exam- ple x(i)^ and the cluster centroid μc(i) to which it has been assigned. It can be shown that k-means is exactly coordinate descent on J. Specifically, the inner-loop of k-means repeatedly minimizes J with respect to c while holding μ fixed, and then minimizes J with respect to μ while holding c fixed. Thus, J must monotonically decrease, and the value of J must converge. (Usu- ally, this implies that c and μ will converge too. In theory, it is possible for