K-means Clustering Methods, Exercises of Data Mining

Use the k-means algorithm: calculate the distance from each data point to every centroid, assign the point to the cluster whose centroid is nearest, and then calculate the mean value for that cluster center.

Typology: Exercises

2024/2025

Uploaded on 03/07/2025

amy-aung

1. (a) Use the k-means algorithm to cluster the following 8 samples into 3 clusters:
A1(2,10), A2(2,5), A3(8,5), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9).
For each data point, calculate its distance to every centroid, assign the point to the
cluster whose centroid is nearest, and then calculate the mean value for that cluster
center. Suppose that the initial centroids (the centers of each cluster) are A1(2,10),
B1(5,8), C1(1,2). Run the k-means algorithm and show the result for each iteration.
At the end of each iteration, show:
1. The new clusters and the updated centroids of the new clusters;
2. How many more iterations are needed to converge.
Using K-means Algorithm:
Data set: A1(2,10), A2(2,5), A3(8,5), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9).
Centroids: A1(2,10), B1(5,8), C1(1,2).
Iteration: 1
For each data point, calculate the distance to every centroid, assign the point to the cluster whose
centroid is nearest, and then calculate the mean value for that cluster center.
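Given two points (x1,y1) and (x2,y2):
Euclidean distance formula: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
Manhattan distance formula: $d = |x_2 - x_1| + |y_2 - y_1|$ (the calculations below use this formula)
Mean formula: $\left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$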
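The formulas above can be sketched in Python as follows. The function names `euclidean`, `manhattan`, and `midpoint` are illustrative choices, not names from the exercise, and points are assumed to be (x, y) tuples.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def midpoint(p, q):
    """Coordinate-wise mean of two points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


print(manhattan((2, 5), (1, 2)))   # 4
print(euclidean((2, 5), (2, 10)))  # 5.0
print(midpoint((1, 2), (2, 5)))    # (1.5, 3.5)
```

The two distance functions can rank the centroids differently, so it matters which one is used throughout a calculation.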
1ST ROW:
Calculate the distance between data point A2 and the centroids A1, B1, C1.
Distance between A2(2,5) & A1(2,10) = |2-2| + |5-10| = 0+5 = 5
Distance between A2(2,5) & B1(5,8) = |2-5| + |5-8| = 3+3 = 6
Distance between A2(2,5) & C1(1,2) = |2-1| + |5-2| = 1+3 = 4
The nearest cluster center to A2 is C1.
Then calculate the mean of C1 and A2.
C1 mean value = ((1+2)/2, (2+5)/2) = (1.5, 3.5)
Then update the centroid C1 to (1.5, 3.5).
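As a rough check of this first step, the sketch below assigns A2 to its nearest centroid and then moves that centroid to the mean of its old position and A2. The dictionary layout and variable names are assumptions made for illustration.

```python
def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


# Initial centroids from the exercise.
centroids = {"A1": (2, 10), "B1": (5, 8), "C1": (1, 2)}
point = (2, 5)  # data point A2

# Distance from A2 to every centroid, then pick the smallest.
distances = {name: manhattan(point, c) for name, c in centroids.items()}
nearest = min(distances, key=distances.get)

# Move the chosen centroid to the mean of its old position and A2.
cx, cy = centroids[nearest]
centroids[nearest] = ((cx + point[0]) / 2, (cy + point[1]) / 2)

print(distances)                    # {'A1': 5, 'B1': 6, 'C1': 4}
print(nearest, centroids[nearest])  # C1 (1.5, 3.5)
```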
2ND ROW:
Calculate the distance between data point A3 and the centroids A1, B1, C1.
Distance between A3(8,5) & A1(2,10) = |8-2| + |5-10| = 6+5 = 11
Distance between A3(8,5) & B1(5,8) = |8-5| + |5-8| = 3+3 = 6
Distance between A3(8,5) & C1(1.5,3.5) = |8-1.5| + |5-3.5| = 6.5+1.5 = 8
The nearest cluster center to A3 is B1.
Then calculate the mean of A3 and B1.
B1 mean value = ((5+8)/2, (8+5)/2) = (6.5, 6.5)
Then update the centroid B1 to (6.5, 6.5).
3RD ROW:
Calculate the distance between data point B2 and the centroids A1, B1, C1.
Distance between B2(7,5) & A1(2,10) = |7-2| + |5-10| = 5+5 = 10
Distance between B2(7,5) & B1(6.5,6.5) = |7-6.5| + |5-6.5| = 0.5+1.5 = 2
Distance between B2(7,5) & C1(1.5,3.5) = |7-1.5| + |5-3.5| = 5.5+1.5 = 7
The nearest cluster center to B2 is B1.
Then calculate the mean of B2 and B1.
B1 mean value = ((6.5+7)/2, (6.5+5)/2) = (6.75, 5.75), truncated to (6.7, 5.7)
Then update the centroid B1 to (6.7, 5.7).
4TH ROW:
Calculate the distance between data point B3 and the centroids A1, B1, C1.
Distance between B3(6,4) & A1(2,10) = |6-2| + |4-10| = 4+6 = 10
Distance between B3(6,4) & B1(6.7,5.7) = |6-6.7| + |4-5.7| = 0.7+1.7 = 2.4
Distance between B3(6,4) & C1(1.5,3.5) = |6-1.5| + |4-3.5| = 4.5+0.5 = 5
The nearest cluster center to B3 is B1.
Then calculate the mean of B3 and B1.
B1 mean value = ((6.7+6)/2, (5.7+4)/2) = (6.35, 4.85), truncated to (6.3, 4.8)
Then update the centroid B1 to (6.3, 4.8).
5TH ROW:
Calculate the distance between data point C2 and the centroids A1, B1, C1.
Distance between C2(4,9) & A1(2,10) = |4-2| + |9-10| = 2+1 = 3
Distance between C2(4,9) & B1(6.3,4.8) = |4-6.3| + |9-4.8| = 2.3+4.2 = 6.5
Distance between C2(4,9) & C1(1.5,3.5) = |4-1.5| + |9-3.5| = 2.5+5.5 = 8
The nearest cluster center to C2 is A1.
Then calculate the mean of C2 and A1.
A1 mean value = ((2+4)/2, (10+9)/2) = (3, 9.5)
Then update the centroid A1 to (3, 9.5).
The updated cluster centers are: A1(3, 9.5), B1(6.3, 4.8), C1(1.5, 3.5)
The calculations above are summarized in the table below:
Data point   Dist. to A1   Dist. to B1   Dist. to C1   Nearest   Updated centroid
A2(2,5)      5             6             4             C1        C1(1.5, 3.5)
A3(8,5)      11            6             8             B1        B1(6.5, 6.5)
B2(7,5)      10            2             7             B1        B1(6.7, 5.7)
B3(6,4)      10            2.4           5             B1        B1(6.3, 4.8)
C2(4,9)      3             6.5           8             A1        A1(3, 9.5)
Now proceed to the next iteration with the updated cluster centers.
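The first iteration worked through above can be sketched as a single pass in Python. This follows the procedure used in this solution, where the chosen centroid moves to the mean of its current position and the new point after every assignment. Standard batch k-means instead assigns all points first and then averages every member of each cluster, so its centroids can differ. The function name `run_iteration` and the dictionary layout are assumptions made for illustration. The code keeps full precision, so it reports B1 as (6.375, 4.875) where the worked solution, which truncates intermediate values to one decimal, reports (6.3, 4.8).

```python
def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


def run_iteration(points, centroids):
    """Assign each point to its nearest centroid, updating that centroid
    to the mean of its current position and the point as we go."""
    centroids = dict(centroids)  # work on a copy
    assignments = {}
    for name, p in points.items():
        nearest = min(centroids, key=lambda c: manhattan(p, centroids[c]))
        assignments[name] = nearest
        cx, cy = centroids[nearest]
        centroids[nearest] = ((cx + p[0]) / 2, (cy + p[1]) / 2)
    return assignments, centroids


# The five non-centroid points, in the order used by the worked rows.
points = {"A2": (2, 5), "A3": (8, 5), "B2": (7, 5), "B3": (6, 4), "C2": (4, 9)}
initial = {"A1": (2, 10), "B1": (5, 8), "C1": (1, 2)}

assignments, updated = run_iteration(points, initial)
print(assignments)
# {'A2': 'C1', 'A3': 'B1', 'B2': 'B1', 'B3': 'B1', 'C2': 'A1'}
print(updated)
# {'A1': (3.0, 9.5), 'B1': (6.375, 4.875), 'C1': (1.5, 3.5)}
```

Because each update depends on the centroid's position at that moment, the result of this point-by-point procedure depends on the order in which the points are processed.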