



K-Means clustering is a popular unsupervised machine learning algorithm that partitions a dataset into K clusters, where K is a user-defined parameter, so that each data point belongs to the cluster whose centroid (mean) is nearest to it. It is commonly used for data analysis, image segmentation, and recommendation systems. The algorithm works by repeating two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it.
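The assign-and-update loop can be sketched in plain NumPy. This is a minimal illustration, not the code used in the report (which relies on scikit-learn's `KMeans`); the function name `kmeans_sketch`, its parameters, and the toy points are all assumptions made up for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-Means (hypothetical helper): returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Two well-separated toy groups (made-up values for illustration only).
toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(toy, k=2)
print(labels)
```

Random initialisation can land in a poor local optimum on harder data, which is one reason library implementations usually restart from several initialisations.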
Typology: Lab Reports




In [37]:
# k means clustering
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_excel("C:/Users/kriti/OneDrive/Desktop/machine Learning/experiments/C data

In [15]:
Out[15]:
    Variable_1  Variable_2
0           12          30
1           20          36
2           28          30
3           18          52
4           29          54
5           33          46
6           24          55
7           45          59
8           45          63
9           52          70
10          51          66
11          52          63
12          55          58
13          53          23
14          55          14
15          61           8
16          64          19
17          69           7
18          72          24
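Because the Excel path above is cut off in this preview, one way to follow along is to rebuild the same 19 rows by hand. The sketch below copies the values from the Out[15] table; constructing the DataFrame inline is an assumption made for reproducibility, not how the report loads its data.

```python
import pandas as pd

# Values copied from the Out[15] table; this stands in for the Excel read
# purely for illustration.
data = pd.DataFrame({
    "Variable_1": [12, 20, 28, 18, 29, 33, 24, 45, 45, 52,
                   51, 52, 55, 53, 55, 61, 64, 69, 72],
    "Variable_2": [30, 36, 30, 52, 54, 46, 55, 59, 63, 70,
                   66, 63, 58, 23, 14, 8, 19, 7, 24],
})
print(data.shape)  # (19, 2)
```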
In [17]:
fig = plt.figure(figsize=(5, 5))
x = data["Variable_1"]
y = data["Variable_2"]
n = range(0, 19)
plt.grid()
plt.scatter(x, y, marker='o', c='red')
plt.xlabel('Variable_1')
plt.ylabel('Variable_2')
for i, txt in enumerate(n):
    plt.annotate(txt, (x[i], y[i]))

from sklearn.cluster import KMeans
individual_clustering_score = []
for i in range(1, 4):
    kmeans = KMeans(n_clusters=i)
    kmeans.fit(data)
    individual_clustering_score.append(kmeans.inertia_)
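The plotting cell that follows uses `labels` and `centroids`, but the cell that defines them is not visible in this preview. A plausible reconstruction, assuming they come from a three-cluster fit on the same data, is sketched below. `n_init=10` and `random_state=0` are illustrative choices, and the DataFrame is rebuilt inline from the Out[15] table so the block runs on its own.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Same 19 rows as the Out[15] table, rebuilt inline (assumption for self-containment).
data = pd.DataFrame({
    "Variable_1": [12, 20, 28, 18, 29, 33, 24, 45, 45, 52,
                   51, 52, 55, 53, 55, 61, 64, 69, 72],
    "Variable_2": [30, 36, 30, 52, 54, 46, 55, 59, 63, 70,
                   66, 63, 58, 23, 14, 8, 19, 7, 24],
})

# Assumed step: fit three clusters (n_init and random_state are illustrative).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(data)

labels = kmeans.labels_              # cluster index (0, 1 or 2) for each row
centroids = kmeans.cluster_centers_  # array of shape (3, 2), one row per cluster
print(centroids)
```

The row order of `centroids` depends on the initialisation, so the cluster numbering may differ from the Out[22] output even when the centers themselves agree.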
In [22]:
# cluster centers
centroids

Out[22]:
array([[23.42857143, 43.28571429],
       [62.33333333, 15.83333333],
       [50.        , 63.16666667]])

In [25]:
fig = plt.figure(figsize=(5, 5))
# dictionary- map numbers to colors
colmap = {1: 'r', 2: 'b', 3: 'g'}
# map will assign colors to labels
colors = map(lambda x: colmap[x + 1], labels)
colors1 = list(colors)
plt.scatter(x, y, color=colors1, alpha=0.5)
# plotting the centroids wrt color
for idx, centroid in enumerate(centroids):
    plt.scatter(*centroid, color=colmap[idx + 1])
# labeling the points as 0,1,2,....
for i, txt in enumerate(n):
    plt.annotate(txt, (x[i], y[i]))
plt.grid()