

This Python script uses the scikit-learn library to perform k-nearest neighbors (k-NN) classification on a dataset stored in a CSV file. It splits the dataset into training and testing sets, runs 10-fold cross-validation on the training set to find the optimal number of neighbors, and plots the misclassification error against the number of neighbors. Finally, it trains the k-NN classifier with the optimal number of neighbors and evaluates its accuracy.
Typology: Thesis


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the data; the CSV has no header row, so column names are supplied here.
names = ['ax', 'ay', 'az', 'gx', 'gy', 'gz', 'anglex', 'angley', 'anglez', 'class']
df = pd.read_csv('C:/Users/AMCS2/Desktop/ML-PAPER/A9-SFU-SFB-SFR-SFL-S-SOC-R-WS-F-MT5-8-9- 11000R.csv', header=None, names=names)
print(df.head())

# Features: columns 1 through 8 ('ay' to 'anglez'). The end index is exclusive,
# and column 0 ('ax') is not selected by this slice.
X = np.array(df.iloc[:, 1:9])
y = np.array(df['class'])  # class labels, selected by column name

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

# Candidate values of k: odd numbers from 1 to 53.
neighbors = list(range(1, 55, 2))

# Mean 10-fold cross-validation accuracy on the training set for each k.
cv_scores = []
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring='accuracy')
    cv_scores.append(scores.mean())

# Misclassification error (1 - mean accuracy) for each k.
mse = [1 - x for x in cv_scores]

# The optimal k is the one with the lowest misclassification error.
optimal_k = neighbors[mse.index(min(mse))]
print("The optimal number of neighbors is {}".format(optimal_k))

# Plot misclassification error against the number of neighbors.
plt.rcParams["figure.figsize"] = (10, 5)
plt.plot(neighbors, mse)
plt.xlabel("Number of Neighbors K")
plt.ylabel("Misclassification Error")
plt.grid(True)
plt.show()
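
The description mentions a final step, training the classifier with the optimal number of neighbors and evaluating its accuracy, that is not visible in this excerpt. The following is only a minimal sketch of what such a step could look like, not the author's code. It assumes synthetic data from `make_classification` in place of the CSV file, and the value of `optimal_k` and the name `knn_optimal` are hypothetical placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic data with 8 features stands in for the sensor CSV.
X, y = make_classification(n_samples=500, n_features=8, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Assumption: placeholder value; in the script it would come from the
# cross-validation loop.
optimal_k = 7

# Fit on the full training set, then score on the held-out test set.
knn_optimal = KNeighborsClassifier(n_neighbors=optimal_k)
knn_optimal.fit(X_train, y_train)
pred = knn_optimal.predict(X_test)
acc = accuracy_score(y_test, pred)
print("Test accuracy with k={}: {:.3f}".format(optimal_k, acc))
```

Scoring on the held-out test set rather than on the cross-validation folds gives an accuracy estimate from data that played no part in choosing k.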