Predicting Student Exam Scores using Machine Learning, Assignments of Machine Learning

A tutorial on applying machine learning techniques to predict student exam scores. It covers linear regression with gradient descent, linear regression with least squares, and polynomial regression, all implemented in Python. The document guides the reader through data preprocessing, model initialization, cost function definition, gradient descent implementation, and model evaluation, and it compares the regression methods. It is likely to be useful for university students studying machine learning, data science, or applied statistics, as it provides hands-on experience implementing and evaluating regression models on a real-world dataset.

Typology: Assignments

2023/2024

Available from 10/07/2024

b-naveen-kumar

Applied Machine Learning Lab Sheet 3 - M3
(Predicting Student Exam Scores)
Module 3: Linear Regression with Gradient Descent, Linear Regression with Least Squares, Polynomial Regression using Python.
Task 1: Implementing Linear Regression with Gradient Descent
Step 1: Preprocessing the Data
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt
# Load the dataset
url = 'https://drive.google.com/uc?id=1TwOizNpaHfITQK_kWbFVBTlHAw0e1bxW'
data = pd.read_csv(url)
# Display the first few rows of the dataset
print(data.head())
# Display dataset information
print(data.info())


# Check for missing values
print(data.isnull().sum())
# Separate features and target
X = data.drop('Exam_Scores', axis=1)
y = data['Exam_Scores']
# Identify categorical and numeric columns
categorical_features = ['Parental_Education', 'Ethnicity']
numeric_features = ['Hours_Studied', 'Previous_Exams']
# Preprocessing pipeline
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
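The listing breaks off here, and the rest of Step 1 and all of Step 2 (model initialization) are not shown. The later steps rely on `X_train`, `X_val`, `y_train`, `y_val`, `weights`, and `bias`, so the following is a minimal sketch of how those names might be produced. Everything beyond the column names is an assumption: the one-hot encoding step, the `preprocessor` name, the 80/20 split, the zero initialization, and the synthetic data used so the sketch runs without the original file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the lab dataset: only the column names come from
# the lab sheet, all values and category labels are made up.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Hours_Studied": rng.uniform(0, 10, n),
    "Previous_Exams": rng.uniform(40, 100, n),
    "Parental_Education": rng.choice(["High School", "Bachelor", "Master"], n),
    "Ethnicity": rng.choice(["Group A", "Group B", "Group C"], n),
})
data["Exam_Scores"] = (
    5 * data["Hours_Studied"] + 0.5 * data["Previous_Exams"] + rng.normal(0, 2, n)
)

X = data.drop("Exam_Scores", axis=1)
y = data["Exam_Scores"]
categorical_features = ["Parental_Education", "Ethnicity"]
numeric_features = ["Hours_Studied", "Previous_Exams"]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
# Assumed completion: one-hot encode the categorical columns after imputing.
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
# sparse_threshold=0 forces a dense NumPy array, which the later
# X.dot(weights) code expects.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ],
    sparse_threshold=0,
)

# Assumed 80/20 split; fit the preprocessor on the training part only.
X_train_raw, X_val_raw, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train = preprocessor.fit_transform(X_train_raw)
X_val = preprocessor.transform(X_val_raw)
y_train = y_train.to_numpy()
y_val = y_val.to_numpy()

# Assumed Step 2: start from zero weights and zero bias.
weights = np.zeros(X_train.shape[1])
bias = 0.0
print(X_train.shape, X_val.shape, weights.shape)
```

Fitting the preprocessor on the training split and only transforming the validation split keeps validation statistics out of the scaler and the imputers.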

Step 3: Defining the Cost Function

def compute_cost(X, y, weights, bias):
    m = X.shape[0]
    predictions = X.dot(weights) + bias
    # Half the mean squared error over the m training examples
    cost = (1 / (2 * m)) * np.sum((predictions - y) ** 2)
    return cost

initial_cost = compute_cost(X_train, y_train, weights, bias)
print("Initial cost:", initial_cost)
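As a quick sanity check of the cost definition, the sketch below evaluates it on a tiny hand-made example (the three data points are made up for illustration). With all-zero parameters the errors are -2, -4, and -6, so the cost should be (4 + 16 + 36) / (2 * 3) = 56 / 6.

```python
import numpy as np

def compute_cost(X, y, weights, bias):
    # Half the mean squared error, as in Step 3.
    m = X.shape[0]
    predictions = X.dot(weights) + bias
    return (1 / (2 * m)) * np.sum((predictions - y) ** 2)

# Made-up data that follows y = 2 * x exactly.
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([2.0, 4.0, 6.0])

cost_zero = compute_cost(X_toy, y_toy, np.array([0.0]), 0.0)
cost_exact = compute_cost(X_toy, y_toy, np.array([2.0]), 0.0)
print(cost_zero)   # 56 / 6, about 9.33
print(cost_exact)  # 0.0, since the weight 2 reproduces y exactly
```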

Step 4: Implementing Gradient Descent

def gradient_descent(X, y, weights, bias, learning_rate, iterations):
    m = X.shape[0]
    cost_history = []
    for i in range(iterations):
        predictions = X.dot(weights) + bias
        error = predictions - y
        # Gradients of the cost with respect to the weights and the bias
        dW = (1 / m) * X.T.dot(error)
        db = (1 / m) * np.sum(error)
        # Step against the gradient
        weights -= learning_rate * dW
        bias -= learning_rate * db
        cost = compute_cost(X, y, weights, bias)
        cost_history.append(cost)
        if i % 100 == 0:
            print(f"Iteration {i}: Cost {cost}")
    return weights, bias, cost_history

# Hyperparameters
learning_rate = 0.01  # the source shows only "0."; 0.01 is assumed, matching the value used in Task 5
iterations = 1000

# Training the model
weights, bias, cost_history = gradient_descent(X_train, y_train, weights, bias, learning_rate, iterations)
print("Final weights:", weights)
print("Final bias:", bias)
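One way to gain confidence in the `dW` and `db` formulas used in the update step is a finite-difference gradient check. The sketch below compares the analytic gradients with central differences of the cost on random data. The helper names `analytic_gradients` and `numeric_gradients` and the data are illustrative assumptions, not part of the lab sheet.

```python
import numpy as np

def compute_cost(X, y, weights, bias):
    m = X.shape[0]
    predictions = X.dot(weights) + bias
    return (1 / (2 * m)) * np.sum((predictions - y) ** 2)

def analytic_gradients(X, y, weights, bias):
    # Same expressions as dW and db in the gradient descent loop.
    m = X.shape[0]
    error = X.dot(weights) + bias - y
    return (1 / m) * X.T.dot(error), (1 / m) * np.sum(error)

def numeric_gradients(X, y, weights, bias, eps=1e-6):
    # Central differences: perturb one parameter at a time.
    dW = np.zeros_like(weights)
    for j in range(weights.size):
        step = np.zeros_like(weights)
        step[j] = eps
        dW[j] = (compute_cost(X, y, weights + step, bias)
                 - compute_cost(X, y, weights - step, bias)) / (2 * eps)
    db = (compute_cost(X, y, weights, bias + eps)
          - compute_cost(X, y, weights, bias - eps)) / (2 * eps)
    return dW, db

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = rng.normal(size=50)
w_demo = rng.normal(size=3)
b_demo = 0.5

dW_a, db_a = analytic_gradients(X_demo, y_demo, w_demo, b_demo)
dW_n, db_n = numeric_gradients(X_demo, y_demo, w_demo, b_demo)
max_diff = max(np.max(np.abs(dW_a - dW_n)), abs(db_a - db_n))
print("Largest gap between analytic and numeric gradients:", max_diff)
```

A gap close to floating-point noise suggests the update rule matches the cost function. A large gap usually points to a sign error or a missing `1 / m` factor.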

Task 2: Implementing Linear Regression with the Least Squares Method

from sklearn.metrics import mean_squared_error

# Adding a bias term (intercept) to the preprocessed training and validation sets
X_train_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]
X_val_b = np.c_[np.ones((X_val.shape[0], 1)), X_val]

# Closed-form solution for least squares (normal equation)
weights_b = np.linalg.inv(X_train_b.T.dot(X_train_b)).dot(X_train_b.T).dot(y_train)
print("Weights (including bias):", weights_b)

# Predict on the validation set
y_pred_ls = X_val_b.dot(weights_b)

# Calculate MSE
mse_ls = mean_squared_error(y_val, y_pred_ls)
print("MSE (Least Squares):", mse_ls)
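The normal equation requires inverting the matrix XᵀX, which is singular whenever the design matrix has linearly dependent columns. That happens, for example, when an intercept column is combined with a full set of one-hot columns, because the one-hot columns of each categorical feature sum to the intercept column. The sketch below illustrates this on made-up data and uses `np.linalg.lstsq` as one possible alternative to an explicit inverse. It is an illustration under these assumptions, not the lab sheet's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
hours = rng.uniform(0, 10, n)          # made-up numeric feature
group = rng.integers(0, 3, n)          # made-up categorical feature with 3 levels
one_hot = np.eye(3)[group]             # full one-hot encoding, no dropped level

X_demo = np.column_stack([hours, one_hot])
# Noise-free target so an exact fit is possible.
y_demo = 3.0 * hours + np.array([50.0, 55.0, 60.0])[group]

# Add the intercept column: it equals the sum of the three one-hot columns.
X_b = np.c_[np.ones((n, 1)), X_demo]
rank = np.linalg.matrix_rank(X_b)
print("columns:", X_b.shape[1], "rank:", rank)

# lstsq returns the minimum-norm least squares solution even when
# X_b is rank deficient, so no explicit matrix inverse is needed.
weights_b, _, _, _ = np.linalg.lstsq(X_b, y_demo, rcond=None)
y_fit = X_b.dot(weights_b)
print("max abs fit error:", np.max(np.abs(y_fit - y_demo)))
```

In the rank-deficient case the individual weights are not unique, but the fitted values are, so the predictions and the MSE remain meaningful.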

Task 3: Implementing Polynomial Regression

# Ridge Regression to handle singular matrix issue
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Define the degree of the polynomial
degree = 2

# Generate polynomial features
poly = PolynomialFeatures(degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_val = poly.transform(X_val)

# Apply Ridge Regression with a small alpha to reduce overfitting
ridge_reg = Ridge(alpha=0.01)
ridge_reg.fit(X_poly_train, y_train)

# Predict on the validation set
y_pred_poly_ridge = ridge_reg.predict(X_poly_val)

# Calculate MSE
mse_poly_ridge = mean_squared_error(y_val, y_pred_poly_ridge)
print(f"MSE (Polynomial Regression with Ridge, degree={degree}):", mse_poly_ridge)

# Visualize the predictions vs actual scores
plt.scatter(y_val, y_pred_poly_ridge, c='orange', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
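To make the degree-2 expansion used in Task 3 concrete, the sketch below applies `PolynomialFeatures` to a single made-up row with two features a = 2 and b = 3. The output columns are the constant 1, then a and b, then a², ab, and b².

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly_demo = PolynomialFeatures(degree=2)
expanded = poly_demo.fit_transform(np.array([[2.0, 3.0]]))
print(expanded)           # [[1. 2. 3. 4. 6. 9.]]
print(expanded.shape[1])  # 6 columns from 2 input features
```

For n input features a degree-2 expansion produces (n + 1)(n + 2) / 2 columns, so the feature count grows quickly once one-hot columns are included. Many of those columns are also correlated, or identically zero for products of two one-hot columns from the same categorical feature, which is one reason a regularized model such as Ridge is a reasonable choice here.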

Task 4: Evaluating and Comparing Performance

# Using Ridge Regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Define the degree of the polynomial
degree = 2

# Generate polynomial features
poly = PolynomialFeatures(degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_val = poly.transform(X_val)

# Apply Ridge Regression
ridge_reg = Ridge(alpha=0.01)  # You can adjust the alpha (regularization strength)
ridge_reg.fit(X_poly_train, y_train)

# Predict on the validation set
y_pred_poly_ridge = ridge_reg.predict(X_poly_val)

# Calculate MSE
mse_poly_ridge = mean_squared_error(y_val, y_pred_poly_ridge)
print(f"MSE (Polynomial Regression with Ridge, degree={degree}):", mse_poly_ridge)

# Print the MSE comparison
print("MSE Comparison:")
print(f"Gradient Descent: {mse_gd}")
print(f"Least Squares: {mse_ls}")
print(f"Polynomial Regression (Ridge, degree={degree}): {mse_poly_ridge}")

# Plotting predicted vs actual exam scores for each model
plt.figure(figsize=(15, 5))

# Gradient Descent
plt.subplot(1, 3, 1)
plt.scatter(y_val, y_pred, c='blue', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title('Linear Regression (Gradient Descent)')
plt.legend()

# Least Squares
plt.subplot(1, 3, 2)
plt.scatter(y_val, y_pred_ls, c='green', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title('Linear Regression (Least Squares)')
plt.legend()

# Polynomial Regression with Ridge
plt.subplot(1, 3, 3)
plt.scatter(y_val, y_pred_poly_ridge, c='orange', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title(f'Polynomial Regression with Ridge (degree={degree})')
plt.legend()
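The comparison in Task 4 relies on MSE alone. As a possible extension, the sketch below collects MSE, RMSE, and R² for one model in a small dictionary. The helper name `summarize` and the three-point example are illustrative assumptions. RMSE is in the same units as the exam scores, and R² gives the share of variance explained, which can make the models easier to compare.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def summarize(name, y_true, y_pred):
    # Hypothetical helper: gather common regression metrics for one model.
    mse = mean_squared_error(y_true, y_pred)
    return {
        "model": name,
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_true, y_pred),
    }

# Made-up values: one prediction is off by 1, the others are exact.
y_true_demo = np.array([1.0, 2.0, 3.0])
y_pred_demo = np.array([1.0, 2.0, 4.0])
report = summarize("demo", y_true_demo, y_pred_demo)
print(report)  # mse = 1/3, r2 = 1 - 1/2 = 0.5
```

With the lab's variables, a call such as `summarize("Least Squares", y_val, y_pred_ls)` would produce the same kind of summary for each model.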

Task 5: Providing Insights

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline

# Define the LinearRegressionGD class
class LinearRegressionGD:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        self.m, self.n = X.shape
        self.theta = np.zeros(self.n)
        self.bias = 0
        self.cost_history = []
        for _ in range(self.n_iterations):
            y_pred = np.dot(X, self.theta) + self.bias
            cost = (1 / (2 * self.m)) * np.sum((y_pred - y) ** 2)
            self.cost_history.append(cost)
            # Gradients of the cost with respect to theta and the bias
            d_theta = (1 / self.m) * np.dot(X.T, (y_pred - y))
            d_bias = (1 / self.m) * np.sum(y_pred - y)
            self.theta -= self.learning_rate * d_theta
            self.bias -= self.learning_rate * d_bias

    def predict(self, X):
        return np.dot(X, self.theta) + self.bias

# Initialize and train the gradient descent model
model_gd = LinearRegressionGD(learning_rate=0.01, n_iterations=1000)
model_gd.fit(X_train, y_train)

# Make predictions
y_pred_gd = model_gd.predict(X_val)

# Evaluate the model
mse_gd = np.mean((y_pred_gd - y_val) ** 2)
print("Mean Squared Error (Gradient Descent):", mse_gd)

# Print coefficients and intercept for gradient descent
print("Coefficients (Gradient Descent):", model_gd.theta)
print("Intercept (Gradient Descent):", model_gd.bias)

# Initialize and train the least squares model
model_ls = LinearRegression()
model_ls.fit(X_train, y_train)

# Make predictions
y_pred_ls = model_ls.predict(X_val)

# Evaluate the model
mse_ls = np.mean((y_pred_ls - y_val) ** 2)
print("Mean Squared Error (Least Squares):", mse_ls)

# Print coefficients and intercept for least squares
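One insight this comparison can support is that gradient descent and least squares minimize the same cost, so on well-scaled, full-rank data their coefficients should nearly coincide once gradient descent has run long enough. The sketch below checks this on synthetic standardized data. The function `fit_gd`, the data, and the true coefficients 70, 5, and -2 are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_gd(X, y, learning_rate=0.01, n_iterations=1000):
    # Batch gradient descent on half the mean squared error.
    m, n = X.shape
    theta = np.zeros(n)
    bias = 0.0
    for _ in range(n_iterations):
        error = X.dot(theta) + bias - y
        theta -= learning_rate * (1 / m) * X.T.dot(error)
        bias -= learning_rate * (1 / m) * np.sum(error)
    return theta, bias

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
# Standardize so both features have zero mean and unit variance.
X_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
y_demo = 70.0 + 5.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 1, 200)

theta_gd, bias_gd = fit_gd(X_demo, y_demo)
ols = LinearRegression().fit(X_demo, y_demo)

coef_gap = np.max(np.abs(theta_gd - ols.coef_))
bias_gap = abs(bias_gd - ols.intercept_)
print("Largest coefficient gap:", coef_gap)
print("Intercept gap:", bias_gap)
```

If the gaps stay large on real data, common causes are unscaled features, a learning rate that is too small for the number of iterations, or collinear columns that leave the individual coefficients undetermined.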