Predicting Student Exam Scores using Machine Learning, Assignments of Machine Learning

Birla Institute of Technology and Science Machine Learning

A tutorial on applying regression techniques to predict student exam scores. It covers linear regression with gradient descent, linear regression with least squares, and polynomial regression, all implemented in Python. The document walks through data preprocessing, model initialization, cost function definition, gradient descent implementation, and model evaluation, and compares the regression methods. It is aimed at university students of machine learning, data science, or applied statistics who want hands-on experience implementing and evaluating regression models on a real-world dataset.

Typology: Assignments

2023/2024

Available from 10/07/2024

b-naveen-kumar 🇮🇳


Applied Machine Learning – Lab Sheet 3 – M3

(Predicting Student Exam Scores)

Module 3: Linear Regression with Gradient Descent, Linear Regression with Least Squares, Polynomial Regression using Python

Task 1: Implementing Linear Regression with Gradient Descent

Step 1: Preprocessing the Data

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import matplotlib.pyplot as plt

# Load the dataset
url = 'https://drive.google.com/uc?id=1TwOizNpaHfITQK_kWbFVBTlHAw0e1bxW'
data = pd.read_csv(url)

# Display the first few rows of the dataset
print(data.head())

# Display dataset information
# (info() prints its summary directly and returns None, so it is not wrapped in print)
data.info()


# Check for missing values
print(data.isnull().sum())

# Separate features and target
X = data.drop('Exam_Scores', axis=1)
y = data['Exam_Scores']

# Identify categorical and numeric columns
categorical_features = ['Parental_Education', 'Ethnicity']
numeric_features = ['Hours_Studied', 'Previous_Exams']

# Preprocessing pipeline
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
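The listing above stops partway through the categorical pipeline, and Step 2 (model initialization) is not shown. The following is only a sketch of how the remaining preprocessing and the initialization might look. Several things are assumptions rather than the original code: the categorical branch ends in a `OneHotEncoder`, the split is 80/20 with a fixed seed, the names `preprocessor`, `X_train_raw` and `X_val_raw` are invented for illustration, the weights start at zero, and the small DataFrame is a made-up stand-in for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the real dataset (all values are made up for illustration).
data = pd.DataFrame({
    'Hours_Studied': [2.0, 5.0, np.nan, 8.0, 3.0, 6.0, 7.0, 1.0],
    'Previous_Exams': [55.0, 70.0, 65.0, np.nan, 60.0, 75.0, 80.0, 50.0],
    'Parental_Education': ['HS', 'BSc', 'HS', 'MSc', np.nan, 'BSc', 'MSc', 'HS'],
    'Ethnicity': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'Exam_Scores': [58.0, 74.0, 66.0, 88.0, 61.0, 79.0, 85.0, 52.0],
})
X = data.drop('Exam_Scores', axis=1)
y = data['Exam_Scores']

numeric_features = ['Hours_Studied', 'Previous_Exams']
categorical_features = ['Parental_Education', 'Ethnicity']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())])
# Assumption: the categorical branch finishes with a one-hot encoder.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)])

# Assumption: an 80/20 split with a fixed seed.
X_train_raw, X_val_raw, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the transformers on the training rows only, then reuse them on validation rows.
X_train = preprocessor.fit_transform(X_train_raw)
X_val = preprocessor.transform(X_val_raw)
# The one-hot output may be sparse; convert to dense arrays for the NumPy code.
X_train = X_train.toarray() if hasattr(X_train, 'toarray') else np.asarray(X_train)
X_val = X_val.toarray() if hasattr(X_val, 'toarray') else np.asarray(X_val)
y_train, y_val = y_train.to_numpy(), y_val.to_numpy()

# Assumed Step 2: start from zero weights and a zero bias.
weights = np.zeros(X_train.shape[1])
bias = 0.0
```

Fitting the preprocessor on the training split alone keeps validation statistics out of the imputation and scaling steps.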

Step 3: Defining the Cost Function

def compute_cost(X, y, weights, bias):
    m = X.shape[0]
    predictions = X.dot(weights) + bias
    cost = (1 / (2 * m)) * np.sum((predictions - y) ** 2)
    return cost

initial_cost = compute_cost(X_train, y_train, weights, bias)
print("Initial cost:", initial_cost)
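As a small worked check of the cost function, consider three made-up points on the line y = 2x. With zero weights the predictions are all 0, so the cost is (4 + 16 + 36) / (2 · 3) ≈ 9.33; with the weight set to 2 the predictions are exact and the cost is 0. The arrays `X_toy` and `y_toy` are illustrative and not part of the lab data.

```python
import numpy as np

def compute_cost(X, y, weights, bias):
    """Half mean squared error, as defined in Step 3."""
    m = X.shape[0]
    predictions = X.dot(weights) + bias
    return (1 / (2 * m)) * np.sum((predictions - y) ** 2)

# Three illustrative points lying exactly on y = 2x.
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([2.0, 4.0, 6.0])

cost_at_zero = compute_cost(X_toy, y_toy, np.zeros(1), 0.0)
cost_at_optimum = compute_cost(X_toy, y_toy, np.array([2.0]), 0.0)
print("Cost with zero weights:", cost_at_zero)
print("Cost with weight 2:", cost_at_optimum)
```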

Step 4: Implementing Gradient Descent

def gradient_descent(X, y, weights, bias, learning_rate, iterations):
    m = X.shape[0]
    cost_history = []
    for i in range(iterations):
        predictions = X.dot(weights) + bias
        error = predictions - y
        dW = (1 / m) * X.T.dot(error)
        db = (1 / m) * np.sum(error)
        weights -= learning_rate * dW
        bias -= learning_rate * db
        cost = compute_cost(X, y, weights, bias)
        cost_history.append(cost)
        if i % 100 == 0:
            print(f"Iteration {i}: Cost {cost}")
    return weights, bias, cost_history
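The expressions for `dW` and `db` follow from differentiating the cost in Step 3. Written out, with w for `weights`, b for `bias`, and α for `learning_rate` (the symbols are notation chosen here, not taken from the lab sheet):

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( x_i^{\top} w + b - y_i \right)^2

\frac{\partial J}{\partial w} = \frac{1}{m} X^{\top} \left( X w + b\,\mathbf{1} - y \right)
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( x_i^{\top} w + b - y_i \right)

w \leftarrow w - \alpha \, \frac{\partial J}{\partial w},
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
```

The factor 1/2 in the cost cancels the 2 produced by differentiating the square, which is why the gradients carry 1/m rather than 2/m.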

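# Hyperparameters
# The learning rate is cut off as "0." in the source; 0.01 is assumed here
# because it matches the value used for LinearRegressionGD in Task 5.
learning_rate = 0.01
iterations = 1000

# Training the model
weights, bias, cost_history = gradient_descent(X_train, y_train, weights, bias, learning_rate, iterations)
print("Final weights:", weights)
print("Final bias:", bias)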
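One way to gain confidence in a gradient descent loop like this is to run it on synthetic data whose true parameters are known. The sketch below does that with a compact variant of the function (it rebinds `weights` instead of updating in place, and it omits the progress printing). The data, the learning rate of 0.1, and the names `X_demo`, `true_w` and `true_b` are assumptions made for the illustration.

```python
import numpy as np

def gradient_descent(X, y, weights, bias, learning_rate, iterations):
    """Batch gradient descent for linear regression; returns the cost per step."""
    m = X.shape[0]
    cost_history = []
    for _ in range(iterations):
        error = X.dot(weights) + bias - y
        weights = weights - learning_rate * (1 / m) * X.T.dot(error)
        bias = bias - learning_rate * (1 / m) * np.sum(error)
        cost_history.append((1 / (2 * m)) * np.sum((X.dot(weights) + bias - y) ** 2))
    return weights, bias, cost_history

# Noise-free synthetic data generated from known parameters.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -1.5]), 2.0
y_demo = X_demo.dot(true_w) + true_b

w_hat, b_hat, history = gradient_descent(X_demo, y_demo, np.zeros(2), 0.0, 0.1, 1000)
print("Recovered weights:", w_hat)
print("Recovered bias:", b_hat)
print("First and last cost:", history[0], history[-1])
```

If the cost history rises instead of falling, the learning rate is usually too large for the scale of the features.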

Task 2: Implementing Linear Regression with the Least Squares Method

from sklearn.metrics import mean_squared_error

# Adding a bias term (intercept) to the preprocessed training and validation sets
X_train_b = np.c_[np.ones((X_train.shape[0], 1)), X_train]
X_val_b = np.c_[np.ones((X_val.shape[0], 1)), X_val]

# Closed-form solution for least squares
weights_b = np.linalg.inv(X_train_b.T.dot(X_train_b)).dot(X_train_b.T).dot(y_train)
print("Weights (including bias):", weights_b)

# Predict on the validation set
y_pred_ls = X_val_b.dot(weights_b)

# Calculate MSE
mse_ls = mean_squared_error(y_val, y_pred_ls)
print("MSE (Least Squares):", mse_ls)
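Inverting XᵀX explicitly breaks down when the columns of the design matrix are linearly dependent, which can happen when a full set of one-hot columns sits next to an intercept column. As a hedged alternative (not the method used in the lab sheet), `np.linalg.lstsq` solves the same least squares problem without forming the inverse and still returns an answer for a rank-deficient matrix. The synthetic data and variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo.dot(np.array([1.0, -2.0, 0.5])) + 4.0 + rng.normal(scale=0.1, size=50)

# Prepend a column of ones for the intercept.
X_b = np.c_[np.ones((X_demo.shape[0], 1)), X_demo]

# Normal equations with an explicit inverse, as in the text above.
w_inv = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_demo)

# Least squares solver; no explicit inverse is formed.
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y_demo, rcond=None)

# A deliberately collinear design: one feature column is duplicated.
X_collinear = np.c_[X_b, X_b[:, 1]]
w_collinear, _, rank_collinear, _ = np.linalg.lstsq(X_collinear, y_demo, rcond=None)

print("Explicit inverse:", w_inv)
print("lstsq:", w_lstsq)
print("Rank of collinear design:", rank_collinear, "of", X_collinear.shape[1], "columns")
```

On the well-conditioned matrix both routes give the same weights; on the collinear one the fitted values are unchanged even though the individual weights are no longer unique.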

Task 3: Implementing Polynomial Regression

# Ridge Regression to handle singular matrix issue
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Define the degree of the polynomial
degree = 2

# Generate polynomial features
poly = PolynomialFeatures(degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_val = poly.transform(X_val)

# Apply Ridge Regression with a small alpha to reduce overfitting
ridge_reg = Ridge(alpha=0.01)
ridge_reg.fit(X_poly_train, y_train)
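Ridge regression sidesteps the singular-matrix problem because it adds alpha times the identity to XᵀX before solving, which makes the system invertible for any alpha > 0. The sketch below compares that closed form with scikit-learn's `Ridge` on a design matrix that is singular on purpose. The intercept is disabled so that both solve the same objective; the data and the names `X_demo`, `w_closed` and `ridge_demo` are assumptions for the illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X_demo = rng.normal(size=(40, 3))
# Duplicate the first column so that X^T X is exactly singular.
X_demo = np.c_[X_demo, X_demo[:, 0]]
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.1, size=40)

alpha = 0.01
n_features = X_demo.shape[1]

# Closed-form ridge solution without an intercept: (X^T X + alpha I)^-1 X^T y
w_closed = np.linalg.solve(
    X_demo.T.dot(X_demo) + alpha * np.eye(n_features),
    X_demo.T.dot(y_demo))

# scikit-learn's Ridge with the intercept disabled minimizes the same objective.
ridge_demo = Ridge(alpha=alpha, fit_intercept=False)
ridge_demo.fit(X_demo, y_demo)

print("Rank of X^T X:", np.linalg.matrix_rank(X_demo.T.dot(X_demo)), "of", n_features)
print("Closed form:", w_closed)
print("sklearn Ridge:", ridge_demo.coef_)
```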

# Predict on the validation set
y_pred_poly_ridge = ridge_reg.predict(X_poly_val)

# Calculate MSE
mse_poly_ridge = mean_squared_error(y_val, y_pred_poly_ridge)
print(f"MSE (Polynomial Regression with Ridge, degree={degree}):", mse_poly_ridge)

# Visualize the predictions vs actual scores
plt.scatter(y_val, y_pred_poly_ridge, c='orange', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')

Task 4: Evaluating and Comparing Performance

# Using Ridge Regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Define the degree of the polynomial
degree = 2

# Generate polynomial features
poly = PolynomialFeatures(degree)
X_poly_train = poly.fit_transform(X_train)
X_poly_val = poly.transform(X_val)

# Apply Ridge Regression
ridge_reg = Ridge(alpha=0.01)  # You can adjust the alpha (regularization strength)
ridge_reg.fit(X_poly_train, y_train)

# Predict on the validation set
y_pred_poly_ridge = ridge_reg.predict(X_poly_val)

# Calculate MSE
mse_poly_ridge = mean_squared_error(y_val, y_pred_poly_ridge)
print(f"MSE (Polynomial Regression with Ridge, degree={degree}):", mse_poly_ridge)

# Update the MSE printing section
print("MSE Comparison:")
print(f"Gradient Descent: {mse_gd}")
print(f"Least Squares: {mse_ls}")
print(f"Polynomial Regression (Ridge, degree={degree}): {mse_poly_ridge}")
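When several models are compared, collecting the metrics in one structure makes it easier to pick the best one and to report RMSE and R² alongside MSE. The following is a possible sketch, not part of the lab sheet: `summarize` is a hypothetical helper, and the targets and predictions are invented numbers rather than results from the exam-score data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def summarize(y_true, predictions_by_model):
    """Return {model name: (MSE, RMSE, R^2)} for each set of predictions."""
    summary = {}
    for name, y_hat in predictions_by_model.items():
        mse = mean_squared_error(y_true, y_hat)
        summary[name] = (mse, float(np.sqrt(mse)), r2_score(y_true, y_hat))
    return summary

# Hypothetical validation targets and predictions, for illustration only.
y_val_demo = np.array([60.0, 70.0, 80.0, 90.0])
predictions_demo = {
    'Model A': np.array([62.0, 68.0, 81.0, 88.0]),
    'Model B': np.array([60.0, 70.0, 80.0, 90.0]),
}

results = summarize(y_val_demo, predictions_demo)
best_model = min(results, key=lambda name: results[name][0])

for name, (mse, rmse, r2) in results.items():
    print(f"{name}: MSE={mse:.3f}, RMSE={rmse:.3f}, R^2={r2:.3f}")
print("Lowest MSE:", best_model)
```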

# Plotting predicted vs actual exam scores for each model
plt.figure(figsize=(15, 5))

# Gradient Descent
plt.subplot(1, 3, 1)
plt.scatter(y_val, y_pred, c='blue', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title('Linear Regression (Gradient Descent)')
plt.legend()

# Least Squares
plt.subplot(1, 3, 2)
plt.scatter(y_val, y_pred_ls, c='green', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title('Linear Regression (Least Squares)')
plt.legend()

# Polynomial Regression with Ridge
plt.subplot(1, 3, 3)
plt.scatter(y_val, y_pred_poly_ridge, c='orange', label='Predicted')
plt.plot([y_val.min(), y_val.max()], [y_val.min(), y_val.max()], 'r--', lw=2, label='Actual')
plt.xlabel('Actual Exam Scores')
plt.ylabel('Predicted Exam Scores')
plt.title(f'Polynomial Regression with Ridge (degree={degree})')
plt.legend()

# Avoid overlapping labels and display the figure when run as a script
plt.tight_layout()
plt.show()
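The three panels repeat the same plotting calls with only the predictions, colour, and title changing, so the pattern could be factored into a small function. This is one possible refactoring, not the lab sheet's code: `plot_predicted_vs_actual` is a hypothetical helper, the data is invented, and the `Agg` backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_predicted_vs_actual(ax, y_true, y_hat, title, color):
    """Scatter predictions against targets and draw the y = x reference line."""
    ax.scatter(y_true, y_hat, c=color, label='Predicted')
    lo, hi = np.min(y_true), np.max(y_true)
    ax.plot([lo, hi], [lo, hi], 'r--', lw=2, label='Actual')
    ax.set_xlabel('Actual Exam Scores')
    ax.set_ylabel('Predicted Exam Scores')
    ax.set_title(title)
    ax.legend()

# Invented targets and predictions, for illustration only.
y_true_demo = np.array([55.0, 65.0, 75.0, 85.0])
panels = [
    ('Model A', y_true_demo + 2.0, 'blue'),
    ('Model B', y_true_demo - 1.0, 'green'),
    ('Model C', y_true_demo * 1.02, 'orange'),
]

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for ax, (title, y_hat, color) in zip(axes, panels):
    plot_predicted_vs_actual(ax, y_true_demo, y_hat, title, color)
fig.tight_layout()
```

Points on the dashed line are predicted exactly; the vertical distance from the line is the prediction error for that student.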

Task 5: Providing Insights

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline, make_pipeline

# Define the LinearRegressionGD class
class LinearRegressionGD:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        self.m, self.n = X.shape
        self.theta = np.zeros(self.n)
        self.bias = 0
        self.cost_history = []
        for _ in range(self.n_iterations):
            y_pred = np.dot(X, self.theta) + self.bias
            cost = (1 / (2 * self.m)) * np.sum((y_pred - y) ** 2)
            self.cost_history.append(cost)
            d_theta = (1 / self.m) * np.dot(X.T, (y_pred - y))
            d_bias = (1 / self.m) * np.sum(y_pred - y)
            self.theta -= self.learning_rate * d_theta
            self.bias -= self.learning_rate * d_bias

    def predict(self, X):
        return np.dot(X, self.theta) + self.bias

# Initialize and train the gradient descent model
model_gd = LinearRegressionGD(learning_rate=0.01, n_iterations=1000)
model_gd.fit(X_train, y_train)

# Make predictions
y_pred_gd = model_gd.predict(X_val)

# Evaluate the model
mse_gd = np.mean((y_pred_gd - y_val) ** 2)
print("Mean Squared Error (Gradient Descent):", mse_gd)

# Print coefficients and intercept for gradient descent
print("Coefficients (Gradient Descent):", model_gd.theta)
print("Intercept (Gradient Descent):", model_gd.bias)

# Initialize and train the least squares model
model_ls = LinearRegression()
model_ls.fit(X_train, y_train)

# Make predictions
y_pred_ls = model_ls.predict(X_val)

# Evaluate the model
mse_ls = np.mean((y_pred_ls - y_val) ** 2)
print("Mean Squared Error (Least Squares):", mse_ls)
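Gradient descent and least squares minimize the same convex cost, so on scaled features and with enough iterations the two should land on nearly the same coefficients and validation MSE. The sketch below checks this on synthetic data. It uses a trimmed copy of the class (no cost history, and `fit` returns `self`), and the data, the learning rate of 0.05, and the 2000 iterations are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class LinearRegressionGD:
    """Batch gradient descent for linear regression (trimmed copy of the class above)."""
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations

    def fit(self, X, y):
        m, n = X.shape
        self.theta = np.zeros(n)
        self.bias = 0.0
        for _ in range(self.n_iterations):
            error = X.dot(self.theta) + self.bias - y
            self.theta -= self.learning_rate * X.T.dot(error) / m
            self.bias -= self.learning_rate * error.sum() / m
        return self

    def predict(self, X):
        return X.dot(self.theta) + self.bias

# Synthetic, roughly standardized features with a known linear signal plus noise.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo.dot(np.array([4.0, -2.0, 1.0])) + 65.0 + rng.normal(scale=2.0, size=300)
X_tr, X_va = X_demo[:240], X_demo[240:]
y_tr, y_va = y_demo[:240], y_demo[240:]

gd = LinearRegressionGD(learning_rate=0.05, n_iterations=2000).fit(X_tr, y_tr)
ls = LinearRegression().fit(X_tr, y_tr)

mse_gd_demo = np.mean((gd.predict(X_va) - y_va) ** 2)
mse_ls_demo = np.mean((ls.predict(X_va) - y_va) ** 2)
print("GD coefficients:", gd.theta, "intercept:", gd.bias)
print("LS coefficients:", ls.coef_, "intercept:", ls.intercept_)
print("Validation MSE (GD, LS):", mse_gd_demo, mse_ls_demo)
```

A visible gap between the two MSE values on real data would point to gradient descent not having converged, for example because of too few iterations or unscaled features.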

Print coefficients and intercept for least squares
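The code for this last step is not included. A minimal sketch of what it might look like, assuming a fitted scikit-learn `LinearRegression`, whose learned parameters are exposed as `coef_` and `intercept_`. The noise-free data and the name `model_ls_demo` are made up so that the expected coefficients are known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 2*x1 - 1*x2 + 5.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(30, 2))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 5.0

model_ls_demo = LinearRegression()
model_ls_demo.fit(X_demo, y_demo)

# coef_ holds one weight per feature; intercept_ holds the bias term.
print("Coefficients (Least Squares):", model_ls_demo.coef_)
print("Intercept (Least Squares):", model_ls_demo.intercept_)
```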