Unsupervised Learning: K-means Clustering Assignment

CSC2042S 2025 Assignment 1: Unsupervised Learning

In this assignment, you will apply K-means clustering to analyze global development patterns using the World Development Indicators (WDI) dataset. The assignment will focus on the practical challenges of applying clustering to real-world data, model optimization strategies, and interpretation of results. In addition to submitting the code for reproducing your analysis, you need to submit a report that documents your model design decisions and analyses the results.

Learning Objectives

Upon completion of this assignment, you should be able to:

• Handle real-world data preprocessing challenges in clustering applications

• Evaluate and optimize clustering algorithms through various initialization and convergence strategies

• Assess cluster quality using multiple validation approaches

• Apply dimensionality reduction techniques to improve clustering performance

• Critically analyze the trade-offs in clustering design decisions

Dataset

The World Development Indicators database from the World Bank contains over 1,400 time series indicators for 220 countries and territories from 1960 to present. The dataset contains key development indicators such as:

• Economic indicators (GDP per capita, inflation, trade balance)

• Social indicators (life expectancy, literacy rate, population growth)

• Environmental indicators (CO2 emissions, forest area, renewable energy consumption)

• Infrastructure indicators (internet users, mobile subscriptions, electricity access)

The dataset is available on the course Amathuba site under Content / Resources / Datasets. The source of the dataset is: https://datacatalog.worldbank.org/dataset/world-development-indicators.

Your code should load the dataset from a specified folder in the given format. Do not submit the dataset with your assignment or let the code download the dataset.

In the main file in the dataset, WDICSV.csv, each row represents an indicator for one country, with columns for the value of that indicator in different years. However, indicator values are not always available for every year. In your data analysis, treat each country-year combination as one multi-dimensional data point, with development indicators serving as features. This will enable you to cluster countries with similar indicators and to track development indicators over time (e.g. how similar South Africa in 2024 is to South Korea in 1990). Additionally, the Human Development Index scores/classifications of countries are included; these should be used only as a reference classification for analyzing the final clustering.
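
The reshaping described above, from one row per country and indicator to one row per country-year, can be sketched with pandas as follows. This is only an illustration under stated assumptions, not the required approach. The column names "Country Code" and "Indicator Code", the four-digit year columns, and the indicator codes "GDP" and "LIFE" in the toy table are assumptions for illustration; check them against the actual WDICSV.csv header. The function name to_country_year_matrix is hypothetical.

```python
import pandas as pd


def to_country_year_matrix(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape indicator-per-row data into one row per (country, year).

    Assumes columns "Country Code" and "Indicator Code" plus one column
    per year whose name is the year in digits (e.g. "1990").
    """
    id_cols = ["Country Code", "Indicator Code"]  # assumed column names
    year_cols = [c for c in wide.columns if str(c).isdigit()]

    # Wide -> long: one row per (country, indicator, year) observation.
    long = wide.melt(id_vars=id_cols, value_vars=year_cols,
                     var_name="Year", value_name="Value")
    long["Year"] = long["Year"].astype(int)

    # Drop missing observations so empty country-years do not appear.
    long = long.dropna(subset=["Value"])

    # Long -> feature matrix: indicators become columns (features).
    points = (long.set_index(["Country Code", "Year", "Indicator Code"])["Value"]
                  .unstack("Indicator Code"))
    return points


# Toy stand-in for the real file; with the real data one might instead use
# wide = pd.read_csv("data/WDICSV.csv")  # hypothetical folder
demo = pd.DataFrame({
    "Country Code": ["ZAF", "ZAF", "KOR", "KOR"],
    "Indicator Code": ["GDP", "LIFE", "GDP", "LIFE"],  # made-up codes
    "1990": [1.0, 2.0, 3.0, None],
    "1991": [None, 5.0, 6.0, 7.0],
})
points = to_country_year_matrix(demo)
print(points)
```

Each row of the result is indexed by (country, year) and each column is one indicator. Values that were unavailable for a given year remain missing (NaN) in the matrix, so they still need to be handled before clustering.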

Assignment Tasks

1. Data Preprocessing (6 marks)

2. K-means Clustering and Initialization (6 marks)

6. Cluster Interpretation (3 marks)

7. Creative Extensions (8 marks)

Submission Requirements

Academic Integrity

Marking Rubric