Core Concepts of Unsupervised ML, Study notes of Technology

TOPIC I COVERED :- • Foundations of Unsupervised Machine Learning • Characteristics of Unlabeled Data and Learning Without Targets • Core Tasks in Unsupervised ML (Clustering & Anomaly Detection) • K-Means Clustering: Concept, Objective Function, and Workflow • Elbow Method for Optimal Cluster Selection (WCSS Analysis) • K-Means Implementation and Visualization • Hierarchical Clustering: Theory, Types, and Dendrogram Analysis • Agglomerative Clustering Algorithm: Step-by-Step Process • Linkage Criteria (Single, Complete, Average, Ward’s) • Comparative Analysis: K-Means vs. Hierarchical Clustering • DBSCAN: Definition, Density-Based Logic, and Core Concepts • DBSCAN Point Categories (Core, Border, Noise) • DBSCAN Step-Wise Working and Reachability Concepts • DBSCAN Implementation with Example Plot • Dimensionality Reduction: Purpose and Approaches • Principal Component Analysis (PCA): Intuition and Key Concepts • PCA Mathematical Workflow (Standardization, Covariance Matrix)

Typology: Study notes

2025/2026

Available from 03/28/2026

varun-saxena-1
varun-saxena-1 🇮🇳

6 documents

1 / 29

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d

Partial preview of the text

Download Core Concepts of Unsupervised ML and more Study notes Technology in PDF only on Docsity!

CU GUE CEU EEE EEE TUE ETEEEEETE Unsupervised = Machine Learning = 29 ca S 26 ©. ey . o,to0 (2) » 900 o \ gz a", {o} ? o2e er o°% 9 ° 9% We cic es eteniebl vised | (Machine Learnina. | | learning, the model learns patterns and | ea p Structures from data without labeled outcomes or ‘Supervision. Input (%),x2,%3) |- i —| No labeled output | * Dataset contains only input features (X)- No labeled output (Y) 4 Main tasks of Unsupervised ML:— (+ Clustering:- Grouping similar deta points ine clusters | (e.9., IK-Means): | Clusters = o) (0,0 } = treo Gs.) * K-Means, Hierarchical, DBSCAN... | # Uncovers hidden patterns in data * Identifies subgroups within data A Tsolation Forest, DBSCAN' # Main tasks of Unsupervised ML: «Anomaly Detection: g ° Reducing the number of features . Identifying unusual data | while preserving important information points or anomalies iF A> | => oe ie = 20 | oo ee. | 1 a Principal Component Analysis (PCA) ‘* Detects outliers that | * Visualizes & Simplifies complex data sets devicte from the nomal | patterns | a K-Means Clustering * An thexative, Unsupe algorithm, © Goal: Partitions a dataset into kK pre- destined, non-overlapping Subgroups (cLustexs]. ed machine learning NY) * Mechanism? Each data point is asatgned to the Cluste with the neoest mean (Centrofd), aiming to minimize intra- cluster variance. +t Objective Function (optimization goal) :- * The goal is to minimize the “Within-Cluster Sum of E Squares (WCSS’), also known as Inertia. 2 We want points within a Cluster to be as ; Compact (Close to the centroid) as possible. | HF 2 | k 3 oo = Silas ° | Jal Kes Wwhexe * Je Objective function (Total cost) k = Number of clustexs My = The centroid (mean) of clustet | HH Io Aag I= Squared Euclidean Distance Lai tial Oo ~ Randomly select kK data points as ane initial. Centroids- 3 2: Assignment (Expectation): Calculate the Euclidean | | distance behween each data point and all centroids. | Assign the point to the closest centroid. irr ization) - Recalevlate the centroids éy Finding the mean of all points currently 3 assigned to that cluster. Mae 1Cyl KeCT ence, Repeat steps 2 & 3 ontil: e Centroids no longer move: * Points stop Changing clustests. e Mox iterations reached. = + Choosing. *k? CHyperpar wrameter) *- * The problem: k-means cannot determine the beforehand. numbex of clusters on its own, you must define ‘k’ HIERARCHICAL CLUSTERING e What I+ Is: An Unsupervised learning algorithm +hat groups data into a hierarchy (or tree) of clusters. e Approach: Unlike K-means, it does not requive pre- defining the numbex of Clusters Ck). ; sttom-Up):- Starts with n + clusters ee for es cotnt) and merges them iteratively. ).- Stevtts with one giant clustesc and recursively splits Tt. + Concept: A tree-like diagram that records the sequences of mergers or Splits. ° Usage: The y-axis usually represents the Euclidean distance between clusters. 2: You can draw a horizontal line TING across RE dendrogram at a Specific distance to determine the number of clusters. CECE EEEEEEEEE Sate ceientinetl Season eee ee cl Algorithm steps (Agalomerative ):- @) * Step 1: “dridfalization’ treat each data point as a Single cluster. (9f you have N points, you have N clusters). ¢ Step 2: Calculate Distance compute the distance (proximity matrix) between every pair of clusters. Step 3: Merge find the two closest clusters and merge them tntoa Single cluster. Step at Update Recalculate the distances behween the new cluster and alll remain clusters. Step 5: Repeat Steps 3 and 4 untill only a Single cluster (containing ald data points) remain. S: PuosS & Cons: + Pxod* * no need to specify K fn advance. . Dendstoguam provides a great visual Summosy of data relationship. e @ns* e Computationally Expensive — © (N3) time complexity , very slow ° Sample data x= £04, 23; 12, [4,4], [s,4], [3,74 © Perform hterarchical cl ustering = Z= linkage (X, method ="single”) e Plot the dendrogram plt. Figure (figstze = (6,4)) dendrogram (2) pit. title (“Hierarchical Clustering Dendrograr ') pit-xlabel (Data Points) plt. Ylabel CDistance”) plt.shou) ) Hierarchical Clustering Dendrogram 6 7 | 4 | 8 524 | — i eee | 1 2 3 A G6 ee Data Points TR RRR eee eee eee eee ee K-means vs Hierarchical Clustering ® Comparison Table | Efficiency Sensitivity to Outliers Order Dependency Result Representation Result Representation represent the cluster centers. ° Efficient for large datasets. «Less sensitive to outliers. «Order of data points does not affect the final clusters. + Cluster assignment for each data point. miCunter assignment for each data point. j especially for larger datasets. i Order of data BS [ae g z K-means Hierarchical Clustering Initialization {+ Randomly initialize |* Treat each data point as K centroids. a single cluster. Number of e Fixed, specified @ Not fixed, can produce a Clusters in advance (K). dendrogram showing all possible cluster solutions. Execution Time. }« Generally faster. |e Computationally expensive especially for larger datasets. Interpretability *Centroids ma . Dendrogram provides a visual representation of cluster hierarchy : . Computationally expensive * too sensitive to outliers «More sensitive to outliers, especially with Single Linkage. ints can affect the final clusters. . Dendrogram showing ae hierarchy of clusters, can cut at different levels. CORP RPO RRP Ree eRe eee eee The Mathematical Logic: Let D be the dataset and p be a point in D. 1. The e-Neighborhood (N,(p)): The set of all points within distance e€ from point p. [ x ={qeD|dist(p,q) . x re) 2 4 6 8 10 7 10 Feature 1 eee @ PCA does exactly this for data. It rotates the data to find Principal Component Analysis (PCA) _ the "best angle" (axis) where the data is most spread out. e Spread (Variance) = Information. e |f data is squashed together, we can't tell points apart (low info.). e If data is spread out, we can distinguish points clearly (high info.). 3. Key Concepts & Definitions A. Principal Components (PCs) These are the new axes (features) created by PCA. © PC1 (First Principal Component): y The line through the data that captures the most variance a (spread). PC2 (Second Principal Component): The line perpendicular (orthogonal) to PC1 that captures the second most variance. B. Orthogonality The new axes (PCs) are always perpendicular to each other. This means they are uncorrelated—knowing PC1 tells you nothing about PC2. Tee eRe eee eee ee eee @ 4. The Math & Working: Step-by-Step Here is the mathematical recipe for PCA. Step 1: Standardization PCA is sensitive to scale. If one feature is measured in km and another in cm, the km feature will dominate. We must scale data so the mean is O and variance is 1. Step 2: Covariance Matrix Computation | We need to understand how the variables relate to one another. The covariance matrix (2) is a square matrix showing the covariance between every pair of elements. For a matrix X (standardized data): Cov(X) = x'x e Positive covariance: Variables increase together. e Zero covariance: Variables are independent.