Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Core Concepts of Unsupervised ML, Study notes of Technology

Technology

TOPIC I COVERED :- • Foundations of Unsupervised Machine Learning • Characteristics of Unlabeled Data and Learning Without Targets • Core Tasks in Unsupervised ML (Clustering & Anomaly Detection) • K-Means Clustering: Concept, Objective Function, and Workflow • Elbow Method for Optimal Cluster Selection (WCSS Analysis) • K-Means Implementation and Visualization • Hierarchical Clustering: Theory, Types, and Dendrogram Analysis • Agglomerative Clustering Algorithm: Step-by-Step Process • Linkage Criteria (Single, Complete, Average, Ward’s) • Comparative Analysis: K-Means vs. Hierarchical Clustering • DBSCAN: Definition, Density-Based Logic, and Core Concepts • DBSCAN Point Categories (Core, Border, Noise) • DBSCAN Step-Wise Working and Reachability Concepts • DBSCAN Implementation with Example Plot • Dimensionality Reduction: Purpose and Approaches • Principal Component Analysis (PCA): Intuition and Key Concepts • PCA Mathematical Workflow (Standardization, Covariance Matrix)

Typology: Study notes

2025/2026

Available from 03/28/2026

varun-saxena-1 🇮🇳

6 documents

1 / 29

This page cannot be seen from the preview

Don't miss anything!

Partial preview of the text

Download Core Concepts of Unsupervised ML and more Study notes Technology in PDF only on Docsity!

CU GUE CEU EEE EEE TUE ETEEEEETE Unsupervised = Machine Learning = 29 ca S 26 ©. ey . o,to0 (2) » 900 o \ gz a", {o} ? o2e er o°% 9 ° 9% We cic es eteniebl vised | (Machine Learnina. | | learning, the model learns patterns and | ea p Structures from data without labeled outcomes or ‘Supervision. Input (%),x2,%3) |- i —| No labeled output | * Dataset contains only input features (X)- No labeled output (Y) 4 Main tasks of Unsupervised ML:— (+ Clustering:- Grouping similar deta points ine clusters | (e.9., IK-Means): | Clusters = o) (0,0 } = treo Gs.) * K-Means, Hierarchical, DBSCAN... | # Uncovers hidden patterns in data * Identifies subgroups within data A Tsolation Forest, DBSCAN' # Main tasks of Unsupervised ML: «Anomaly Detection: g ° Reducing the number of features . Identifying unusual data | while preserving important information points or anomalies iF A> | => oe ie = 20 | oo ee. | 1 a Principal Component Analysis (PCA) ‘* Detects outliers that | * Visualizes & Simplifies complex data sets devicte from the nomal | patterns | a K-Means Clustering * An thexative, Unsupe algorithm, © Goal: Partitions a dataset into kK pre- destined, non-overlapping Subgroups (cLustexs]. ed machine learning NY) * Mechanism? Each data point is asatgned to the Cluste with the neoest mean (Centrofd), aiming to minimize intra- cluster variance. +t Objective Function (optimization goal) :- * The goal is to minimize the “Within-Cluster Sum of E Squares (WCSS’), also known as Inertia. 2 We want points within a Cluster to be as ; Compact (Close to the centroid) as possible. | HF 2 | k 3 oo = Silas ° | Jal Kes Wwhexe * Je Objective function (Total cost) k = Number of clustexs My = The centroid (mean) of clustet | HH Io Aag I= Squared Euclidean Distance Lai tial Oo ~ Randomly select kK data points as ane initial. Centroids- 3 2: Assignment (Expectation): Calculate the Euclidean | | distance behween each data point and all centroids. | Assign the point to the closest centroid. irr ization) - Recalevlate the centroids éy Finding the mean of all points currently 3 assigned to that cluster. Mae 1Cyl KeCT ence, Repeat steps 2 & 3 ontil: e Centroids no longer move: * Points stop Changing clustests. e Mox iterations reached. = + Choosing. *k? CHyperpar wrameter) *- * The problem: k-means cannot determine the beforehand. numbex of clusters on its own, you must define ‘k’ HIERARCHICAL CLUSTERING e What I+ Is: An Unsupervised learning algorithm +hat groups data into a hierarchy (or tree) of clusters. e Approach: Unlike K-means, it does not requive pre- defining the numbex of Clusters Ck). ; sttom-Up):- Starts with n + clusters ee for es cotnt) and merges them iteratively. ).- Stevtts with one giant clustesc and recursively splits Tt. + Concept: A tree-like diagram that records the sequences of mergers or Splits. ° Usage: The y-axis usually represents the Euclidean distance between clusters. 2: You can draw a horizontal line TING across RE dendrogram at a Specific distance to determine the number of clusters. CECE EEEEEEEEE Sate ceientinetl Season eee ee cl Algorithm steps (Agalomerative ):- @) * Step 1: “dridfalization’ treat each data point as a Single cluster. (9f you have N points, you have N clusters). ¢ Step 2: Calculate Distance compute the distance (proximity matrix) between every pair of clusters. Step 3: Merge find the two closest clusters and merge them tntoa Single cluster. Step at Update Recalculate the distances behween the new cluster and alll remain clusters. Step 5: Repeat Steps 3 and 4 untill only a Single cluster (containing ald data points) remain. S: PuosS & Cons: + Pxod* * no need to specify K fn advance. . Dendstoguam provides a great visual Summosy of data relationship. e @ns* e Computationally Expensive — © (N3) time complexity , very slow ° Sample data x= £04, 23; 12, [4,4], [s,4], [3,74 © Perform hterarchical cl ustering = Z= linkage (X, method ="single”) e Plot the dendrogram plt. Figure (figstze = (6,4)) dendrogram (2) pit. title (“Hierarchical Clustering Dendrograr ') pit-xlabel (Data Points) plt. Ylabel CDistance”) plt.shou) ) Hierarchical Clustering Dendrogram 6 7 | 4 | 8 524 | — i eee | 1 2 3 A G6 ee Data Points TR RRR eee eee eee eee ee K-means vs Hierarchical Clustering ® Comparison Table | Efficiency Sensitivity to Outliers Order Dependency Result Representation Result Representation represent the cluster centers. ° Efficient for large datasets. «Less sensitive to outliers. «Order of data points does not affect the final clusters. + Cluster assignment for each data point. miCunter assignment for each data point. j especially for larger datasets. i Order of data BS [ae g z K-means Hierarchical Clustering Initialization {+ Randomly initialize |* Treat each data point as K centroids. a single cluster. Number of e Fixed, specified @ Not fixed, can produce a Clusters in advance (K). dendrogram showing all possible cluster solutions. Execution Time. }« Generally faster. |e Computationally expensive especially for larger datasets. Interpretability *Centroids ma . Dendrogram provides a visual representation of cluster hierarchy : . Computationally expensive * too sensitive to outliers «More sensitive to outliers, especially with Single Linkage. ints can affect the final clusters. . Dendrogram showing ae hierarchy of clusters, can cut at different levels. CORP RPO RRP Ree eRe eee eee The Mathematical Logic: Let D be the dataset and p be a point in D. 1. The e-Neighborhood (N,(p)): The set of all points within distance e€ from point p. [ x ={qeD|dist(p,q) . x re) 2 4 6 8 10 7 10 Feature 1 eee @ PCA does exactly this for data. It rotates the data to find Principal Component Analysis (PCA) _ the "best angle" (axis) where the data is most spread out. e Spread (Variance) = Information. e |f data is squashed together, we can't tell points apart (low info.). e If data is spread out, we can distinguish points clearly (high info.). 3. Key Concepts & Definitions A. Principal Components (PCs) These are the new axes (features) created by PCA. © PC1 (First Principal Component): y The line through the data that captures the most variance a (spread). PC2 (Second Principal Component): The line perpendicular (orthogonal) to PC1 that captures the second most variance. B. Orthogonality The new axes (PCs) are always perpendicular to each other. This means they are uncorrelated—knowing PC1 tells you nothing about PC2. Tee eRe eee eee ee eee @ 4. The Math & Working: Step-by-Step Here is the mathematical recipe for PCA. Step 1: Standardization PCA is sensitive to scale. If one feature is measured in km and another in cm, the km feature will dominate. We must scale data so the mean is O and variance is 1. Step 2: Covariance Matrix Computation | We need to understand how the variables relate to one another. The covariance matrix (2) is a square matrix showing the covariance between every pair of elements. For a matrix X (standardized data): Cov(X) = x'x e Positive covariance: Variables increase together. e Zero covariance: Variables are independent.

Core Concepts of Unsupervised ML, Study notes of Technology

Related documents

Partial preview of the text

Download Core Concepts of Unsupervised ML and more Study notes Technology in PDF only on Docsity!