Introduction to Machine Learning: Algorithms, Techniques, and Applications, Lecture notes of Machine Learning

An introduction to machine learning, covering various algorithms and concepts. It explains the difference between machine learning and traditional programming, and discusses supervised and unsupervised learning techniques. The document also covers decision trees, bagging, boosting, k-NN, and DBSCAN algorithms. Additionally, it addresses issues like overfitting, underfitting, and data bias in machine learning models. The document concludes with applications of machine learning in finance, healthcare, and marketing, along with an overview of training, validation, and test datasets. It also covers bias and variance, and performance analysis using the confusion matrix, precision, recall, and F-score.

Typology: Lecture notes

2024/2025

Uploaded on 08/24/2025

roshan-manjal

Module 1
Introduction to ML

What is Machine Learning?

  • Computer algorithms that learn without being explicitly coded by a programmer.
  • The machine receives data as input and uses an algorithm to formulate answers.
  • Using statistical methods, algorithms are trained to make classifications or predictions and to uncover key insights in data.

How does ML work?

  • Humans learn from experience.
  • By analogy, when we face an unknown situation, the likelihood of success is lower than in a known situation. Machines are trained the same way.
  • The core concepts of ML are:
    • Learning: discovering patterns in data.
    • Feature vector: the list of attributes used to solve a problem is called a feature vector. You can think of a feature vector as a subset of data that is used to tackle a problem.
    • Model: the machine uses an algorithm to turn the discovered patterns into a model.
    • Inferring: the previously trained model is used to make inferences on new data.
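
The learning, model, and inferring concepts above can be sketched with a very small example. This is an illustrative sketch only: the data, the `learn` and `infer` helper names, and the choice of a nearest-mean rule are all assumptions made for the example, not something the notes prescribe.

```python
from statistics import mean

# Hypothetical training data: each feature vector is [height_cm, weight_kg].
features = [[150.0, 50.0], [155.0, 55.0], [180.0, 85.0], [185.0, 90.0]]
labels = ["small", "small", "large", "large"]


def learn(features, labels):
    """Learning: summarize each class by the mean of its feature vectors."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        # zip(*rows) iterates over columns, so this is a per-attribute mean.
        model[label] = [mean(column) for column in zip(*rows)]
    return model


def infer(model, feature_vector):
    """Inferring: return the class whose mean vector is closest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda label: squared_distance(model[label], feature_vector))


model = learn(features, labels)          # the model is the learned class means
prediction = infer(model, [152.0, 52.0])  # inference on a new feature vector
```

Here the model is nothing more than one mean vector per class; real algorithms learn richer structures, but the learn-then-infer flow is the same.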

Types of ML

  • Supervised learning: learning in the presence of a supervisor or teacher.
  • The data has labels (outcomes or data annotations).
  • Supervised learning is classified into two categories of algorithms:
    • Classification: a classification problem is when the output variable is a category.
    • Regression: a regression problem is when the output variable is a real value.
  • Key point: supervised learning learns from "labeled" data. This implies that some data is already tagged with the correct answer.
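
As a rough illustration of the regression case, the sketch below fits a straight line to labeled data by ordinary least squares and then predicts a real-valued output for a new input. The data (hours studied against exam score) and the `fit_line` name are hypothetical, chosen only for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: each input (hours) is tagged with the
# correct answer (score).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [50.0, 60.0, 70.0, 80.0]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept  # real-valued output for new data
```

A classification problem would use the same kind of labeled data, except that each label would be a category (for example "pass" or "fail") instead of a number.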

Decision Tree

  • A decision tree is essentially a series of if-else statements that predict a result from the features of a dataset. This flowchart-like structure helps in decision making.
  • Entropy: a measure of the impurity, disorder, or uncertainty in a set of examples. Entropy controls how a decision tree decides to split the data.
  • Example: red and green balls in a sample of 14 (4 red and 10 green).
  • The entropy of a group in which all examples belong to the same class is always 0.
  • The entropy of a two-class group with 50% in each class is always 1.
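
The entropy of the ball example can be worked out with a short sketch. It assumes the usual Shannon entropy with a base-2 logarithm, which is consistent with the 0 and 1 limits stated above; the `entropy` helper name is an assumption for this example.

```python
from math import log2


def entropy(class_counts):
    """Shannon entropy (base 2) of a group, given the count of each class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:  # an empty class contributes nothing
            p = count / total
            result -= p * log2(p)
    return result


balls = entropy([4, 10])  # 4 red and 10 green: roughly 0.863
pure = entropy([14, 0])   # all examples in one class: 0
even = entropy([7, 7])    # 50% in each class: 1
```

The mixed sample sits between the two extremes: it is less uncertain than an even split but not pure.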

Decision Tree

  • Information gain (IG) measures how much "information" a feature gives us about the class. It tells us how important a given attribute of the feature vectors is. Information gain is used to decide the ordering of attributes in the nodes of a decision tree.
  • Information Gain (ID3): tends to favor features that have more unique values.
  • Gain Ratio (C4.5): information gain normalized by split info; tends to prefer unbalanced splits (one partition much smaller than the other).
  • Gini Index (CART): measures impurity as 1 minus the sum of the squared class proportions; used with binary splits.
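
The three split criteria can be compared on one example. This is a minimal sketch using the standard textbook definitions; the split itself (14 balls divided into groups of 6 and 8 by some hypothetical feature) and all function names are assumptions made for illustration.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) from per-class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by split info (entropy of the partition sizes)."""
    split_info = entropy([sum(child) for child in children])
    return information_gain(parent, children) / split_info  # needs 2+ non-empty children


# Counts are [red, green]. Hypothetical split of 4 red / 10 green balls.
parent = [4, 10]
children = [[4, 2], [0, 8]]

ig = information_gain(parent, children)
ratio = gain_ratio(parent, children)
parent_gini = gini(parent)
```

A tree builder would evaluate a criterion like this for every candidate feature and split on the one with the best score.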

XGBoost – Extreme Gradient Boosting

  • Boosting: builds a model from individual weak learners in an iterative way.
  • Unlike a random forest, the learners are not built on random subsets of the data or features.
  • Instead, more weight is put on instances with wrong predictions, so the model learns from its mistakes.
  • Gradient boosting uses gradient descent (GD) to minimize the loss function.
  • XGBoost:
    • Developed at the University of Washington (2016).
    • Credited with winning Kaggle competitions.
    • Uses many tricks to optimize accuracy and speed.
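
The iterative idea behind gradient boosting can be sketched in a few lines. This is a simplified illustration, not XGBoost itself: it assumes a squared-error loss (for which the negative gradient is simply the residual), one-dimensional inputs, and one-split "stumps" as the weak learners. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Weak learner: one threshold on x, predicting the mean residual on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared error, the residuals are the negative gradient of the loss.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = gradient_boost(xs, ys)

baseline = sum(ys) / len(ys)
mse_baseline = sum((y - baseline) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each round corrects part of what the previous rounds got wrong, so the training error of the combined model falls below that of the constant baseline.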

Decision Tree to XGBoost

  • Decision tree: a flowchart of decisions based on certain conditions.
  • Bagging: combines predictions from multiple decision trees via majority voting (democracy).
  • Random forest: similar to bagging, but only a random subset of the features is used to build each tree in the collection (addresses overfitting).
  • Boosting: builds models sequentially by minimizing the errors of previous models and boosting the influence of high-performing models.
  • Gradient boosting: uses gradient descent to minimize errors.
  • XGBoost: optimized gradient boosting (parallelization, regularization, pruning, etc.).
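
Bagging with majority voting can be sketched as follows. To stay short, the example replaces full decision trees with one-split stumps on a single numeric feature; the data, the labels "low" and "high", and every function name are assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(sample):
    """One-split stand-in for a decision tree: threshold halfway between class means."""
    lows = [x for x, label in sample if label == "low"]
    highs = [x for x, label in sample if label == "high"]
    if not lows:  # the bootstrap sample may miss a class entirely
        return lambda x: "high"
    if not highs:
        return lambda x: "low"
    threshold = (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2
    return lambda x: "low" if x < threshold else "high"


def majority_vote(predictions):
    """Return the most common prediction."""
    return Counter(predictions).most_common(1)[0][0]


def bagging_predict(rows, x, n_models=11, seed=0):
    """Train each model on a bootstrap sample, then combine them by voting."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        votes.append(train_stump(sample)(x))
    return majority_vote(votes)


rows = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (4.0, "low"),
        (7.0, "high"), (8.0, "high"), (9.0, "high"), (10.0, "high")]

low_prediction = bagging_predict(rows, 2.0)
high_prediction = bagging_predict(rows, 9.0)
```

A random forest would add one more source of randomness: each tree would also see only a random subset of the features.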

Unsupervised Learning

  • In unsupervised learning, an algorithm explores input data without being given an explicit output variable (e.g., it explores customer demographic data to identify patterns).
  • You want the algorithm to find patterns and group the data.
  • Algorithms:
    • K-means
    • GMM (Gaussian mixture model)

K-Means

  • Choose K, the number of clusters to be created.
  • Randomly initialize K centroid points.
  • Assign each data point to its nearest centroid to create K clusters.
  • Re-calculate the centroids using the newly created clusters.
  • Repeat the assignment and re-calculation steps (steps 3 and 4) until the centroids stop changing.
  • WCSS (within-cluster sum of squares): the sum of the squared distances between the points in a cluster and the cluster centroid.
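
The steps above can be sketched directly in code. This is a minimal illustration under a few assumptions: points are tuples of numbers, the initial centroids are drawn from the data points with a seeded random generator, and an empty cluster simply keeps its old centroid. The function names and data are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly pick K starting centroids
    for _ in range(max_iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Re-calculate each centroid as the mean of its cluster.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids are fixed: stop
            break
        centroids = new_centroids
    return centroids, clusters


def wcss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = k_means(points, k=2)
total_wcss = wcss(centroids, clusters)
```

Running the same code for several values of K and comparing the resulting WCSS is one common way to choose K.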

DBSCAN

  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a base algorithm for density-based clustering. It can discover clusters of different shapes and sizes in a large amount of data that contains noise and outliers.
  • The DBSCAN algorithm uses two parameters:
    • minPts: the minimum number of points (a threshold) that must be clustered together for a region to be considered dense.
    • eps (ε): the distance used to locate the points in the neighborhood of any point.

DBSCAN

  • There are three types of points after the DBSCAN clustering is complete (here m is minPts and n is eps):
    • Core: a point that has at least m points within distance n of itself.
    • Border: a point that is not a core point but has at least one core point within distance n.
    • Noise: a point that is neither a core nor a border point; it has fewer than m points within distance n of itself.
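
The three point types can be illustrated with a short sketch. It only labels points and does not perform the full clustering step of linking core points into clusters. Two assumptions are made: a point counts as its own neighbor when comparing against minPts, and all points are distinct tuples. The function name and data are hypothetical.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # Neighborhood of p: every point within eps of p (p itself included).
    neighbors = {p: [q for q in points if dist(p, q) <= eps] for p in points}
    core = {p for p, near in neighbors.items() if len(near) >= min_pts}

    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbors[p]):
            labels[p] = "border"  # not dense itself, but next to a core point
        else:
            labels[p] = "noise"
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]
labels = classify_points(points, eps=1.0, min_pts=3)
```

In this data the four corner points of the unit square are core points, (2, 1) is a border point because it touches the core point (1, 1), and (8, 8) is noise.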

Supervised vs Unsupervised

Supervised learning | Unsupervised learning
------------------- | ---------------------
Input data is provided to the model along with the output. | Only input data is provided.
The output is predicted. | Hidden patterns in the data are found.
Accurate results are produced. | The results produced are less accurate.
The objective is to train the model to predict the output when new data is provided. | The objective is to find useful insights and hidden patterns in an unknown dataset.
Computational complexity is high. | Computational complexity is lower.
Applications include spam detection, handwriting detection, pattern recognition, speech recognition, etc. | Applications include detecting fraudulent transactions, data preprocessing, etc.

Steps in an ML Application

  • Collect the data
  • Prepare the input
  • Analyse the input
  • Train the algorithm
  • Validate the algorithm
  • Test
  • Use it
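
The train, validate, and test steps each need their own portion of the collected data. The sketch below shows one possible way to split a dataset; the 70/15/15 proportions, the seeded shuffle, and the `split_dataset` name are assumptions for illustration, not values given in the notes.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, validation, and test sets."""
    shuffled = list(rows)  # copy, so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test


rows = list(range(100))  # stand-in for 100 collected examples
train, validation, test = split_dataset(rows)
```

The algorithm is fitted on the training set, tuned against the validation set, and evaluated once on the test set before it is put to use.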