



MACHINE LEARNING CHEATSHEET
Typology: Study notes




Summary of machine learning algorithms: descriptions, advantages, and use cases. Inspired by the book and articles of MachineLearningMastery, with added math, and by the ML Pros & Cons of HackingNote. Design inspired by The Probability Cheatsheet of W. Chen. Written by Rémi Canard.
Definition
We want to learn a target function f that maps input variables X to an output variable Y, with an error e:
$Y = f(X) + e$
Linear, Nonlinear
Different algorithms make different assumptions about the shape and structure of f, hence the need to test several methods. Any algorithm can be either:
Almost every machine learning method has an optimization algorithm at its core.
Gradient Descent
Gradient Descent is used to find the coefficients of f that minimize a cost function (for example MSE, SSR). Procedure:
→ Initialization: $\theta = 0$ (coefficients set to 0 or to random values)
→ Calculate cost: $J(\theta) = \text{evaluate}(f(\text{coefficients}))$
→ Gradient of cost: $\frac{\partial}{\partial \theta_j} J(\theta)$, which tells us the uphill direction
→ Update coefficients: $\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$, so that we go downhill
The coefficient update is repeated until convergence (minimum found). Batch Gradient Descent sums or averages the cost over all the observations. Stochastic Gradient Descent applies the parameter update for each observation.
Tips:
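The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the model is a one-variable line, the cost is MSE, and the loop runs for a fixed number of iterations rather than checking convergence. The function name, learning rate, and toy data are made up for the example.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, n_iters=2000):
    """Fit y = theta0 + theta1 * x by minimizing the mean squared error."""
    theta0, theta1 = 0.0, 0.0  # initialization: coefficients set to 0
    n = len(xs)
    for _ in range(n_iters):
        # Prediction errors for the current coefficients.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J(theta) = (1/n) * sum(error^2).
        grad0 = 2.0 / n * sum(errors)
        grad1 = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        # Step against the gradient (downhill), scaled by the learning rate.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
```

Because the gradient is averaged over all observations at every step, this is the batch variant; a stochastic variant would update the coefficients after each single observation instead.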
All linear algorithms assume a linear relationship between the input variables X and the output variable Y.
Linear Regression
Representation: An LR model is represented by a linear equation: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i$
$\beta_0$ is usually called the intercept or bias coefficient. The dimension of the hyperplane of the regression is its complexity.
Learning: Learning an LR model means estimating the coefficients from the training data. Common methods include Gradient Descent and Ordinary Least Squares.
Variations: There are extensions of LR training, called regularization methods, that aim to reduce the complexity of the model:
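As a small illustration of the Ordinary Least Squares option mentioned above, the sketch below estimates the two coefficients of a one-variable linear regression in closed form. It assumes a single input variable; the function name and the toy data are hypothetical.

```python
def ols_simple(xs, ys):
    """Closed-form least-squares estimates for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Toy data lying exactly on y = 1 + 2x (an assumption for the example).
beta0, beta1 = ols_simple([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```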
Which actually models the probability of the default class: $p(X) =$
Learning: The logistic regression coefficients are learned using maximum-likelihood estimation, so that the model predicts values close to 1 for the default class and close to 0 for the other class.
Data preparation:
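The maximum-likelihood learning described above can be sketched as gradient ascent on the log-likelihood. This is an illustration, not the cheatsheet's own procedure: it assumes a single input variable and the standard logistic (sigmoid) link, and the function names, learning rate, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, alpha=0.1, n_iters=5000):
    """Estimate b0, b1 by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        # Residuals between the labels and the predicted probabilities.
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(residuals) / n
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n
        # Move uphill, since the likelihood is being maximized.
        b0 += alpha * g0
        b1 += alpha * g1
    return b0, b1


# Toy, non-separable data: label 1 is treated as the default class.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0.0)   # predicted probability at x = 0
p_high = sigmoid(b0 + b1 * 5.0)  # predicted probability at x = 5
```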
LDA assumes Gaussian data and attributes with the same variance $\sigma^2$. Predictions are made using Bayes' Theorem:
$P(Y = k \mid X = x) = \frac{P(k) \times P(x \mid k)}{\sum_{l=1}^{K} P(l) \times P(x \mid l)}$
to obtain a discriminant function (latent variable) for each class k, estimating $P(x \mid k)$ with a Gaussian distribution: $D_k(x) = x \times$
For regression the output can be the mean, while for classification the output can be the most common class. Various distances can be used, for example:
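A minimal sketch of this prediction step, using Euclidean distance as one possible choice. The function names, the value k = 3, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_indices(train_X, query, k):
    """Indices of the k training points closest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    return order[:k]


def knn_regress(train_X, train_y, query, k=3):
    """Regression output: mean target of the k nearest neighbours."""
    idx = nearest_indices(train_X, query, k)
    return sum(train_y[i] for i in idx) / k


def knn_classify(train_X, train_y, query, k=3):
    """Classification output: most common class among the k nearest."""
    idx = nearest_indices(train_X, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]


# Two well-separated toy clusters (hypothetical data).
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
predicted_class = knn_classify(train_X, labels, (1.5, 1.5))
predicted_value = knn_regress(train_X, values, (1.5, 1.5))
```

Swapping `euclidean` for another distance function changes which neighbours are selected without touching the rest of the code.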
Learning: The hyperplane is learned by transforming the problem using linear algebra, and minimizing:
$\frac{1}{n} \sum_{i=1}^{n} \max\left(0,\ 1 - y_i (w \cdot x_i - b)\right)$
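To make the quantity being minimized concrete, the sketch below evaluates this average hinge loss for a given hyperplane. It only computes the loss and does not perform the minimization; labels are assumed to be +1 or -1, and the function name and toy data are hypothetical.

```python
def hinge_loss(X, y, w, b):
    """Average of max(0, 1 - y_i * (w . x_i - b)) over all observations."""
    total = 0.0
    for xi, yi in zip(X, y):
        # Signed score of the observation relative to the hyperplane.
        score = sum(wj * xj for wj, xj in zip(w, xi)) - b
        # Zero loss when the point is on the correct side with margin >= 1.
        total += max(0.0, 1.0 - yi * score)
    return total / len(X)


# Three toy points with labels in {+1, -1} (hypothetical data).
X = [(2.0, 0.0), (0.0, 0.0), (-2.0, 0.0)]
y = [1, 1, -1]
loss = hinge_loss(X, y, w=(1.0, 0.0), b=0.0)
```

Here the first and third points lie beyond the margin and contribute nothing, while the second point sits on the hyperplane and contributes a loss of 1, so the average is 1/3.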
Variations: SVM is implemented using various kernels, which define the similarity measure between new data and the support vectors:
Advantages: In addition to the advantages of the CART algorithm
This stage value is used to update the instance weights: $w = w \times e^{\text{stage} \times \text{error}}$, where error is 1 for an incorrectly predicted instance and 0 otherwise. Incorrectly predicted instances are therefore given more weight. Weak models are added sequentially using the training weights, until no improvement can be made or the number of rounds has been reached.
Data preparation:
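The weight update can be sketched as follows. This is an illustration under an explicit assumption: the exponent is read as the stage value multiplied by a 0/1 misclassification indicator, which matches the statement that incorrectly predicted instances receive more weight. The function name, the stage value, and the toy data are hypothetical.

```python
import math


def update_weights(weights, misclassified, stage):
    """Multiply each weight by exp(stage * error), with error in {0, 1}."""
    new_weights = []
    for w, missed in zip(weights, misclassified):
        error = 1.0 if missed else 0.0  # assumed 0/1 error indicator
        new_weights.append(w * math.exp(stage * error))
    return new_weights


# Four instances with equal weights; only the second was misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, True, False, False]
stage = math.log(3.0)  # hypothetical stage value for the example
new_weights = update_weights(weights, misclassified, stage)
```

Correctly predicted instances keep their weight, since exp(0) = 1, while the misclassified instance is scaled up by exp(stage).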
https://machinelearningmastery.com/
Scikit-learn website, for Python implementation: http://scikit-learn.org/
W. Chen probability cheatsheet: https://github.com/wzchen/probability_cheatsheet
HackingNote, for interesting, condensed insights: https://www.hackingnote.com/
Seattle Data Guy blog, for business-oriented articles: https://www.theseattledataguy.com/
Explained Visually, making hard ideas intuitive: http://setosa.io/ev/
This Machine Learning Cheatsheet: https://github.com/remicnrd/ml_cheatsheet