General Optimization Linear Algorithms | Study notes Machine Learning

MACHINE LEARNING CHEATSHEET

Summary of Machine Learning Algorithms descriptions, advantages and use cases. Inspired by the very good book and articles of MachineLearningMastery, with added math, and ML Pros & Cons of HackingNote. Design inspired by The Probability Cheatsheet of W. Chen. Written by Rémi Canard.

General

Definition

We want to learn a target function f that maps input variables X to output variable Y, with an error e:

$Y = f(X) + e$

Linear, Nonlinear

Different algorithms make different assumptions about the shape and structure of f, hence the need to test several methods. Any algorithm can be either:

- Parametric (or Linear): simplifies the mapping to a known linear combination form and learns its coefficients.
- Non parametric (or Nonlinear): free to learn any functional form from the training data, while maintaining some ability to generalize.

Linear algorithms are usually simpler, faster and require less data, while Nonlinear ones can be more flexible, more powerful and more performant.
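
To make the Parametric / Non parametric distinction concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration: a one-dimensional input, a noise-free sine-shaped target, a least-squares line as the parametric method and a k-nearest-neighbours average as the non parametric one. The helper names `fit_line`, `knn_predict` and `mse` are hypothetical and do not come from the sheet.

```python
import math

def fit_line(xs, ys):
    """Parametric: assume y = b0 + b1 * x and learn the two coefficients."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    b0 = y_mean - b1 * x_mean
    return b0, b1

def knn_predict(xs, ys, x_new, k=2):
    """Non parametric: no fixed form, average the k closest training targets."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

def mse(y_true, y_pred):
    """Mean squared error between two equally long lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative data (assumption): a nonlinear target sampled on a grid.
x_train = [0.5 * i for i in range(13)]
y_train = [math.sin(x) for x in x_train]
x_test = [0.25 + 0.5 * i for i in range(12)]
y_test = [math.sin(x) for x in x_test]

b0, b1 = fit_line(x_train, y_train)
linear_mse = mse(y_test, [b0 + b1 * x for x in x_test])
knn_mse = mse(y_test, [knn_predict(x_train, y_train, x) for x in x_test])
print(f"linear MSE = {linear_mse:.4f}, k-NN MSE = {knn_mse:.4f}")
```

On this deliberately nonlinear target the flexible method has the lower test error. On data that really is linear, the two-coefficient line would need far fewer observations to do well.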

Supervised, Unsupervised

Supervised learning methods learn to predict Y from X given that the data is labeled.

Unsupervised learning methods learn to find the inherent structure of the unlabeled data.

Bias-Variance trade-off

In supervised learning, the prediction error e is composed of the bias, the variance and the irreducible part.

Bias refers to simplifying assumptions made to learn the target function easily.

Variance refers to sensitivity of the model to changes in the training data.

The goal of parameterization is to achieve a trade-off with low bias (underlying pattern not too simplified) and low variance (not sensitive to specificities of the training data).
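
For squared-error loss, the three parts can be written out explicitly. The notation below is an assumption added for illustration: $\hat{f}$ is the model fitted on a random training set, the expectations are taken over training sets and noise at a fixed input X, and e has zero mean and variance $\sigma^2$, independent of the training set.

```latex
\mathbb{E}\big[(Y - \hat{f}(X))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(X)] - f(X)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(X) - \mathbb{E}[\hat{f}(X)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible}}
```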

Underfitting, Overfitting

In statistics, fit refers to how well the target function is approximated.

Underfitting refers to poor inductive learning from training data and poor generalization.

Overfitting refers to learning the training data detail and noise, which leads to poor generalization. It can be limited by using resampling and defining a validation dataset.
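
A held-out validation dataset makes both failure modes visible. The sketch below is only an illustration under assumed choices: synthetic noisy sine data, a k-nearest-neighbours average as the model, and every second point held out. The helpers `knn_predict` and `mse` are hypothetical names.

```python
import math
import random

def knn_predict(xs, ys, x_new, k):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

def mse(xs, ys, x_train, y_train, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return sum(
        (y - knn_predict(x_train, y_train, x, k)) ** 2 for x, y in zip(xs, ys)
    ) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [6.0 * i / 200 for i in range(200)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]

# Hold out every second point as a validation dataset.
x_train, y_train = xs[0::2], ys[0::2]
x_val, y_val = xs[1::2], ys[1::2]

results = {}
for k in (1, 5, len(x_train)):
    train_err = mse(x_train, y_train, x_train, y_train, k)
    val_err = mse(x_val, y_val, x_train, y_train, k)
    results[k] = (train_err, val_err)
    print(f"k={k:3d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")
```

With k = 1 the model memorizes the training data, noise included, so its training error is exactly zero while its validation error is not: overfitting. With k equal to the training set size the model predicts the same mean everywhere and is poor on both sets: underfitting.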

Optimization

Almost every machine learning method has an optimization algorithm at its core.

Gradient Descent

Gradient Descent is used to find the coefficients of f that minimize a cost function (for example MSE, SSR).

Procedure:

→ Initialization: $\theta = 0$ (coefficients to 0 or random)
→ Calculate cost: $J(\theta) = \text{evaluate}(f(\text{coefficients}))$
→ Gradient of cost: $\frac{\partial}{\partial \theta_j} J(\theta)$ (we know the uphill direction)
→ Update coeff: $\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ (we go downhill)

The cost updating process is repeated until convergence (minimum found).
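
The procedure above can be sketched in Python for the simplest case. This is a minimal illustration, assuming a single input variable, the model y = theta0 + theta1 * x and MSE as the cost. The function name `gradient_descent`, the tolerance `tol` and the iteration cap `max_iter` are hypothetical choices, not part of the sheet.

```python
def gradient_descent(xs, ys, alpha=0.05, tol=1e-12, max_iter=10_000):
    """Batch gradient descent for y = theta0 + theta1 * x with an MSE cost."""
    theta0, theta1 = 0.0, 0.0  # initialization: coefficients to 0
    n = len(xs)
    prev_cost = float("inf")
    for _ in range(max_iter):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        cost = sum(e * e for e in errors) / n  # J(theta) = MSE
        if abs(prev_cost - cost) < tol:  # convergence: cost stopped changing
            break
        prev_cost = cost
        # Partial derivatives of J with respect to each coefficient (uphill).
        grad0 = 2.0 * sum(errors) / n
        grad1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step downhill, scaled by the learning rate alpha.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]  # illustrative data generated from y = 1 + 2x
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this noise-free data the coefficients approach 1 and 2. A learning rate that is too large for the data would make the same loop diverge instead.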

Batch Gradient Descent sums or averages the cost over all the observations before each update.

Stochastic Gradient Descent applies the parameter update procedure for each observation.
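
For contrast with the batch variant, a stochastic version might look like the sketch below. It assumes the same one-variable linear model with a squared-error cost, and inputs already scaled to [0, 1]. The name `sgd` and the parameters `epochs` and `seed` are hypothetical.

```python
import random

def sgd(xs, ys, alpha=0.1, epochs=500, seed=0):
    """Stochastic gradient descent for y = theta0 + theta1 * x (squared error)."""
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observations in a random order
        for x, y in data:
            error = theta0 + theta1 * x - y
            # Update immediately, using the gradient of this observation only.
            theta0 -= alpha * 2.0 * error
            theta1 -= alpha * 2.0 * error * x
    return theta0, theta1

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0 + 2.0 * x for x in xs]  # illustrative data generated from y = 1 + 2x
theta0, theta1 = sgd(xs, ys)
print(theta0, theta1)
```

The only structural difference from the batch variant is where the update sits: inside the loop over observations rather than after a sum over all of them.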

Tips:

- Change learning rate α ("size of jump" at each iteration)
- Plot Cost vs Time to assess learning rate performance
- Rescale the input variables
- Reduce passes through training set with SGD
- Average over 10 or more updates to observe the learning trend while using SGD
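
Two of these tips are easy to sketch in code. The helpers below are hypothetical: `rescale` uses min-max scaling, which is one common choice among several, and `moving_average` smooths a list of per-update costs over a window of 10.

```python
def rescale(values):
    """Min-max rescaling of one input variable to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # constant variable: nothing to spread out
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

def moving_average(costs, window=10):
    """Average each run of `window` consecutive costs to expose the trend."""
    return [
        sum(costs[i : i + window]) / window
        for i in range(len(costs) - window + 1)
    ]

scaled = rescale([10.0, 20.0, 40.0])
# Illustrative cost history: 20 per-update costs decreasing from 20 to 1.
trend = moving_average([float(c) for c in range(20, 0, -1)], window=10)
print(scaled, trend[0], trend[-1])
```

Plotting `trend` instead of the raw per-update costs gives the smoother Cost vs Time curve that the tips suggest for judging the learning rate.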

Ordinary Least Squares

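OLS is used to find the estimator $\hat{\beta}$ that minimizes the sum of squared residuals, for n observations and p input variables:

$\sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 = \lVert y - X\beta \rVert^2$

Using linear algebra, we have $\hat{\beta} = (X^T X)^{-1} X^T y$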
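
The closed-form solution can be sketched without any library by solving the normal equations $(X^T X)\beta = X^T y$ directly. This is a minimal illustration, not a numerically robust solver. The name `ols_fit` is hypothetical, and the function assumes it should add the intercept column of ones to X itself.

```python
def ols_fit(X, y):
    """Solve (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Illustrative data generated from y = 1 + 2 * x1 - 3 * x2.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0]]
y = [1.0 + 2.0 * a - 3.0 * b for a, b in X]
beta = ols_fit(X, y)
print(beta)
```

Unlike gradient descent, this needs no learning rate or iterations, but it requires $X^T X$ to be invertible.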

Maximum Likelihood Estimation

MLE is used to find the estimators that maximize the likelihood function:

$\mathcal{L}(\theta \mid x) = f_\theta(x)$ (density function of the data distribution)
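
As a worked instance, assume the data are independent draws from a Gaussian density, so that θ = (mu, sigma2). This is an assumption for illustration, since the sheet does not fix a distribution. The likelihood is then maximized by the sample mean and the 1/n variance, and because the logarithm is increasing, maximizing the log-likelihood gives the same estimators. The function names are hypothetical.

```python
import math

def gaussian_log_likelihood(data, mu, sigma2):
    """Log of L(theta | x) for i.i.d. Gaussian data, theta = (mu, sigma2)."""
    n = len(data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum(
        (x - mu) ** 2 for x in data
    ) / (2.0 * sigma2)

def gaussian_mle(data):
    """Closed-form maximizers: sample mean and (1/n) variance."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    return mu, sigma2

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # illustrative observations
mu_hat, sigma2_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, sigma2_hat)
print(mu_hat, sigma2_hat, best)
```

Evaluating `gaussian_log_likelihood` at any other (mu, sigma2) pair gives a value no larger than `best`.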

Linear Algorithms

All linear algorithms assume a linear relationship between the input variables X and the output variable Y.

Linear Regression

Representation:

A LR model representation is a linear equation:

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i$

$\beta_0$ is usually called intercept or bias coefficient. The dimension of the hyperplane of the regression is its complexity.
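
Read as code, the representation is a weighted sum plus the intercept. In this small sketch the name `predict` and the convention that `beta[0]` holds the intercept are assumptions for illustration.

```python
def predict(beta, x):
    """Evaluate y = beta[0] + beta[1] * x[0] + ... for one observation x."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one intercept plus one coefficient per input")
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Two input variables, so three coefficients in total.
y_hat = predict([1.0, 2.0, -3.0], [2.0, 2.0])
print(y_hat)
```

The length of `beta` grows with the number of input variables, which is the sense in which the dimension of the hyperplane measures the model's complexity.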


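$\frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i}}$

$\mu_k = \frac{1}{n_k} \sum_{i=1} x_i$ and $\sigma^2 = \frac{1}{n - K} \sum_{i=1} (x_i - \mu_k)^2$

$P(k) \times P(x \mid k)$

$\frac{\mu_k}{\sigma^2}$

$\frac{\mu_k^2}{2\sigma^2}$

$+ \lambda \lVert w \rVert^2$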