Machine learning ensemble methods, Study notes of Numerical Methods in Engineering

Typology: Study notes

2019/2020

Uploaded on 02/24/2023

yasmin-jwabreh 🇵🇸

ENCS5341 Machine Learning and Data Science
Model and Hyper-parameters Selection
Yazan Abu Farha, Birzeit University
Some of the slides were taken from Joaquin Vanschoren.

Predictive Learning: Function Approximation

Assumed:
• an instance space X (with a fixed distribution D_X), a target space Y, and a target function f: X -> Y
• a set of allowed hypotheses H (aka the language of hypotheses L_H)

Given (input):
• examples E ⊆ X × Y such that
  • each x is drawn i.i.d. according to D_X
  • ∀ e = (x, y) ∈ E: f(x) = y

Find (output):
• a hypothesis h ∈ H such that the true error of h,
  error_D(h) := E_{x ~ D_X}[ error(f(x), h(x)) ],
  i.e., the expected (average) classification error on instances drawn according to D_X, is minimal.

N.B. Use suitable error definitions for discrete and continuous prediction!
N.B. A hypothesis is also often referred to as a model.

Performance Estimation Techniques

• Always evaluate models as if they were predicting future data.
• If the data was seen during training, we cannot use it for evaluation.
• We do not have access to future data, so we pretend that some data is hidden.
• Simplest way: the holdout set (simple train-test split), as in the first code sketch at the end of this section.
  • Randomly split the data into a training and a test set (e.g. 80% / 20%).
  • Train (fit) a model on the training data, score it on the test data.

n-fold Cross Validation, Leave-One-Out

[Figure: the dataset is randomly split into n folds S_1, ..., S_n, each holding 100/n % of the data; for i = 1..n, a model is trained on the union of the other folds (∪_{j≠i} S_j) and tested on S_i, and the n test errors are averaged into the error estimate.]

• Typical n is around 10.
• Extreme case "leave one out": n = size of the dataset, i.e., testing is done on single elements.
• n-fold cross validation (CV) is used for estimating the performance of a single algorithm (see the second code sketch below).

Hyperparameter Tuning

• There exists a huge range of techniques to tune hyperparameters. The simplest:
  • Grid search: choose a set of values for every hyperparameter, try every combination.
    • Does not scale to many hyperparameters.
  • Random search: choose random values for all hyperparameters, iterate n times.
    • Better, especially when some hyperparameters are less important than others.
• Many more advanced techniques exist, e.g. Bayesian optimization.

Hyperparameter Tuning (Cont.)

• First, split the data into training and test sets (outer split).
• Split the training data again (inner cross-validation).
• Generate hyperparameter configurations (e.g. by random/grid search).
• Evaluate all configurations on all inner splits, select the best one (on average).
• Retrain the best configuration on the full training set, evaluate on the held-out test data (see the third code sketch below).

Example: Out-of-Sample Testing for Decision Tree Pruning

[Figure: the dataset is randomly split into build data (n_1 % of the data) and held-out test data (100 - n_1 %); the build data is split again into a part on which the decision tree algorithm grows the full tree and an evaluation part that guides pruning, and the held-out test data yields the unbiased error estimate of the pruned tree.]

• Use the evaluation set to compare different pruned trees.
• Cannot use accuracy on the evaluation set as an unbiased true-error estimate!
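A minimal sketch of the holdout split from the performance-estimation slide, assuming scikit-learn is available; the synthetic dataset, the 80/20 ratio, and the k-nearest-neighbours model are illustrative choices, not prescribed by the notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Holdout: randomly split into 80% training / 20% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit on the training data only; score on the hidden test data.
model = KNeighborsClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```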
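A second sketch, for the n-fold CV figure: scikit-learn's cross_val_score hides the split-train-test-average loop. The same illustrative dataset and model are assumed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
model = KNeighborsClassifier()

# 10-fold CV: each fold serves as the test set exactly once,
# and the n fold scores are averaged into the error estimate.
scores = cross_val_score(model, X, y, cv=10)  # typical n around 10
print("10-fold CV accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))

# Extreme case "leave one out": n = size of the dataset.
loo = cross_val_score(model, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy: %.3f" % loo.mean())
```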
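A third sketch, covering the tuning recipe from the two hyperparameter slides (outer split, inner CV, random search, retrain, final test). The decision-tree model and its parameter ranges are hypothetical examples chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Outer split: hold out test data that the tuning procedure never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Random search over hypothetical parameter ranges; every sampled
# configuration is scored by 5-fold inner CV on the training data only.
params = {"max_depth": np.arange(2, 20),
          "min_samples_leaf": np.arange(1, 20)}
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            params, n_iter=20, cv=5, random_state=0)
search.fit(X_train, y_train)  # also refits the best config on all of X_train

print("best configuration:", search.best_params_)
print("held-out test accuracy:", search.score(X_test, y_test))
```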
Classification Metrics

accuracy = (TP + TN) / (TP + FN + TN + FP)
TPR (recall or sensitivity) = TP / (TP + FN)
TNR (specificity) = TN / (TN + FP)
FPR = FP / (TN + FP)
FNR = FN / (TP + FN)
precision = TP / (TP + FP)

Confusion matrix:

                      Predicted Negative      Predicted Positive
Actual Negative       True Negative (TN)      False Positive (FP)
Actual Positive       False Negative (FN)     True Positive (TP)

Classification Metrics Remarks

• Accuracy is not the best measure when the classes are highly skewed, i.e., when the number of samples of one class is significantly higher than the number of samples of the other class.
• Precision is used when the goal is to limit FPs, e.g. clinical trials (you only want to test drugs that really work) or a search engine (you want to avoid bad search results).
• Recall is used when the goal is to limit FNs, e.g. cancer diagnosis (you don't want to miss a serious disease) or a search engine (you don't want to miss important hits).

Precision vs. Recall

[Figure: Venn diagram of the relevant elements and the selected elements; items outside both sets are the true negatives.]

• Precision answers: how many selected items are relevant? precision = |relevant ∩ selected| / |selected|
• Recall answers: how many relevant items are selected? recall = |relevant ∩ selected| / |relevant|

A final sketch computing these metrics follows below.
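A sketch computing the metrics above directly from the four confusion-matrix counts; the ground-truth and prediction arrays are hypothetical values, used only to make the formulas concrete.

```python
import numpy as np

# Hypothetical ground truth and predictions (illustrative values only).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

accuracy    = (tp + tn) / (tp + fn + tn + fp)
recall      = tp / (tp + fn)   # TPR, sensitivity
specificity = tn / (tn + fp)   # TNR
fpr         = fp / (tn + fp)
fnr         = fn / (tp + fn)
precision   = tp / (tp + fp)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"specificity={specificity:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```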