ENCS5341 Machine Learning and Data Science
Model and Hyper-parameters Selection
Yazan Abu Farha - Birzeit University
Some of the slides were taken from Joaquin Vanschoren

Predictive Learning: Function approximation
Assumed:
• an instance space X (with a fixed distribution D_X), a target space Y, and a target function f: X -> Y
• a set of allowed hypotheses H (aka language of hypotheses L_H)
Given (Input):
• examples E ⊆ X × Y such that
  • x is drawn i.i.d. according to D_X
  • ∀ e = (x, y) ∈ E: f(x) = y
Find (Output):
• a hypothesis h ∈ H such that the true error of h,
      error_true(h) := E_{D_X}[ error( f(x), h(x) ) ],
  i.e., the expected (average) classification error on instances drawn according to D_X, is minimal.
N.B. Use suitable error defs for discrete and continuous prediction!
N.B. A hypothesis is also often referred to as a model.
Performance Estimation Techniques
• Always evaluate models as if they were predicting future data.
• If the data was seen during training, we cannot use it for evaluation.
• We do not have access to future data, so we pretend that some data is hidden.
• Simplest way: the holdout set (simple train-test split), sketched below.
  • Randomly split the data into a training and a test set (e.g. 80% - 20%).
  • Train (fit) a model on the training data, score it on the test data.
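As a minimal illustration of the holdout approach, the sketch below uses scikit-learn; the dataset, the classifier, and the 80%-20% ratio are illustrative assumptions, not part of the slides.

# Minimal holdout (train-test split) sketch.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Randomly hold out 20% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)                                # fit on the training portion only
print("held-out accuracy:", model.score(X_test, y_test))   # score on unseen data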
n-fold Cross Validation, leave-one-out
[Figure: n-fold Cross Validation (CV) for estimating a single algorithm. The dataset is randomly split into n subsets S_1, ..., S_n, each holding 100/n % of the data. For i = 1..n, a model is trained on the union of the remaining subsets, ∪_{j≠i} S_j, and tested on S_i. The n test errors are averaged to give the error estimate.]
• Typical n is around 10.
• Extreme case "leave one out": n = size of the dataset, i.e., testing is done on single elements.
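A minimal sketch of n-fold cross-validation with scikit-learn, assuming n = 10 folds; the dataset and classifier are again illustrative assumptions. The leave-one-out variant is shown as well.

# n-fold cross-validation: each sample is used for testing exactly once.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# 10-fold CV: train on 9/10 of the data, test on the remaining 1/10, repeat 10 times, average.
scores = cross_val_score(clf, X, y, cv=10)
print("10-fold CV error estimate:", 1 - scores.mean())

# Extreme case "leave one out": as many folds as there are samples.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("leave-one-out error estimate:", 1 - loo_scores.mean())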
Hyperparameter Tuning
• There is a huge range of techniques to tune hyperparameters. The simplest:
  • Grid search: choose a range of values for every hyperparameter, try every combination.
    • Does not scale to many hyperparameters.
  • Random search: choose random values for all hyperparameters, iterate n times.
    • Better, especially when some hyperparameters are less important than others.
• Many more advanced techniques exist, e.g. Bayesian optimization.

Hyperparameter Tuning (Cont.)
• First, split the data into training and test sets (outer split).
• Split the training data again (inner cross-validation).
• Generate hyperparameter configurations (e.g. via random/grid search).
• Evaluate all configurations on all inner splits, select the best one (on average).
• Retrain the best configuration on the full training set, evaluate on the held-out test data (see the sketch below).
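A hedged sketch of the outer/inner procedure above, using scikit-learn's GridSearchCV for the inner cross-validation; the estimator (an SVM), the parameter grid, and the split sizes are assumptions chosen only for illustration.

# Outer split + inner cross-validated grid search, then a final test on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Outer split: keep a test set that is never touched during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Candidate hyperparameter configurations (grid search tries every combination).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2]}

# Inner 5-fold cross-validation on the training data selects the best configuration;
# GridSearchCV then refits that configuration on the full training set (refit=True by default).
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best configuration:", search.best_params_)
print("held-out test accuracy:", search.score(X_test, y_test))

RandomizedSearchCV can be swapped in for GridSearchCV when the grid grows too large for an exhaustive search.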
Example: Out-of-Sample Testing for Decision Tree Pruning
[Figure: the dataset is randomly split into a training part and a held-out test part. The training part is split again at random: one portion is used by the decision tree algorithm to build the full decision tree, and the other portion serves as an evaluation set to prune it, yielding the pruned decision tree. The held-out test part is then used to test the pruned tree, giving an unbiased error estimate.]
• Use evaluation set to compare different pruned trees
• Cannot use accuracy on evaluation set as unbiased true error estimate!!
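The slides do not prescribe a specific pruning algorithm; the sketch below mimics the scheme in the figure using scikit-learn's cost-complexity pruning, with an evaluation set to compare candidate pruned trees and an untouched test set for the unbiased estimate. Dataset and split ratios are illustrative assumptions.

# Out-of-sample testing for decision tree pruning:
# build set -> grow candidate trees, evaluation set -> choose the pruning level,
# test set -> unbiased error estimate of the chosen pruned tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# First random split: training data vs. held-out test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Second random split: build set vs. evaluation (pruning) set.
X_build, X_eval, y_build, y_eval = train_test_split(
    X_train, y_train, test_size=0.3, random_state=0)

# Candidate pruning strengths from cost-complexity pruning on the build set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_build, y_build)

# Compare the differently pruned trees on the evaluation set only.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_build, y_build)
    score = tree.score(X_eval, y_eval)
    if score > best_score:
        best_alpha, best_score = alpha, score

# The test set was not used for pruning, so its accuracy is an unbiased estimate.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_build, y_build)
print("test accuracy of pruned tree:", pruned.score(X_test, y_test))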
Classification Metrics

accuracy = (TP + TN) / (TP + FN + TN + FP)
TPR (recall or sensitivity) = TP / (TP + FN)
TNR (specificity) = TN / (TN + FP)
FPR = FP / (TN + FP)
FNR = FN / (TP + FN)
precision = TP / (TP + FP)

Confusion matrix (rows = Actual, columns = Predicted):
                    Predicted Negative      Predicted Positive
  Actual Negative   True Negative (TN)      False Positive (FP)
  Actual Positive   False Negative (FN)     True Positive (TP)
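A brief sketch of how these quantities can be computed with scikit-learn; the label vectors below are made up for illustration.

# Computing the confusion matrix and the derived metrics for a toy example.
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # actual labels (0 = negative, 1 = positive)
y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]   # predicted labels

# scikit-learn's confusion matrix has rows = actual, columns = predicted, as in the table above.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP =", tn, fp, fn, tp)

print("accuracy  =", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+FN+TN+FP)
print("recall    =", recall_score(y_true, y_pred))      # TPR = TP/(TP+FN)
print("precision =", precision_score(y_true, y_pred))   # TP/(TP+FP)
print("specificity (TNR) =", tn / (tn + fp))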
Classification Metrics Remarks
• Accuracy is not the best measure when the classes are highly skewed, i.e., the number of samples of one class is significantly higher than the number of samples of the other class (a toy illustration follows below).
• Precision is used when the goal is to limit FPs. E.g. clinical trials (you only want to test drugs that really work), a search engine (you want to avoid bad search results).
• Recall is used when the goal is to limit FNs. E.g. cancer diagnosis (you don’t want to miss a serious disease), a search engine (you don’t want to miss important hits).
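To make the skewed-classes remark concrete, here is a small made-up illustration: a trivial classifier that always predicts the majority class scores high accuracy but zero recall.

# Why accuracy misleads on skewed classes: an "always negative" classifier.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up data: 95 negatives and only 5 positives (e.g. 5 actual disease cases).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                        # predict "negative" for everyone

print("accuracy  =", accuracy_score(y_true, y_pred))                      # 0.95, looks great
print("recall    =", recall_score(y_true, y_pred, zero_division=0))       # 0.0, misses every positive
print("precision =", precision_score(y_true, y_pred, zero_division=0))    # 0.0, no positive predictions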
Precision vs. Recall
[Figure: relevant elements vs. selected elements; everything outside both sets are the true negatives.
 Precision = how many selected items are relevant?
 Recall = how many relevant items are selected?]