Model and Hyper-parameters Selection in Machine Learning and Data Science, Study notes of Numerical Methods in Engineering



2019/2020

Uploaded on 02/24/2023



ENCS5341
Machine Learning and Data Science
Model and Hyper-parameters Selection
Yazan Abu Farha - Birzeit University
Some of the slides were taken from Joaquin Vanschoren



Predictive Learning: Function approximation

Designing Machine Learning Systems

  • Just running your favorite algorithm is not usually a great way to start.
  • Consider the problem: how do you measure success? Are there costs involved?
  • Analyze the model’s mistakes. Don’t just fine-tune endlessly.
    • Build early prototypes. Should you collect more data (more examples), or additional data (new features)?
    • Should the task be reformulated?
  • Overly complex machine learning systems are hard to maintain.

Performance Estimation Techniques

  • Always evaluate models as if they are predicting future data.
    • If the data is seen during training, we cannot use it for evaluation.
  • We do not have access to future data, so we pretend that some data is hidden.
  • Simplest way: the holdout set (simple train-test split)
    • Randomly split data into training and test set (e.g. 80% - 20%)
    • Train (fit) a model on the training data, score on the test data.
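A minimal holdout split can be sketched in plain Python as below. The helper name `holdout_split` is hypothetical; in practice scikit-learn's `train_test_split(X, y, test_size=0.2)` does the same job. The fixed seed only makes the random shuffle reproducible.

```python
import random

def holdout_split(X, y, test_fraction=0.2, seed=0):
    """Randomly split (X, y) into a training set and a held-out test set."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)          # random order, reproducible via seed
    n_test = int(round(len(X) * test_fraction))   # e.g. 20% of the examples
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# 100 toy examples: 80 are used to fit the model, 20 are hidden for scoring.
X = [[float(i)] for i in range(100)]
y = [2.0 * i + 1.0 for i in range(100)]
X_train, X_test, y_train, y_test = holdout_split(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))  # 80 20
```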

n-fold Cross Validation (CV) for estimating a single algorithm
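The n-fold procedure can be sketched as follows: shuffle once, deal the examples into n folds, and let each fold serve once as test data while the other n-1 folds are used for training. All names here are hypothetical, and the "model" is just a baseline that predicts the mean training target, chosen only to keep the sketch short. scikit-learn offers the same idea through `KFold` and `cross_val_score`.

```python
import random

def fold_indices(n_samples, n_folds, seed=0):
    """Shuffle 0..n_samples-1 once, then deal the indices into n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]  # sizes differ by at most 1

def cross_validate(X, y, fit, score, n_folds=10, seed=0):
    """Each fold is the test set once; the remaining n_folds-1 folds are training data."""
    scores = []
    for test_idx in fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return scores

def fit_mean(X_train, y_train):
    # Hypothetical baseline model: always predict the mean training target.
    return sum(y_train) / len(y_train)

def mse_score(model, X_test, y_test):
    return sum((t - model) ** 2 for t in y_test) / len(y_test)

X = [[float(i)] for i in range(20)]
y = [float(i % 4) for i in range(20)]
scores = cross_validate(X, y, fit_mean, mse_score, n_folds=5)
print(len(scores), sum(scores) / len(scores))  # one score per fold, then their mean
```

The mean of the fold scores is the performance estimate; their spread gives a rough idea of how much that estimate depends on the particular split.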

Choosing a performance estimation procedure

No strict rules, only guidelines.

  • Use a holdout set for very large datasets (e.g. > 100,000 examples).
  • Use leave-one-out cross-validation for very small datasets (e.g. 100 examples).
  • Use cross-validation otherwise (usually 10 folds).
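Leave-one-out cross-validation is n-fold cross-validation with one fold per example, so it needs one model fit per example; that cost is why it is only recommended for very small datasets. A sketch, assuming the same hypothetical mean-predicting baseline as a stand-in model (scikit-learn's `LeaveOneOut` splitter implements the same splitting):

```python
def leave_one_out_scores(X, y, fit, score):
    """Leave-one-out CV: each of the n examples is the test set exactly once (n fits)."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # train on all but example i
        scores.append(score(model, [X[i]], [y[i]]))        # test on example i alone
    return scores

def fit_mean(X_train, y_train):
    # Hypothetical baseline model: always predict the mean training target.
    return sum(y_train) / len(y_train)

def squared_error(model, X_test, y_test):
    return sum((t - model) ** 2 for t in y_test) / len(y_test)

X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = leave_one_out_scores(X, y, fit_mean, squared_error)
print(sum(scores) / len(scores))  # 3.125
```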

Hyperparameter Tuning

  • A huge range of techniques exists to tune hyperparameters. The simplest:
    • Grid search: choose a range of values for every hyperparameter, try every combination.
      • Does not scale to many hyperparameters.
    • Random search: choose random values for all hyperparameters, iterate n times.
      • Better, especially when some hyperparameters are less important.
  • Many more advanced techniques exist, e.g. Bayesian optimization.
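Grid search and random search can be contrasted with the sketch below. The function `validation_score` is a hypothetical stand-in for "train with these hyperparameters and score on validation data", deliberately built so that `learning_rate` matters a lot and `reg_strength` hardly at all. scikit-learn packages both strategies as `GridSearchCV` and `RandomizedSearchCV`.

```python
import itertools
import math
import random

def validation_score(learning_rate, reg_strength):
    # Hypothetical objective (higher is better): peaks near learning_rate = 10**-2.3,
    # while reg_strength has only a tiny effect.
    return -(math.log10(learning_rate) + 2.3) ** 2 - 0.01 * (reg_strength - 0.5) ** 2

def grid_search(grid, score_fn):
    """Evaluate every combination of the listed values; return (best_score, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(samplers, score_fn, n_iter, seed=0):
    """Evaluate n_iter randomly drawn configurations; return (best_score, best_params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in samplers.items()}
        score = score_fn(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Same budget of 9 evaluations for both strategies.
grid = {"learning_rate": [1e-3, 1e-2, 1e-1], "reg_strength": [0.0, 0.5, 1.0]}
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
    "reg_strength": lambda rng: rng.uniform(0.0, 1.0),
}
print("grid:  ", grid_search(grid, validation_score))
print("random:", random_search(samplers, validation_score, n_iter=9))
```

With 9 evaluations, the grid only ever tries 3 distinct learning rates (the other 6 evaluations just vary the unimportant `reg_strength`), whereas random search tries 9 distinct values of each hyperparameter. That is the intuition behind "better when some hyperparameters are less important". The grid size also grows multiplicatively with every added hyperparameter.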

Hyperparameter Tuning (Cont.)

  • First, split the data into training and test sets (outer split).
  • Split the training data again (inner cross-validation).
    • Generate hyperparameter configurations (e.g. random/grid search).
    • Evaluate all configurations on all inner splits, select the best one (on average).
  • Retrain the best configuration on the full training set, evaluate it on the held-out test data.
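The steps above can be sketched end to end as below, assuming a toy 1-D k-nearest-neighbour regressor whose hyperparameter is the number of neighbours `k`. The model, the data and all function names are hypothetical illustrations. The key property is that the outer test data is never used while choosing `k`.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbour regression on 1-D inputs: average the k closest targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def mse(train_x, train_y, eval_x, eval_y, k):
    """Mean squared error of k-NN built on train_* and evaluated on eval_*."""
    errors = [(knn_predict(train_x, train_y, x, k) - t) ** 2 for x, t in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)

def nested_tuning(xs, ys, candidate_ks, n_inner_folds=5, test_fraction=0.2, seed=0):
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    n_test = int(len(xs) * test_fraction)
    # Outer split: these test indices are never touched during tuning.
    test_idx, train_idx = order[:n_test], order[n_test:]
    tr_x, tr_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
    te_x, te_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

    # Inner cross-validation folds over the (already shuffled) training part only.
    folds = [list(range(f, len(tr_x), n_inner_folds)) for f in range(n_inner_folds)]

    def inner_cv_error(k):
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            fit_x = [tr_x[i] for i in range(len(tr_x)) if i not in held_out]
            fit_y = [tr_y[i] for i in range(len(tr_x)) if i not in held_out]
            total += mse(fit_x, fit_y, [tr_x[i] for i in fold], [tr_y[i] for i in fold], k)
        return total / len(folds)  # average error over the inner splits

    best_k = min(candidate_ks, key=inner_cv_error)
    # "Retrain" on the full training set (for k-NN this just means keeping all of it),
    # then score once on the held-out test data.
    test_error = mse(tr_x, tr_y, te_x, te_y, best_k)
    return best_k, test_error

data_rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + data_rng.gauss(0.0, 0.2) for x in xs]
best_k, test_error = nested_tuning(xs, ys, candidate_ks=[1, 3, 5, 9, 15])
print(best_k, round(test_error, 3))
```

In scikit-learn, a comparable pattern is to tune with `GridSearchCV` on the training split and then call `score` on the test split; the hand-written version just makes each step explicit.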

Evaluation Metrics

Regression metrics

  • Most commonly used:
    • Mean squared error.
      • Root mean squared error (RMSE) often used as well.
    • Mean absolute error.
      • Less sensitive to outliers.
  • R squared (R²)
    • At most 1 (perfect predictions); 0 for a model that always predicts the mean, and negative if the model is worse than just predicting the mean.
    • Easier to interpret (higher is better).
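The regression metrics listed above can be computed directly from the residuals, as in this sketch (the function name is hypothetical; scikit-learn provides `mean_squared_error`, `mean_absolute_error` and `r2_score`). R² is computed as 1 - SS_res / SS_tot, where SS_tot is the squared error of always predicting the mean, which is exactly why it becomes 0 for the mean predictor and negative for anything worse.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired lists of targets and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n           # mean of squared residuals
    mae = sum(abs(r) for r in residuals) / n          # mean of absolute residuals
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)            # squared error of the model
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # squared error of "predict the mean"
    r2 = 1.0 - ss_res / ss_tot                        # assumes y_true is not constant
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}

y_true = [3.0, -0.5, 2.0, 7.0]
print(regression_metrics(y_true, [2.5, 0.0, 2.0, 8.0]))  # good model: R2 close to 1
print(regression_metrics(y_true, [2.875] * 4)["R2"])     # predicting the mean: 0.0
print(regression_metrics(y_true, [0.0] * 4)["R2"])       # worse than the mean: negative
```

Because MSE squares each residual, a single large error dominates it, while MAE grows only linearly with that error; this is the sense in which MAE is less sensitive to outliers.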

Evaluation Metrics

Classification metrics

  • For classification, we usually represent the predictions by a confusion matrix, from which we derive all metrics.
  • Confusion Matrix
    • C by C matrix (C is the number of classes).
    • Rows correspond to true classes, columns to predicted classes.
    • Count how often samples belonging to class c are classified as c or any other class.
    • For binary classification, we label these true negative (TN), true positive (TP), false negative (FN), false positive (FP).
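A confusion matrix with rows as true classes and columns as predicted classes can be built with a simple count, as sketched below (the function name and labels are hypothetical; `sklearn.metrics.confusion_matrix` uses the same row/column convention). For binary labels ordered `[0, 1]`, the four cells are TN, FP (top row) and FN, TP (bottom row).

```python
def confusion_matrix(y_true, y_pred, classes):
    """C x C counts: entry [i][j] = samples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, classes=[0, 1])
(tn, fp), (fn, tp) = cm   # binary case: negative class first, positive class second
print(cm)                 # [[2, 1], [2, 3]]
print(tn, fp, fn, tp)     # 2 1 2 3
```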

Classification Metrics

Remarks

  • Accuracy is not the best measure when the classes are highly skewed, i.e., when one class has significantly more samples than the other class(es).
  • Precision is used when the goal is to limit FPs. E.g. clinical trials (you only want to test drugs that really work), search engine (you want to avoid bad search results).
  • Recall is used when the goal is to limit FNs. E.g. cancer diagnosis (you don’t want to miss a serious disease), search engine (you don’t want to miss important hits).
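The remarks above can be made concrete with the usual definitions (accuracy = (TP + TN) / all, precision = TP / (TP + FP), recall = TP / (TP + FN)) applied to a hypothetical skewed test set of 10 positives and 990 negatives. The counts are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen here (scikit-learn's `precision_score` and `recall_score` expose this choice through a `zero_division` parameter).

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Fraction of predicted positives that are truly positive (penalizes FPs).
    return tp / (tp + fp) if tp + fp > 0 else 0.0

def recall(tp, fn):
    # Fraction of actual positives that were found (penalizes FNs).
    return tp / (tp + fn) if tp + fn > 0 else 0.0

# Hypothetical counts on a skewed test set: 10 positives, 990 negatives.
always_negative = dict(tp=0, fp=0, fn=10, tn=990)   # never predicts the positive class
screening_model = dict(tp=8, fp=12, fn=2, tn=978)   # flags 20 cases, finds 8 of 10

for name, c in [("always negative", always_negative), ("screening model", screening_model)]:
    print(f"{name}: accuracy={accuracy(**c):.3f} "
          f"precision={precision(c['tp'], c['fp']):.3f} recall={recall(c['tp'], c['fn']):.3f}")
```

This prints an accuracy of 0.990 with precision and recall of 0.000 for the useless "always negative" classifier, versus an accuracy of 0.986, precision of 0.400 and recall of 0.800 for the screening model: the higher accuracy belongs to the model that never finds a single positive.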

Precision vs. Recall

Receiver operating characteristics (ROC)

  • Trades off the true positive rate (TPR) against the false positive rate (FPR).
  • Plotting TPR against FPR for all possible thresholds yields a Receiver Operating Characteristics curve.
    • Change the threshold until you find a sweet spot in the TPR-FPR trade-off.
    • Lower thresholds yield a higher TPR (recall) and a higher FPR, and vice versa.
  • The area under the ROC curve (AUC) summarizes performance over all thresholds and is used to select the best overall model.
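The ROC construction can be sketched by sweeping the decision threshold over every distinct score (predict positive when score >= threshold), recording one (FPR, TPR) point per threshold, and integrating with the trapezoidal rule. The labels and scores below are hypothetical; `sklearn.metrics.roc_curve` and `roc_auc_score` are the library counterparts.

```python
def roc_curve_points(y_true, scores):
    """(FPR, TPR) points, one per distinct score used as the decision threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):  # from strict to lenient
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_curve_points(y_true, scores)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

Reading the points left to right shows the trade-off from the bullets: each lower threshold can only raise TPR and FPR, and the lowest threshold predicts everything positive, giving (1, 1).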

Bias-variance trade off

  • Evaluate the same algorithm multiple times on different random samples of the data.
  • Two types of errors can be observed:
    • Bias error: systematic error independent of the training sample.
    • Variance error: error due to the variability of the model with respect to the training sample.
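The procedure above can be simulated when the true function is known, as in this sketch. It assumes a hypothetical setup (targets sin(x) plus Gaussian noise, 30 training points per sample) and compares two deliberately extreme models: a constant model (high bias, low variance) and 1-nearest-neighbour (low bias, high variance). At each test point, bias² is the squared gap between the average prediction and the true value, and variance is the spread of predictions across training samples; their sum equals the mean squared error against the true function.

```python
import math
import random

def true_function(x):
    return math.sin(x)

def sample_training_set(rng, n=30, noise=0.3):
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    ys = [true_function(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fit_constant(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y  # ignores x entirely: strong systematic error

def fit_nearest_neighbor(xs, ys):
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]  # copies the noisy target of the closest training point
    return predict

def bias_variance(fit, test_xs, n_repeats=200, seed=0):
    """Average bias^2, variance and error (vs. the true function) over test_xs."""
    rng = random.Random(seed)
    models = [fit(*sample_training_set(rng)) for _ in range(n_repeats)]
    bias_sq = variance = error = 0.0
    for x in test_xs:
        preds = [m(x) for m in models]
        mean_pred = sum(preds) / len(preds)
        bias_sq += (mean_pred - true_function(x)) ** 2
        variance += sum((p - mean_pred) ** 2 for p in preds) / len(preds)
        error += sum((p - true_function(x)) ** 2 for p in preds) / len(preds)
    k = len(test_xs)
    return bias_sq / k, variance / k, error / k

test_xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
for name, fit in [("constant", fit_constant), ("1-NN", fit_nearest_neighbor)]:
    b, v, e = bias_variance(fit, test_xs)
    print(f"{name}: bias^2={b:.3f} variance={v:.3f} total={e:.3f}")
```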