





















This lecture explores adaptive basis function models in machine learning, focusing on decision trees and boosting algorithms. It covers overfitting and regularization, explaining techniques such as early stopping, weight decay, and Bayesian priors. It also covers the bias-variance trade-off and its implications for model performance, and concludes with a brief introduction to deep neural networks as adaptive basis function models.
Typology: Lecture notes






















COMPUTER SCIENCE | GRAINGER ENGINEERING
Adaptive Basis Function Models
Instead of using kernels, adaptive basis function models learn the basis function explicitly from data
The basis function is parameterized and learned from data
Decision Trees
MLaPP Figure 1.1, 16.2
Figure: decision tree that splits on shape (ellipse / other), color (blue / red / other), and size < 10 (yes / no); each leaf shows the number of training examples per class (e.g. 4,0).
Decision Trees
Where to threshold: chosen by cost minimization
Equation annotations: sample index, features, threshold, set of possible thresholds
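The threshold search described above can be sketched as an exhaustive loop over features and candidate thresholds. This is a minimal illustration, not the lecture's own code: the function names `best_split` and `misclassification_cost` are hypothetical, the candidate thresholds are assumed to be the unique observed values of each feature, and the cost is assumed to be the count of non-majority examples on each side.

```python
import numpy as np

def misclassification_cost(y):
    """Number of examples outside the majority class (0 for an empty set)."""
    if len(y) == 0:
        return 0
    _, counts = np.unique(y, return_counts=True)
    return len(y) - counts.max()

def best_split(X, y, cost=misclassification_cost):
    """Search every feature j and threshold t for the split that minimizes
    cost(left) + cost(right), where left = {i : X[i, j] <= t}."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        # Assumed candidate set: the unique observed values of feature j.
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            total = cost(y[left]) + cost(y[~left])
            if total < best[2]:
                best = (j, t, total)
    return best

# Toy data: feature 0 separates the two classes at 2.0.
X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, total_cost = best_split(X, y)
print(feature, threshold, total_cost)
```

Growing a full tree would apply this search recursively to the left and right subsets until a stopping rule is met.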
Decision Trees
Misclassification rate
Entropy
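As a rough illustration of the two cost functions named above, the sketch below computes both from the empirical class distribution of a leaf. The helper names are hypothetical, and the entropy is assumed to use base-2 logarithms with the convention 0 log 0 = 0.

```python
import numpy as np

def class_distribution(y, n_classes):
    """Empirical class probabilities for the labels that reach a leaf."""
    counts = np.bincount(y, minlength=n_classes)
    return counts / counts.sum()

def misclassification_rate(pi):
    """Fraction of examples that are not in the most probable class."""
    return 1.0 - pi.max()

def entropy(pi):
    """Entropy in bits; zero-probability classes contribute nothing."""
    nonzero = pi[pi > 0]
    return float(-(nonzero * np.log2(nonzero)).sum())

pi_pure = class_distribution(np.array([0, 0, 0, 0]), n_classes=2)
pi_mixed = class_distribution(np.array([0, 0, 1, 1]), n_classes=2)
print(misclassification_rate(pi_pure), entropy(pi_pure))
print(misclassification_rate(pi_mixed), entropy(pi_mixed))
```

Both costs are zero for a pure leaf and largest for a uniform class distribution.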
Decision Trees
MLaPP Figure 16.4, 16.5(b)
Decision Trees
Can handle discrete inputs
Robust to monotone transformations and scaling (e.g. log)
Comes with feature selection
Works well with large datasets
Easy to handle missing variables
Overfitting and Regularization
Therefore,
PRML Figure 1.2
Overfitting and Regularization
N different train-validation pairs
Each pair is used to train a classifier and to evaluate it
Average the N results
The average shows the performance of your model choice
MLaPP Figure 16.5(b)
Figure: cross-validation folds (1st, 2nd, 3rd fold); in each fold a different block of the data is held out as Test and the remaining blocks are used as Train.
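The cross-validation procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch only: `k_fold_indices`, `cross_validate`, and the nearest-centroid toy classifier are hypothetical names introduced here, and the data are synthetic.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and cut them into n_folds disjoint parts."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_validate(X, y, fit, predict, n_folds=3):
    """Train on all folds but one, validate on the held-out fold, average."""
    folds = k_fold_indices(len(y), n_folds)
    scores = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[m] for m in range(n_folds) if m != k])
        model = fit(X[train_idx], y[train_idx])
        scores.append(float(np.mean(predict(model, X[val_idx]) == y[val_idx])))
    return float(np.mean(scores)), scores

# Toy classifier (an assumption for the demo): predict the nearest class mean.
def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
mean_acc, fold_accs = cross_validate(X, y, fit_centroids, predict_centroids)
print(mean_acc, fold_accs)
```

Running the same loop for each candidate hyperparameter setting and comparing the averaged scores is one common way to pick among model choices.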
Overfitting and Regularization
PRML Figure 1.6
Overfitting and Regularization
PRML Figure 1.8
         ln λ = −∞      ln λ = −18   ln λ = 0
w0*           0.35          0.35       0.
w1*         232.37          4.74      -0.
w2*       -5321.83         -0.77      -0.
w3*       48568.31        -31.97      -0.
w4*     -231639.30         -3.89      -0.
w5*      640042.26         55.28      -0.
w6*    -1061800.52         41.32      -0.
w7*     1042400.18        -45.95      -0.
w8*     -557682.99        -91.53       0.
w9*      125201.43         72.68       0.
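The shrinking coefficients in the table can be reproduced qualitatively with a small weight-decay (ridge) fit. The sketch below is not the code behind the table: it assumes a degree-9 polynomial, 10 noisy samples of sin(2πx), and a penalty of the form (λ/2)‖w‖², and it uses its own random data, so the numbers will differ from those listed.

```python
import numpy as np

def fit_polynomial_ridge(x, t, degree, lam):
    """Minimize 0.5 * sum((Phi w - t)^2) + 0.5 * lam * ||w||^2 in closed form."""
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    A = lam * np.eye(degree + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=10)  # assumed noise level

w_weak = fit_polynomial_ridge(x, t, degree=9, lam=np.exp(-18))
w_strong = fit_polynomial_ridge(x, t, degree=9, lam=np.exp(0))

print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

A larger λ trades a worse fit on the training points for smaller weights and a smoother curve.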
Overfitting and Regularization
Prior
MAP
Or to minimize
PRML Figure 1.16
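The equations for this slide did not survive extraction. As a hedged reconstruction, assuming the slide follows the curve-fitting treatment in PRML Section 1.2.5, the prior, the MAP estimate, and the equivalent objective would read as below. The symbols α (prior precision), β (noise precision), and y(x_n, w) are taken from PRML and are assumptions here, not text recovered from the slide.

```latex
% Prior: zero-mean isotropic Gaussian over the weights
p(\mathbf{w} \mid \alpha) = \mathcal{N}\!\left(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}\right)

% MAP: maximize the posterior, proportional to likelihood times prior
\mathbf{w}_{\mathrm{MAP}} = \arg\max_{\mathbf{w}} \;
  p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)

% Or, equivalently, minimize the negative log posterior
\frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
  + \frac{\alpha}{2}\, \mathbf{w}^{\mathsf{T}} \mathbf{w}
```

Under these assumptions, MAP estimation coincides with weight decay with regularization coefficient λ = α/β.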
The Bias-Variance Trade-Off
PRML Figure 3.5
Original function
Estimated functions
Average of the estimated functions
Too much regularization: low variance, high bias
Model averaging doesn’t help remove bias
Too little regularization: high variance, low bias
Model averaging helps remove variance
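The statements above can be checked with a small simulation: fit the same regularized model to many independent datasets, then compare the average fit to the true function (bias) and the spread of the fits around their average (variance). This sketch makes several assumptions not found in the slides: a degree-9 polynomial basis rather than the basis used in the PRML figure, sin(2πx) as the original function, Gaussian noise with standard deviation 0.3, and the hypothetical function names `fit_ridge` and `bias_variance`.

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Closed-form weight-decay solution for design matrix Phi and targets t."""
    return np.linalg.solve(lam * np.eye(Phi.shape[1]) + Phi.T @ Phi, Phi.T @ t)

def bias_variance(lam, n_datasets=100, n_points=25, degree=9, seed=0):
    """Estimate squared bias and variance of the fit over many datasets."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0, 1, 50)
    Phi_test = np.vander(x_test, degree + 1, increasing=True)
    true_f = np.sin(2 * np.pi * x_test)
    preds = np.empty((n_datasets, len(x_test)))
    for d in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n_points)
        w = fit_ridge(np.vander(x, degree + 1, increasing=True), t, lam)
        preds[d] = Phi_test @ w
    average_fit = preds.mean(axis=0)          # average of the estimated functions
    bias_sq = float(np.mean((average_fit - true_f) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_heavy, var_heavy = bias_variance(lam=10.0)   # too much regularization
bias_light, var_light = bias_variance(lam=1e-6)   # too little regularization
print(bias_heavy, var_heavy)
print(bias_light, var_light)
```

Averaging the fits cancels the dataset-to-dataset fluctuations (variance) but leaves the systematic gap between the average fit and the original function (bias) untouched.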
The Bias-Variance Trade-Off