Adaptive Basis Function Models in Machine Learning for Signals, Lecture notes of Computer Science

University of Illinois - Urbana-Champaign Computer Science

This lecture explores adaptive basis function models in machine learning, focusing on decision trees and boosting algorithms. It delves into the concepts of overfitting and regularization, explaining techniques like early stopping, weight decay, and Bayesian priors. The lecture also covers the bias-variance trade-off and its implications for model performance. It concludes with a brief introduction to deep neural networks as adaptive basis function models.

Typology: Lecture notes

2023/2024

Uploaded on 12/11/2024

ritvik-avancha 🇺🇸


CS 545 Machine Learning for Signals

Lecture 14: Adaptive Basis Function Models

Minje Kim, Ph.D.

Associate Professor

Siebel School of Computer Science

https://minjekim.com

minje@illinois.edu

Computer Science, Grainger Engineering


Adaptive Basis Function Models

Really, nothing but a weighted sum of features

○ Let’s go against the argument for kernel methods

○ Basis functions are feature transformation functions

- In adaptive basis function models we explicitly learn this function from data instead of using kernels

○ Adaptive basis functions? First, assume that there are M such basis functions

- Each basis function is parameterized and learned from data

- The entire parameter set consists of the weights together with the parameters of all M basis functions
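For reference, the model described on this slide is commonly written as below. This is a sketch in the notation of MLaPP Chapter 16, the textbook the slides cite for their figures; the symbols $w_m$, $\phi_m$, $\mathbf{v}_m$, and $\boldsymbol{\theta}$ are assumptions and are not taken from the slide itself.

```latex
f(\mathbf{x}) = w_0 + \sum_{m=1}^{M} w_m \, \phi_m(\mathbf{x}),
\qquad
\phi_m(\mathbf{x}) = \phi(\mathbf{x}; \mathbf{v}_m),
\qquad
\boldsymbol{\theta} = \left( w_0, \mathbf{w}_{1:M}, \{\mathbf{v}_m\}_{m=1}^{M} \right)
```

Here $w_m$ are the weights of the sum, $\mathbf{v}_m$ are the parameters of the $m$-th basis function, and $\boldsymbol{\theta}$ collects everything that is learned from data.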

Decision Trees

CART for classification

○ Fraction of the positive examples can be used to construct the posterior

MLaPP Figure 1.1, 16.2

[Figure: a decision tree with tests on shape (ellipse / other), color (blue / red / other), and size < 10 (yes / no); each leaf shows the number of training examples per class, e.g. 4,0]
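As a small illustration of the posterior bullet above, the per-class counts stored at a leaf can be normalized into class probabilities. This is a minimal sketch; the function name `leaf_posterior` is hypothetical, and the count pair `[1, 3]` is made up for illustration (only `4,0` appears in the figure).

```python
import numpy as np

def leaf_posterior(class_counts):
    """Turn per-class training counts at a leaf into class probabilities."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("leaf has no training examples")
    # The fraction of examples of each class is the estimated posterior.
    return counts / total

print(leaf_posterior([4, 0]))  # a pure leaf
print(leaf_posterior([1, 3]))  # a mixed leaf (hypothetical counts)
```

A pure leaf gives a one-hot posterior, while a mixed leaf gives a softer estimate.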

Decision Trees

Training

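○ We need to decide, based on cost minimization:

- Which feature to threshold

- Where to threshold

○ Optimization: search over the features and, for each feature, over its set of possible thresholds, for the split that minimizes the cost of the resulting subsets of samples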
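One common way to write this optimization is shown below. It follows the form used in MLaPP Chapter 16 and is offered as a sketch: the symbols $i$ (sample index), $j$ (feature index), $t$ (threshold), $\mathcal{T}_j$ (set of possible thresholds for feature $j$), and $D$ (number of features) are assumptions matched to the labels on the slide.

```latex
(j^*, t^*) = \arg\min_{j \in \{1,\dots,D\}} \; \min_{t \in \mathcal{T}_j} \;
  \mathrm{cost}\left(\{(\mathbf{x}_i, y_i) : x_{ij} \le t\}\right)
  + \mathrm{cost}\left(\{(\mathbf{x}_i, y_i) : x_{ij} > t\}\right)
```

The search is greedy: it picks the best split for the current node only, then recurses on the two resulting subsets.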

Decision Trees

Classification cost

○ Classification cost

- Misclassification rate

- Entropy
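The two costs and the threshold search they feed into can be sketched as follows. This is an illustrative implementation, not the lecture's code: all function names are hypothetical, the entropy uses base-2 logarithms, and each child's cost is weighted by its share of the samples, which is a common choice assumed here.

```python
import numpy as np

def class_fractions(y, n_classes):
    """Fraction of samples in each class (y holds integer labels)."""
    return np.bincount(y, minlength=n_classes) / max(len(y), 1)

def misclassification_rate(y, n_classes):
    """Error made by always predicting the majority class."""
    return 1.0 - class_fractions(y, n_classes).max()

def entropy(y, n_classes):
    """Entropy (in bits) of the class distribution."""
    p = class_fractions(y, n_classes)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())

def best_split(X, y, cost, n_classes):
    """Greedy search over every feature j and candidate threshold t."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        # The largest value is skipped so the right child is never empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            total = (left.sum() * cost(y[left], n_classes)
                     + (~left).sum() * cost(y[~left], n_classes)) / n
            if total < best[2]:
                best = (j, float(t), total)
    return best

# Toy data (made up): feature 0 separates the two classes at 2.0.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y, misclassification_rate, 2))
print(best_split(X, y, entropy, 2))
```

Both costs agree on this toy data; on real data they can prefer different splits because entropy is more sensitive to changes in the class probabilities.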

Decision Trees

Iris example

○ Overfitting?

MLaPP Figure 16.4, 16.5(b)

Decision Trees

Pros and cons of decision trees

○ Pros

- Easy to interpret (e.g. for medical diagnosis)
- Can handle discrete input
- Robust to monotone transformation and scaling (e.g. log)
- Comes with feature selection
- Works well with large datasets
- Easy to handle missing variables

○ Cons

- Other models often outperform them
- Unstable: the greedy construction algorithm is not guaranteed to be optimal, and small changes in the top node propagate down to the leaf nodes
- High variance

Overfitting and Regularization

Polynomial curve fitting

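○ Minimizing an error function: choose the polynomial weights that minimize the error between the model output and the training targets

Therefore, the optimal weights are the ones that minimize this error

○ Question: what is the right order M of the polynomial? The higher the better?

PRML Figure 1.2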
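The question about the polynomial order can be explored with a small least-squares experiment. This is a sketch under stated assumptions: the data are synthetic (noisy samples of a sine wave, in the spirit of the PRML example), the error is a sum-of-squares error reported as RMS, and the names `fit_polynomial` and `rms_error` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def fit_polynomial(x, t, M):
    """Least-squares weights w_0..w_M of an order-M polynomial."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns: x^0, x^1, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms_error(w, x, t):
    """Root-mean-square error of the polynomial with weights w."""
    Phi = np.vander(x, len(w), increasing=True)
    return float(np.sqrt(np.mean((Phi @ w - t) ** 2)))

# Fresh samples from the same process stand in for test data.
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

for M in (0, 1, 3, 9):
    w = fit_polynomial(x, t, M)
    print(f"M={M}: train RMS={rms_error(w, x, t):.3f}, "
          f"test RMS={rms_error(w, x_test, t_test):.3f}, "
          f"max |w|={np.abs(w).max():.2f}")
```

The training error can only go down as M grows, so it cannot be used on its own to pick M; the held-out error and the size of the weights are the quantities to watch.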

Overfitting and Regularization

Preventing overfitting—early stopping

○ Early stopping: check the simulated test error and stop earlier than the convergence (of the training error)

○ N-fold cross validation: simulate the testing environment using training data

- Divide the training set into N exclusive subsets
- This gives N different train-validation pairs; each pair is used to train a classifier and to evaluate it
- Average the N results; the average shows the performance of your choice

MLaPP Figure 16.5(b)

[Figure: the data matrix (features by frames) split for the 1st, 2nd, and 3rd fold; in each fold a different block of frames is held out for testing and the remaining blocks are used for training]
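The N-fold procedure above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: `n_fold_indices` and `cross_validate` are hypothetical names, the data are synthetic, and a polynomial regressor scored by mean squared error stands in for the classifier.

```python
import numpy as np

def n_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and cut them into n_folds exclusive subsets."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_validate(X, y, n_folds, train_fn, eval_fn):
    """Average eval_fn over the n_folds train-validation pairs."""
    folds = n_fold_indices(len(y), n_folds)
    scores = []
    for k, val_idx in enumerate(folds):
        # Every fold except the k-th one is used for training.
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        model = train_fn(X[train_idx], y[train_idx])
        scores.append(eval_fn(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores)), scores

# Toy usage: compare two polynomial orders on noisy samples of a sine wave.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

def mse(coeffs, x_val, t_val):
    """Mean squared error of a fitted polynomial on held-out samples."""
    return float(np.mean((np.polyval(coeffs, x_val) - t_val) ** 2))

for order in (1, 3):
    avg, _ = cross_validate(x, t, 3, lambda a, b: np.polyfit(a, b, order), mse)
    print(f"order {order}: mean validation MSE = {avg:.3f}")
```

The averaged score is what you compare across model choices (here, the polynomial order); the final model is then usually retrained on the full training set.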

Overfitting and Regularization

Preventing overfitting—more training samples

○ Weights become larger if the model overfits

PRML Figure 1.6

          M = 0    M = 1    M = 3          M = 9
w0*        0.19     0.82     0.31           0.35
w1*                -1.27     7.99         232.37
w2*                        -25.43       -5321.83
w3*                         17.37       48568.31
w4*                                   -231639.30
w5*                                    640042.26
w6*                                  -1061800.52
w7*                                   1042400.18
w8*                                   -557682.99
w9*                                    125201.43

○ A big training dataset alleviates the problem: with more training samples, the same complex model overfits less

Overfitting and Regularization

Preventing overfitting—weight decay

○ Regularization can decay the weights

PRML Figure 1.8

          ln λ = −∞    ln λ = −18    ln λ = 0
w0*            0.35          0.35       0.
w1*          232.37          4.74      -0.
w2*        -5321.83         -0.77      -0.
w3*        48568.31        -31.97      -0.
w4*      -231639.30         -3.89      -0.
w5*       640042.26         55.28      -0.
w6*     -1061800.52         41.32      -0.
w7*      1042400.18        -45.95      -0.
w8*      -557682.99        -91.53       0.
w9*       125201.43         72.68       0.

○ Regularization lets us use a complex model with much less risk of overfitting, as long as the regularization strength λ is chosen well
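The effect of weight decay on an order-9 polynomial can be reproduced with a short ridge-regression sketch. Assumptions: the data are synthetic noisy sine samples rather than the PRML data, the penalty is (λ/2)·‖w‖² added to the sum-of-squares error (so the numbers will not match the table), and `ridge_fit` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)
Phi = np.vander(x, 10, increasing=True)  # order-9 polynomial features

def ridge_fit(Phi, t, lam):
    """Minimize 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2 in closed form."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

for ln_lam in (-18.0, 0.0):
    w = ridge_fit(Phi, t, np.exp(ln_lam))
    print(f"ln lambda = {ln_lam:>5}: ||w|| = {np.linalg.norm(w):.3f}")
```

Increasing λ trades a slightly worse fit to the training points for much smaller weights, which is the decay the table illustrates.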

Overfitting and Regularization

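Preventing overfitting: other takes

○ Recall that the SVM objective function can be seen as a combination of the hinge loss and a regularization term

○ Bayesian priors can work as a regularizer: combining the maximum likelihood objective with a prior on the weights gives the MAP estimate, which is equivalent to minimizing a regularized error function

PRML Figure 1.16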
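The link between a prior and a regularizer can be made explicit for the polynomial regression example. This sketch follows PRML Section 1.2.5 and assumes Gaussian noise with precision $\beta$ and a zero-mean Gaussian prior with precision $\alpha$; these symbols are assumptions, not taken from the slide.

```latex
p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) \propto
  p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha),
\qquad
p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})

\mathbf{w}_{\mathrm{MAP}} = \arg\min_{\mathbf{w}} \;
  \frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2
  + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w}
```

Taking the negative log of the posterior turns the product of likelihood and prior into the sum above, so maximizing the posterior is the same as minimizing a sum-of-squares error with weight decay, with regularization strength $\lambda = \alpha / \beta$.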

The Bias-Variance Trade-Off

Averaged model predictions can reduce variance

○ 100 models from 100 different training datasets

PRML Figure 3.5

[Figure: the original function, the estimated functions, and the average of the estimated functions]

○ Too much regularization: low variance, high bias

- Model averaging doesn’t help remove bias

○ Too little regularization: high variance, low bias

- Model averaging helps remove variance
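The decomposition behind these statements can be checked numerically. The sketch below is an illustration under assumptions: 100 synthetic datasets are drawn from a noisy sine wave, each is fitted with a lightly regularized order-9 polynomial, and the constants `L`, `N`, `M`, and `LAMBDA` are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, M = 100, 25, 9   # datasets, samples per dataset, polynomial order
LAMBDA = 1e-6          # very little regularization
x_grid = np.linspace(0.0, 1.0, 50)
h = np.sin(2 * np.pi * x_grid)  # the original function

def design(x):
    """Polynomial features x^0 .. x^M."""
    return np.vander(x, M + 1, increasing=True)

def fit(x, t):
    """Regularized least-squares weights for one dataset."""
    Phi = design(x)
    A = Phi.T @ Phi + LAMBDA * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

preds = []
for _ in range(L):
    x = rng.uniform(0.0, 1.0, size=N)
    t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
    preds.append(design(x_grid) @ fit(x, t))
preds = np.array(preds)  # shape (L, len(x_grid))

avg = preds.mean(axis=0)                        # average of the estimated functions
bias_sq = np.mean((avg - h) ** 2)               # error of the averaged model
variance = np.mean(preds.var(axis=0))           # spread of the individual models
mean_individual_error = np.mean((preds - h) ** 2)

print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, "
      f"mean individual error = {mean_individual_error:.4f}")
```

The mean error of the individual models equals the squared bias plus the variance, so averaging removes the variance term and leaves only the bias; that is why averaging cannot rescue an over-regularized model.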

The Bias-Variance Trade-Off

Bootstrap aggregation (or bagging); random forests


○ In theory, if you have multiple training datasets, train multiple complex models and average the results → low variance and low bias

○ In practice, you don’t have multiple training datasets

○ Bootstrapping: subsample from one training dataset with replacement

○ Train the m-th model on the m-th bootstrap dataset

○ The M models form a committee (bagging)

○ Variance: we hope that averaging over the committee reduces it

○ Random forests: subsample from the dataset and, in addition, use a random subset of the variables
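The bagging steps listed above can be sketched as follows. This is a minimal illustration and not the lecture's code: the function names are hypothetical, the data are synthetic, and a cubic polynomial regressor stands in for the base model (a random forest would use decision trees and also subsample the variables, which is not shown here).

```python
import numpy as np

def bootstrap_sample(x, t, rng):
    """Draw len(t) samples from the dataset with replacement."""
    idx = rng.integers(0, len(t), size=len(t))
    return x[idx], t[idx]

def bagging_fit(x, t, n_models, fit_fn, seed=0):
    """Train the m-th model on the m-th bootstrap dataset."""
    rng = np.random.default_rng(seed)
    return [fit_fn(*bootstrap_sample(x, t, rng)) for _ in range(n_models)]

def bagging_predict(models, predict_fn, x_new):
    """Committee prediction: the average of the members' predictions."""
    return np.mean([predict_fn(m, x_new) for m in models], axis=0)

# Toy data: noisy samples of a sine wave.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=40)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=40)
x_grid = np.linspace(0.0, 1.0, 50)
h = np.sin(2 * np.pi * x_grid)

models = bagging_fit(x, t, 20, lambda a, b: np.polyfit(a, b, 3))
committee = bagging_predict(models, np.polyval, x_grid)

member_mse = np.mean([np.mean((np.polyval(m, x_grid) - h) ** 2) for m in models])
committee_mse = np.mean((committee - h) ** 2)
print(f"mean member MSE = {member_mse:.4f}, committee MSE = {committee_mse:.4f}")
```

For squared error the committee is never worse than the average member, but because the bootstrap datasets overlap heavily, the members' errors are correlated and the gain is smaller than it would be with truly independent training sets.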