




























































































These slides cover the bootstrap method and bagging in machine learning. The bootstrap method estimates the generalization error by generating many training sets from the original data and learning a model on each set. Bagging combines the predictions of multiple models trained on different bootstrap samples to improve performance. The slides also cover the AdaBoost algorithm, a popular boosting method for generating training sets and combining classifiers.





























































































• We are discussing methods to estimate the test error (or true risk) of a model, essentially using the training data.
PR NPTEL course – p.1/
• We discussed cross-validation, which is a very popular method for this.
• The final cross-validation estimate is the average of the errors of each learnt model on the data points not used for training that model.
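The cross-validation estimate described above can be sketched in a few lines. This is a minimal illustration, not code from the course: the function name `k_fold_cv_error`, the `fit`/`predict` interface, and the toy majority-label learner are all assumptions introduced here.

```python
import random

def k_fold_cv_error(xs, ys, k, fit, predict, seed=0):
    """Average, over k folds, of the error on the points held out of training."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Error of this model only on points it never saw during training.
        wrong = sum(predict(model, xs[i]) != ys[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k

# Hypothetical toy learner: always predict the majority training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

xs = list(range(20))
ys = [0] * 15 + [1] * 5
cv_error = k_fold_cv_error(xs, ys, k=5, fit=fit_majority, predict=predict_majority)
print(cv_error)
```

Any learner can be plugged in through `fit` and `predict`; the majority-label rule is used only to keep the sketch short.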
• In the bootstrap method, we generate $B$ training sets, each of size $n$, by random sampling with replacement from the given data set.
• Then we learn a model on each of the $B$ training sets.
• The final error estimate could be the average of the errors of all the models.
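A minimal sketch of this procedure follows. It rests on some assumptions made here: each bootstrap set has the same size as the original data set, each model is scored on the full original data set, and the name `bootstrap_error` and the `fit`/`predict` interface are hypothetical.

```python
import random

def bootstrap_error(xs, ys, B, fit, predict, seed=0):
    """Average error of B models, each learnt on one bootstrap training set."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(B):
        # n draws with replacement from the indices of the original data.
        sample = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in sample], [ys[i] for i in sample])
        # Assumption: each model is evaluated on the whole original data set.
        wrong = sum(predict(model, x) != y for x, y in zip(xs, ys))
        errors.append(wrong / n)
    return sum(errors) / B

# Hypothetical toy learner: always predict the majority training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model
```

For example, `bootstrap_error(xs, ys, B=100, fit=fit_majority, predict=predict_majority)` averages the errors of 100 models, each learnt on a different resampled training set.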
• Unlike in cross-validation, here we can have as many training sets as we want (all of size $n$).
• However, they may all be very similar.
• Let $\hat{f}^b$ denote the model learnt using the $b$-th training set.
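One common way to write the resulting bootstrap error estimate is given below. The notation is an assumption made here rather than something recovered from the slides: $(x_i, y_i)$, $i = 1, \dots, n$, are the original data points, $L$ is a loss function (for instance the 0-1 loss), $B$ is the number of bootstrap training sets, and $\hat{f}^b$ is the model learnt on the $b$-th of them.

```latex
\widehat{\mathrm{Err}}_{\mathrm{boot}}
  = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{n} \sum_{i=1}^{n}
    L\bigl(y_i, \hat{f}^{b}(x_i)\bigr)
```

The inner sum is the average loss of one model over all the original data points; the outer sum averages this over the $B$ models.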
• Here we are using the original data set as test data, while each model $\hat{f}^b$ is trained on a sample drawn from the same data.
• Hence this bootstrap error estimate would not be very good.
• As an example, consider a problem where the class label is independent of the feature vector.
• Then no classifier can do better than chance: for two equally likely classes, the true error rate is 0.5.
• Suppose we use the 1-nearest neighbour as our classifier.
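A small simulation can illustrate why the estimate is optimistic in this example. It is a sketch under assumptions made here, not taken from the slides: two equally likely classes, one-dimensional features, and the hypothetical helper names `one_nn_predict` and `bootstrap_error_1nn`. A point that lands in a bootstrap sample is its own nearest neighbour and is classified correctly, so errors can occur only on the points left out, and a given point is left out with probability $(1 - 1/n)^n \approx 0.368$.

```python
import random

def one_nn_predict(train_x, train_y, x):
    # Label of the training point closest to x (1-D features for simplicity).
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def bootstrap_error_1nn(xs, ys, B, rng):
    """Bootstrap error estimate for 1-NN, tested on the original data set."""
    n = len(xs)
    total = 0.0
    for _ in range(B):
        sample = [rng.randrange(n) for _ in range(n)]  # draws with replacement
        bx = [xs[i] for i in sample]
        by = [ys[i] for i in sample]
        wrong = sum(one_nn_predict(bx, by, x) != y for x, y in zip(xs, ys))
        total += wrong / n
    return total / B

rng = random.Random(0)
n = 200
xs = [rng.random() for _ in range(n)]
ys = [rng.randrange(2) for _ in range(n)]  # labels independent of features
estimate = bootstrap_error_1nn(xs, ys, B=20, rng=rng)
print(estimate)
```

Under these assumptions the expected value of the estimate is about 0.5 × 0.368 ≈ 0.18, far below the true error rate of 0.5.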