Model Selection and Assessment in Machine Learning, Study notes of Algorithms and Programming

These notes cover model selection and assessment in machine learning, with a focus on controlling overfitting in decision trees. Techniques include train/validation/test splits and K-fold cross-validation. The notes also explore methods for evaluating the accuracy of classification rules and learning algorithms, including approximate statistical tests and rule post-pruning. They include a text classification example and diagrams of the real-world process for evaluating learned hypotheses. The material is a mix of lecture notes and study notes for a machine learning course at Cornell University.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

ekadant

Model Selection and Assessment
CS4780/5780 – Machine Learning, Fall 2014
Thorsten Joachims, Cornell University

Reading:
• Mitchell, Chapter 5
• Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895-1924. (http://sci2s.ugr.es/keel/pdf/algorithm/articulo/dietterich1998.pdf)

Outline
• Model Selection
  – Controlling overfitting in decision trees
  – Train, validation, test
  – K-fold cross validation
• Evaluation
  – What is the true error of classification rule h?
  – Is rule h1 more accurate than h2?
  – Is learning algorithm A1 better than A2?

Learning as Prediction

Overfitting
• Note: Accuracy = 1.0 - Error [Mitchell]

Controlling Overfitting in Decision Trees
• Early Stopping: Stop growing the tree and introduce a leaf when splitting is no longer "reliable".
  – Restrict size of tree (e.g., number of nodes, depth)
  – Minimum number of examples in node
  – Threshold on splitting criterion
• Post Pruning: Grow full tree, then simplify.
  – Reduced-error tree pruning
  – Rule post-pruning

Reduced-Error Pruning

Model Selection
• Training: Run the learning algorithm m times (e.g., with different parameters), producing hypotheses ĥ_1, …, ĥ_m.
• Validation Error: The error Err_{S_val}(ĥ_i) is an estimate of Err_P(ĥ_i) for each ĥ_i.
• Selection: Use the ĥ_i with minimum Err_{S_val}(ĥ_i) for prediction on test examples.

[Diagram: Real-world Process. Train sample S_train and test sample S_test are each drawn i.i.d. S_train is split randomly into S_train' and validation sample S_val. Learner 1 through Learner m are each trained on S_train' to produce candidate hypotheses, and the selected ĥ is applied to S_test.]
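
The training, validation, and selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the 1-D k-nearest-neighbour learner `train_knn`, the synthetic data from `make_sample`, the 30% validation fraction, and the candidate values of k are all assumptions made up for the example.

```python
import random

def make_sample(n, seed):
    """Hypothetical data: x uniform in [0, 1], label 1 iff x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < 0.1:  # flip the label with probability 0.1
            y = 1 - y
        sample.append((x, y))
    return sample

def error_rate(h, sample):
    """Fraction of examples in `sample` that hypothesis h misclassifies."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def train_knn(train, k):
    """Hypothetical learner: majority vote among the k nearest training points."""
    def h(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
        ones = sum(y for _, y in nearest)
        return 1 if 2 * ones > len(nearest) else 0
    return h

def select_model(s_train, ks, val_fraction=0.3, seed=0):
    """Split s_train into S_train' and S_val, train once per k, keep the min-validation-error hypothesis."""
    shuffled = list(s_train)
    random.Random(seed).shuffle(shuffled)
    n_val = round(val_fraction * len(shuffled))
    s_val, s_train_prime = shuffled[:n_val], shuffled[n_val:]
    hypotheses = {k: train_knn(s_train_prime, k) for k in ks}
    val_errors = {k: error_rate(h, s_val) for k, h in hypotheses.items()}
    best_k = min(val_errors, key=val_errors.get)
    return best_k, hypotheses[best_k], val_errors

s_train = make_sample(200, seed=1)
s_test = make_sample(100, seed=2)
best_k, h_hat, val_errors = select_model(s_train, ks=[1, 3, 5, 9])
print("validation errors:", val_errors)
print("selected k:", best_k, "test error:", error_rate(h_hat, s_test))
```

The test sample is used only once, after selection, so its error was not involved in choosing ĥ.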
Text Classification Example: "Corporate Acquisitions" Results
• Unpruned Tree (ID3 Algorithm):
  – Size: 437 nodes, Training Error: 0.0%, Test Error: 11.0%
• Early Stopping Tree (ID3 Algorithm):
  – Size: 299 nodes, Training Error: 2.6%, Test Error: 9.8%
• Reduced-Error Tree Pruning (C4.5 Algorithm):
  – Size: 167 nodes, Training Error: 4.0%, Test Error: 10.8%
• Rule Post-Pruning (C4.5 Algorithm):
  – Size: 164 tests, Training Error: 3.1%, Test Error: 10.3%
  – Examples of rules
    • IF vs = 1 THEN - [99.4%]
    • IF vs = 0 & export = 0 & takeover = 1 THEN + [93.6%]

Evaluating Learned Hypotheses
• Goal: Find h with small prediction error Err_P(h) over P(X,Y).
• Question: How good is Err_P(ĥ) of the ĥ found on training sample S_train?
• Training Error: Error Err_{S_train}(ĥ) on the training sample.
• Test Error: Error Err_{S_test}(ĥ) is an estimate of Err_P(ĥ).

[Diagram: Real-world Process. Sample S is drawn i.i.d. and split randomly into training sample S_train = (x1,y1), …, (xn,yn) and test sample S_test = (x1,y1), …, (xk,yk). The learner (incl. model selection) produces ĥ from S_train.]

What is the True Error of a Hypothesis?
• Given
  – Sample of labeled instances S
  – Learning Algorithm A
• Setup
  – Partition S randomly into S_train (70%) and S_test (30%).
  – Train learning algorithm A on S_train; the result is ĥ.
  – Apply ĥ to S_test and compare predictions against true labels.
• Test
  – Error on test sample Err_{S_test}(ĥ) is an estimate of the true error Err_P(ĥ).
  – Compute confidence interval.
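
The setup above (random 70/30 partition, train on one part, count mistakes on the other) might look as follows in Python. This is only a sketch under stated assumptions: the threshold learner `train_stump` stands in for learning algorithm A, and the synthetic sample with 10% label noise is invented for the illustration.

```python
import random

def train_stump(sample):
    """Hypothetical learner A: pick threshold t minimising training errors of 'predict 1 iff x > t'."""
    best_t, best_errors = 0.0, len(sample) + 1
    for t, _ in sample:  # try every observed x as a candidate threshold
        errors = sum((1 if x > t else 0) != y for x, y in sample)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return lambda x: 1 if x > best_t else 0

def holdout_error(sample, learner, train_fraction=0.7, seed=0):
    """Partition sample randomly, train on the first part, return (test error, test size)."""
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    s_train, s_test = shuffled[:n_train], shuffled[n_train:]
    h_hat = learner(s_train)
    mistakes = sum(h_hat(x) != y for x, y in s_test)
    return mistakes / len(s_test), len(s_test)

# Hypothetical sample S: label is 1 iff x > 0.5, flipped with probability 0.1.
rng = random.Random(42)
sample = []
for _ in range(1000):
    x = rng.random()
    y = 1 if x > 0.5 else 0
    sample.append((x, 1 - y if rng.random() < 0.1 else y))

err_test, n_test = holdout_error(sample, train_stump)
print(f"test error {err_test:.3f} on {n_test} held-out examples")
```

The returned test size matters as much as the error itself, because the width of any confidence interval around the estimate depends on how many test examples were used.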
[Diagram: The learner is trained on training sample S_train = (x1,y1), …, (xn,yn) to produce ĥ, which is then applied to test sample S_test = (x1,y1), …, (xk,yk).]

Binomial Distribution
• The probability of observing x heads in a sample of n independent coin tosses, where in each toss the probability of heads is p, is
  $$P(X = x \mid p, n) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$$
• Normal approximation: For $np(1-p) \ge 5$ the binomial can be approximated by the normal distribution with
  – Expected value: $E(X) = np$, Variance: $Var(X) = np(1-p)$
  – With probability $\delta$, the observation x falls in the interval
    $$E(X) \pm z_\delta \sqrt{Var(X)}$$

  δ     50%   68%   80%   90%   95%   98%   99%
  z_δ   0.67  1.00  1.28  1.64  1.96  2.33  2.58

Text Classification Example: Results
• Data
  – Training Sample: 2000 examples
  – Test Sample: 600 examples
• Unpruned Tree:
  – Size: 437 nodes, Training Error: 0.0%, Test Error: 11.0%
• Early Stopping Tree:
  – Size: 299 nodes, Training Error: 2.6%, Test Error: 9.8%
• Post-Pruned Tree:
  – Size: 167 nodes, Training Error: 4.0%, Test Error: 10.8%
• Rule Post-Pruning:
  – Size: 164 tests, Training Error: 3.1%, Test Error: 10.3%
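
As a worked illustration, the normal approximation can be applied to the test errors listed above, each measured on 600 test examples. The sketch below rests on two assumptions: the observed test error is substituted for the unknown p when computing the variance, and the z values are the rounded ones from the table. The function name `error_confidence_interval` is made up for this example.

```python
import math

# z values for two-sided intervals, copied from the table in these notes.
Z = {0.50: 0.67, 0.68: 1.00, 0.80: 1.28, 0.90: 1.64, 0.95: 1.96, 0.98: 2.33, 0.99: 2.58}

def error_confidence_interval(err_test, n, confidence=0.95):
    """Approximate interval for the true error, given test error err_test on n examples."""
    # Assumption: the observed error rate stands in for the unknown p.
    if n * err_test * (1 - err_test) < 5:
        raise ValueError("normal approximation not justified: n*p*(1-p) < 5")
    half_width = Z[confidence] * math.sqrt(err_test * (1 - err_test) / n)
    return err_test - half_width, err_test + half_width

test_errors = {
    "Unpruned Tree": 0.110,
    "Early Stopping Tree": 0.098,
    "Post-Pruned Tree": 0.108,
    "Rule Post-Pruning": 0.103,
}
for name, err in test_errors.items():
    low, high = error_confidence_interval(err, n=600)
    print(f"{name}: {err:.3f} in [{low:.3f}, {high:.3f}] at 95%")
```

Under these assumptions the four 95% intervals overlap, so a test sample of this size may not be enough on its own to say which tree has the lowest true error.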