Statistical Significance in Machine Learning: Central Limit Theorem & Confidence Intervals, Lecture notes of Machine Learning

These notes cover statistical significance in machine learning, focusing on the central limit theorem, confidence intervals, and margin of error. They explain how the average and standard deviation of sample-based estimates change as the sample size increases, and how to calculate confidence levels and interpret the results. This material supports judging whether the measured performance of a machine learning model is consistent with its claimed accuracy.

Typology: Lecture notes

2018/2019

Uploaded on 11/01/2019

chetan-reddy

Introduction to Machine Learning
CH12: Statistical Significance

[Figure: generated binary random numbers]

Standard Error of Sample-Based Estimates (Example)

Let the size of the testing set be $n$.
Let the proportion of correct class labels, in this sample, be $p$. This is our estimate of classification accuracy.
Note: both $n \cdot p \ge 10$ and $n \cdot (1 - p) \ge 10$ are satisfied.
The standard error of the estimate: $s_p = \sqrt{p(1 - p)/n}$
Therefore:
◦ Classification accuracy is estimated as $p \pm s_p$

[Figure: Gaussian distribution]

Confidence

For the given $p$ and its standard error, calculate the confidence: the percentage of estimates that will fall into a given interval around $p$.

Margin of Error

The confidence interval has the form $[\mathit{acc} - z^* \cdot s_{acc},\ \mathit{acc} + z^* \cdot s_{acc}]$.
Here, $z^* \cdot s_{acc}$ is called the margin of error.
The size of the margin of error depends on the following:
◦ Level of confidence, affecting $z^*$
◦ Size of the testing set, $n$
◦ Classification accuracy, $\mathit{acc}$

Statistical Evaluation of a Classifier

1) For the given size, $n$, of the testing set, and for the claimed classification accuracy, $\mathit{acc}$, check whether the conditions for normal distribution are satisfied: $n \cdot \mathit{acc} \ge 10$ and $n \cdot (1 - \mathit{acc}) \ge 10$.
2) Calculate the standard error by the usual formula: $s_{acc} = \sqrt{\mathit{acc}(1 - \mathit{acc})/n}$.
3) Assuming that the normal-distribution assumption is correct, find in Table 12.2 the $z^*$-value for the requested level of confidence. The corresponding confidence interval is $[\mathit{acc} - z^* \cdot s_{acc},\ \mathit{acc} + z^* \cdot s_{acc}]$.
4) If the value measured on the testing set falls outside this interval, reject the claim that the accuracy equals $\mathit{acc}$. Otherwise, assume that the available evidence is insufficient for the rejection.

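The four-step evaluation above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `evaluate_claim` and the example numbers are hypothetical, and the $z^*$-value is computed from the standard normal quantile function as a stand-in for the lookup in Table 12.2, which is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_claim(n, acc, measured, confidence=0.95):
    """Test a claimed accuracy `acc` against a value measured on n examples."""
    # Step 1: conditions for the normal approximation.
    if n * acc < 10 or n * (1 - acc) < 10:
        raise ValueError("normal approximation not justified for this n and acc")
    # Step 2: standard error of the accuracy estimate.
    s_acc = sqrt(acc * (1 - acc) / n)
    # Step 3: z* from the standard normal quantile function
    # (assumption: this replaces the table lookup for a two-sided interval).
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    low, high = acc - z_star * s_acc, acc + z_star * s_acc
    # Step 4: reject the claim only if the measurement is outside the interval.
    reject = not (low <= measured <= high)
    return low, high, reject


# Hypothetical numbers, not taken from the lecture.
low, high, reject = evaluate_claim(n=100, acc=0.80, measured=0.70)
print(f"interval = [{low:.3f}, {high:.3f}], reject claim: {reject}")
```

With these illustrative inputs the standard error is 0.04, so a measured accuracy of 0.70 lies outside the 95% interval around the claimed 0.80 and the claim is rejected.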
Two Types of Error

Type I Error (false alarm)
◦ With a 95% confidence level, about 5% of the test results obtained for a classifier whose accuracy really is as claimed fall outside the confidence interval
◦ Hence: an occasional false alarm, where a correct claim is rejected
◦ Can be reduced by a higher confidence level (and thus a broader interval)

Type II Error (failing to detect)
◦ The result measured for a bad classifier can occasionally fall into the confidence interval
◦ Hence: a bad classifier is accepted
◦ Can be reduced by a lower confidence level (and thus a narrower interval)
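The trade-off between the two error types can be illustrated with a small simulation. This is a sketch under stated assumptions: the helper `rejection_rate`, the accuracies 0.80 and 0.75, the testing-set size of 400, and the number of trials are all hypothetical choices, and each test example is assumed to be classified correctly independently of the others.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(claimed, true_acc, n, confidence, trials=1000, seed=0):
    """Fraction of simulated testing sets on which the claim is rejected."""
    rng = random.Random(seed)
    s_acc = sqrt(claimed * (1 - claimed) / n)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    low, high = claimed - z_star * s_acc, claimed + z_star * s_acc
    rejected = 0
    for _ in range(trials):
        # Each test example is classified correctly with probability true_acc.
        correct = sum(rng.random() < true_acc for _ in range(n))
        if not (low <= correct / n <= high):
            rejected += 1
    return rejected / trials


for conf in (0.95, 0.99):
    # Type I: the claim (0.80) is true, yet it gets rejected.
    type_1 = rejection_rate(0.80, 0.80, n=400, confidence=conf)
    # Type II: the true accuracy is only 0.75, yet the claim is not rejected.
    type_2 = 1 - rejection_rate(0.80, 0.75, n=400, confidence=conf)
    print(f"confidence {conf:.2f}: Type I ~ {type_1:.3f}, Type II ~ {type_2:.3f}")
```

Raising the confidence level widens the interval, so the simulated Type I rate should drop while the Type II rate should rise, which is the trade-off listed above.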