Statistical Data Analysis: Probability, Hypothesis Testing, and Chi-Square, Slides of Computational and Statistical Data Analysis

Queen Mary, University of London (QMUL), Computational and Statistical Data Analysis

A series of lecture notes on statistical data analysis, focusing on probability theory, Bayes' theorem, random variables, probability distributions, hypothesis testing, and the chi-square test. The notes cover topics such as functions of random variables, expectation values, error propagation, the Monte Carlo method, p-values, and the significance of a peak.

Typology: Slides

2011/2012

Uploaded on 03/08/2012 by leyllin


G. Cowan, Lectures on Statistical Data Analysis

Statistical Data Analysis: Lecture 7

1 Probability, Bayes’ theorem, random variables, pdfs

2 Functions of r.v.s, expectation values, error propagation

3 Catalogue of pdfs

4 The Monte Carlo method

5 Statistical tests: general concepts

6 Test statistics, multivariate methods

7 Significance tests

8 Parameter estimation, maximum likelihood

9 More maximum likelihood

10 Method of least squares

11 Interval estimation, setting limits

12 Nuisance parameters, systematic uncertainties

13 Examples of Bayesian approach

14 tba


Testing significance / goodness-of-fit

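Suppose hypothesis $H$ predicts a pdf $f(\mathbf{x}|H)$ for a set of observations $\mathbf{x} = (x_1, \ldots, x_n)$. We observe a single point in this space, $\mathbf{x}_{\rm obs}$. What can we say about the validity of $H$ in light of the data? Decide what part of the data space represents less compatibility with $H$ than does the point $\mathbf{x}_{\rm obs}$ (not unique!).

[Figure: data space divided into a region "less compatible with H" and a region "more compatible with H"]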

p-value example: testing whether a coin is ‘fair’

Probability to observe $n$ heads in $N$ coin tosses is binomial:

$$f(n; p, N) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$$

Hypothesis $H$: the coin is fair ($p = 0.5$). Suppose we toss the coin $N = 20$ times and get $n = 17$ heads. The region of data space with equal or lesser compatibility with $H$ relative to $n = 17$ is $n = 17, 18, 19, 20, 0, 1, 2, 3$. Adding up the probabilities for these values gives

$$P(n = 0, 1, 2, 3, 17, 18, 19, \text{ or } 20) = 0.0026,$$

i.e. $p = 0.0026$ is the probability of obtaining such a bizarre result (or more so) ‘by chance’, under the assumption of $H$.
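As a check on the arithmetic, the sketch below sums the binomial probabilities of every outcome whose probability under $H$ is no larger than that of the observed $n$, which is one way to formalize ‘equal or lesser compatibility’. The function names are illustrative, not taken from the lectures.

```python
import math


def binomial_pmf(n: int, N: int, p: float) -> float:
    """Binomial probability f(n; p, N)."""
    return math.comb(N, n) * p**n * (1.0 - p) ** (N - n)


def binomial_p_value(n_obs: int, N: int, p: float = 0.5) -> float:
    """Sum f(n; p, N) over all n at least as 'bizarre' as n_obs.

    Outcomes are ranked by their probability under H: any n with
    f(n) <= f(n_obs) counts as equal or lesser compatibility.
    """
    f_obs = binomial_pmf(n_obs, N, p)
    # A tiny relative tolerance guards against rounding in the float comparison.
    return sum(
        binomial_pmf(n, N, p)
        for n in range(N + 1)
        if binomial_pmf(n, N, p) <= f_obs * (1.0 + 1e-9)
    )


p_val = binomial_p_value(17, 20)
print(f"p = {p_val:.4f}")  # p = 0.0026 (sums n = 0..3 and 17..20)
```

Ranking outcomes by probability is only one convention; as the lecture notes, the choice of the ‘less compatible’ region is not unique.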

The significance of an observed signal

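Suppose we observe $n$ events; these can consist of $n_{\rm b}$ events from known processes (background) and $n_{\rm s}$ events from a new process (signal). If $n_{\rm s}$, $n_{\rm b}$ are Poisson r.v.s with means $s$, $b$, then $n = n_{\rm s} + n_{\rm b}$ is also Poisson, with mean $s + b$:

$$P(n; s, b) = \frac{(s+b)^n}{n!}\, e^{-(s+b)}$$

Suppose $b = 0.5$, and we observe $n_{\rm obs} = 5$. Should we claim evidence for a new discovery? Give the p-value for the hypothesis $s = 0$:

$$p\text{-value} = P(n \ge n_{\rm obs}; b = 0.5, s = 0) = \sum_{n=5}^{\infty} P(n; s = 0, b = 0.5)$$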
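The p-value for $s = 0$ can be evaluated directly. This minimal sketch (function names are ours) sums the Poisson probabilities below $n_{\rm obs}$ and takes the complement; for $b = 0.5$ and $n_{\rm obs} = 5$ it gives roughly $1.7 \times 10^{-4}$.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability P(n; mean)."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def poisson_p_value(n_obs: int, b: float) -> float:
    """P(n >= n_obs) for n ~ Poisson(b), i.e. the p-value of the s = 0 hypothesis."""
    # Complement of the CDF up to n_obs - 1; adequate when the p-value is not tiny.
    return 1.0 - sum(poisson_pmf(n, b) for n in range(n_obs))


print(f"{poisson_p_value(5, 0.5):.2e}")  # 1.72e-04
```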

The significance of a peak

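Suppose we measure a value $x$ for each event and find:

[Figure: histogram of the measured x values; dashed lines show the expected mean in each bin]

Each bin (observed) is a Poisson r.v., with means given by the dashed lines. In the two bins with the peak, 11 entries are found, with $b = 3.2$. The p-value for the $s = 0$ hypothesis is:

$$P(n \ge 11; b = 3.2, s = 0)$$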
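The same kind of tail probability applies to the peak, with $b = 3.2$ and $n = 11$. This variant (names ours) sums the upper tail term by term instead of forming $1 - \text{CDF}$, which avoids losing precision to cancellation when the p-value is very small. It gives about $5 \times 10^{-4}$, a number that does not yet account for where else a peak could have appeared.

```python
import math


def poisson_upper_tail(n_min: int, mean: float, tol: float = 1e-15) -> float:
    """P(n >= n_min) for n ~ Poisson(mean), summed directly from n_min upward.

    Summing the tail avoids the cancellation in 1 - CDF for small p-values.
    Stops once the next term is negligible relative to the running total.
    """
    term = math.exp(-mean) * mean**n_min / math.factorial(n_min)
    total = 0.0
    n = n_min
    while term > tol * total:
        total += term
        n += 1
        term *= mean / n  # P(n) = P(n - 1) * mean / n
    return total


print(f"{poisson_upper_tail(11, 3.2):.1e}")  # 5.0e-04
```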

The significance of a peak (2)

But... did we know where to look for the peak?
→ give $P(n \ge 11)$ in any 2 adjacent bins.
Is the observed width consistent with the expected $x$ resolution?
→ take an $x$ window several times the expected resolution.
How many bins × distributions have we looked at?
→ look at a thousand of them, and you’ll find a $10^{-3}$ effect.
Did we adjust the cuts to ‘enhance’ the peak?
→ freeze the cuts and repeat the analysis with new data.
How about the bins to the sides of the peak... (too low!)
Should we publish????
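The ‘look at a thousand of them’ remark can be quantified under a simplifying assumption that is ours, not the lecture’s: if $M$ distributions are examined independently and each has probability $p_0$ of a fluctuation at least as extreme, the chance that at least one shows it is $1 - (1 - p_0)^M$. The helper name is hypothetical, and real bins and distributions are usually correlated.

```python
def prob_any_fluctuation(p_local: float, n_looks: int) -> float:
    """Chance that at least one of n_looks independent tests gives p <= p_local.

    Assumes the tests are independent, which real analyses rarely guarantee.
    """
    return 1.0 - (1.0 - p_local) ** n_looks


print(f"{prob_any_fluctuation(1e-3, 1000):.3f}")  # 0.632
```

So with a thousand independent looks, a $10^{-3}$ effect somewhere is more likely than not.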


Distribution of the p-value

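The p-value is a function of the data, and is thus itself a random variable with a given distribution. Suppose the p-value of $H$ is found from a test statistic $t(\mathbf{x})$ as

$$p_H = \int_{t}^{\infty} f(t'|H)\, dt',$$

where larger values of $t$ indicate less compatibility with $H$. The pdf of $p_H$ under assumption of $H$ is

$$g(p_H|H) = \frac{f(t|H)}{|\partial p_H / \partial t|} = \frac{f(t|H)}{f(t|H)} = 1, \qquad 0 \le p_H \le 1.$$

In general, for continuous data, under assumption of $H$, $p_H \sim$ Uniform[0,1], and it is concentrated toward zero for some (broad) class of alternatives.

[Figure: $g(p_H|H)$, flat on [0,1], compared with $g(p_H|H')$, peaked toward $p_H = 0$]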
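A quick Monte Carlo check of this result, in a toy model chosen for illustration: the test statistic is assumed to be $t \sim N(0, 1)$ under $H$, and a hypothetical alternative $H'$ shifts its mean to 1.5. The p-values are histogrammed in ten equal bins of [0, 1]; under $H$ each bin should hold about 10%, while under $H'$ they pile up near zero.

```python
import math
import random


def p_value_gaussian(t: float) -> float:
    """p_H = P(t' >= t | H) for a test statistic that is N(0, 1) under H."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def p_value_histogram(true_mean: float, n_samples: int = 20000,
                      n_bins: int = 10, seed: int = 1) -> list[float]:
    """Fraction of simulated p_H values in each of n_bins equal bins of [0, 1].

    Data are generated with t ~ N(true_mean, 1); true_mean = 0 corresponds to H.
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_samples):
        p = p_value_gaussian(rng.gauss(true_mean, 1.0))
        counts[min(int(p * n_bins), n_bins - 1)] += 1  # p = 1.0 goes in the last bin
    return [c / n_samples for c in counts]


print([round(f, 3) for f in p_value_histogram(0.0)])  # roughly flat (H true)
print([round(f, 3) for f in p_value_histogram(1.5)])  # concentrated near p = 0 (H')
```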


Using a p-value to define test of H₀

So, since $p_0 \sim$ Uniform[0,1] under $H_0$, the probability to find the p-value of $H_0$, $p_0$, less than $\alpha$ is

$$P(p_0 \le \alpha | H_0) = \alpha.$$

We started by defining a critical region in the original data space ($\mathbf{x}$), then reformulated this in terms of a scalar test statistic $t(\mathbf{x})$. We can take this one step further and define the critical region of a test of $H_0$ with size $\alpha$ as the set of data space where $p_0 \le \alpha$.

Formally the p-value relates only to $H_0$, but the resulting test will have a given power with respect to a given alternative $H_1$.
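To see both statements at once, consider a toy model assumed purely for illustration: $t \sim N(0, 1)$ under $H_0$ and $t \sim N(\mu_1, 1)$ under a hypothetical $H_1$, with $p_0 = P(t' \ge t | H_0)$. Then $p_0 \le \alpha$ is equivalent to $t \ge t_c$, so the size comes out as $\alpha$ while the power depends on the chosen $\mu_1$. The sketch uses the standard-library `statistics.NormalDist`; the function name is ours.

```python
from statistics import NormalDist


def size_and_power(alpha: float, mu1: float) -> tuple[float, float]:
    """Size and power of the test 'reject H0 if p0 <= alpha'.

    Toy model: t ~ N(0, 1) under H0 and t ~ N(mu1, 1) under H1,
    with p0 = P(t' >= t | H0), so p0 <= alpha  <=>  t >= t_c.
    """
    t_c = NormalDist().inv_cdf(1.0 - alpha)      # critical value of t
    size = 1.0 - NormalDist(0.0, 1.0).cdf(t_c)   # P(p0 <= alpha | H0)
    power = 1.0 - NormalDist(mu1, 1.0).cdf(t_c)  # P(p0 <= alpha | H1)
    return size, power


size, power = size_and_power(alpha=0.05, mu1=2.0)
print(f"size = {size:.3f}, power = {power:.3f}")
```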

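Pearson’s χ² test

Test the hypothesized $\nu_i$, $i = 1, \ldots, N$, with Pearson’s statistic

$$\chi^2 = \sum_{i=1}^{N} \frac{(n_i - \nu_i)^2}{\sigma_i^2}.$$

If the $n_i$ are Gaussian with mean $\nu_i$ and std. dev. $\sigma_i$, i.e., $n_i \sim N(\nu_i, \sigma_i^2)$, then Pearson’s $\chi^2$ will follow the chi-square pdf for $N$ degrees of freedom (here written for $\chi^2 = z$):

$$f_{\chi^2}(z; N) = \frac{1}{2^{N/2}\,\Gamma(N/2)}\, z^{N/2 - 1} e^{-z/2}.$$

If the $n_i$ are Poisson with $\nu_i \gg 1$ (in practice OK for $\nu_i > \ldots$), then the Poisson distribution becomes Gaussian with $\sigma_i^2 = \nu_i$, and therefore Pearson’s $\chi^2$ statistic here as well follows the chi-square pdf.

The $\chi^2$ value obtained from the data then gives the p-value:

$$p = \int_{\chi^2}^{\infty} f_{\chi^2}(z; N)\, dz.$$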
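A minimal standard-library sketch of the statistic and its p-value. Rather than integrating numerically, `chi2_sf` uses the known closed forms of the chi-square upper tail for integer $N$ (a finite sum for even $N$, an `erfc` term plus a finite series for odd $N$). It is intended for moderate $\chi^2$ and $N$ and is not a substitute for a library routine such as SciPy’s `scipy.stats.chi2.sf`; the function names and the example data are ours.

```python
import math


def pearson_chi2(n, nu, sigma):
    """Pearson's chi^2 = sum_i (n_i - nu_i)^2 / sigma_i^2."""
    return sum((ni - vi) ** 2 / si**2 for ni, vi, si in zip(n, nu, sigma))


def chi2_sf(x: float, ndof: int) -> float:
    """p-value: integral of the chi-square pdf with ndof dof from x to infinity."""
    if x <= 0.0:
        return 1.0
    if ndof % 2 == 0:
        # Even N: exp(-x/2) * sum_{k=0}^{N/2-1} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, ndof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd N: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_{k=1}^{(N-1)/2} x^{k-1/2} / (1*3*...*(2k-1))
    total, term = 0.0, math.sqrt(x)
    for k in range(1, (ndof - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * total


# Hypothetical Poisson counts with large means, so sigma_i^2 = nu_i is a fair approximation.
nu = [50.0, 60.0, 55.0]
n = [58, 52, 49]
chi2 = pearson_chi2(n, nu, [math.sqrt(v) for v in nu])
print(f"chi2 = {chi2:.2f}, p = {chi2_sf(chi2, len(nu)):.3f}")
print(f"{chi2_sf(29.8, 20):.3f}")  # 0.073, the chi-square-pdf value quoted for chi^2 = 29.8, N = 20
```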

The ‘χ² per degree of freedom’

Recall that for the chi-square pdf for $N$ degrees of freedom,

$$E[z] = N.$$

This makes sense: if the hypothesized $\nu_i$ are right, the rms deviation of $n_i$ from $\nu_i$ is $\sigma_i$, so each term in the sum contributes ~ 1.

One often sees $\chi^2/N$ reported as a measure of goodness-of-fit. But... better to give $\chi^2$ and $N$ separately. Consider, e.g., …

i.e. for $N$ large, even a $\chi^2$ per dof only a bit greater than one can imply a small p-value, i.e., poor goodness-of-fit.
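To make the point concrete with illustrative numbers of our own choosing (not the lecture’s example), the sketch compares the same $\chi^2/N = 1.5$ at $N = 10$ and $N = 100$. The tail probability uses the even-$N$ closed form with each term evaluated in log space, so large factorials do not overflow; it assumes even $N$ and $\chi^2 > 0$, and the function name is ours.

```python
import math


def chi2_sf_even(x: float, ndof: int) -> float:
    """Upper-tail chi-square probability for even ndof and x > 0.

    p = sum_{k=0}^{N/2-1} exp(-x/2) (x/2)^k / k!, each term computed in log space.
    """
    if ndof % 2 != 0 or x <= 0.0:
        raise ValueError("this sketch needs even ndof and x > 0")
    half = x / 2.0
    return sum(
        math.exp(k * math.log(half) - half - math.lgamma(k + 1))
        for k in range(ndof // 2)
    )


for ndof in (10, 100):
    chi2 = 1.5 * ndof
    print(f"chi2 = {chi2:.0f}, N = {ndof}: chi2/N = 1.5, p = {chi2_sf_even(chi2, ndof):.2g}")
```

Both fits have the same $\chi^2/N$, but only the large-$N$ one yields a small p-value (of order $10^{-3}$ versus about 0.13).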

Example of a χ² test

The data of this example give $\chi^2 = 29.8$ for $N = 20$ dof. Now we need to find the p-value, but... many bins have few (or no) entries, so here we do not expect $\chi^2$ to follow the chi-square pdf.

Using MC to find distribution of χ² statistic

The Pearson $\chi^2$ statistic still reflects the level of agreement between data and prediction, i.e., it is still a ‘valid’ test statistic. To find its sampling distribution, simulate the data with a Monte Carlo program. Here the data sample was simulated $10^6$ times. The fraction of times we find $\chi^2 \ge 29.8$ gives the p-value: $p = 0.\ldots$ If we had used the chi-square pdf we would find $p = 0.073$.
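A pure-Python sketch of this procedure, under stated assumptions: the bin means `nu` below are hypothetical placeholders (the lecture’s predicted values are not reproduced here), each bin is generated as an independent Poisson variable, and $\sigma_i^2 = \nu_i$ in the statistic. Poisson variates come from Knuth’s product-of-uniforms method, which is adequate for the small means where the MC approach is needed. The run uses far fewer trials than the lecture’s $10^6$ to keep it quick.

```python
import math
import random


def poisson_knuth(mean: float, rng: random.Random) -> int:
    """Poisson variate by Knuth's product-of-uniforms method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pearson_chi2(n, nu):
    """Pearson's chi^2 with Poisson variances sigma_i^2 = nu_i."""
    return sum((ni - vi) ** 2 / vi for ni, vi in zip(n, nu))


def mc_p_value(chi2_obs: float, nu, n_trials: int = 100_000, seed: int = 12345) -> float:
    """Fraction of simulated experiments with chi^2 >= chi2_obs, generated under nu."""
    rng = random.Random(seed)
    n_ge = 0
    for _ in range(n_trials):
        n_sim = [poisson_knuth(v, rng) for v in nu]
        if pearson_chi2(n_sim, nu) >= chi2_obs:
            n_ge += 1
    return n_ge / n_trials


# Hypothetical predicted means for N = 20 bins, several well below 1.
nu = [0.3, 0.5, 0.8, 1.2, 1.8, 2.5, 3.2, 3.8, 4.2, 4.4,
      4.4, 4.2, 3.8, 3.2, 2.5, 1.8, 1.2, 0.8, 0.5, 0.3]
print(f"MC p-value for chi2 = 29.8: {mc_p_value(29.8, nu, n_trials=20_000):.3f}")
```

The MC answer depends on the assumed `nu`, so no particular value should be read into this output; the point is that the sampling distribution is obtained by simulation rather than assumed to be the chi-square pdf.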
