Lectures on Statistical Data Analysis: Confidence Intervals and Poisson Parameter Limits, Slides of Computational and Statistical Data Analysis

A series of lecture notes on statistical data analysis, focusing on confidence intervals and Poisson parameter limits. The notes cover frequentist and Bayesian intervals, confidence belt construction, the relation between confidence intervals and p-values, and setting limits on a Poisson parameter. They also discuss the calculation of Poisson parameter limits using the χ² distribution and the implications of limits near physical boundaries.

Typology: Slides

2011/2012

Uploaded on 03/08/2012

G. Cowan Lectures on Statistical Data Analysis 1
Statistical Data Analysis: Lecture 11
1 Probability, Bayes’ theorem, random variables, pdfs
2 Functions of r.v.s, expectation values, error propagation
3 Catalogue of pdfs
4 The Monte Carlo method
5 Statistical tests: general concepts
6 Test statistics, multivariate methods
7 Significance tests
8 Parameter estimation, maximum likelihood
9 More maximum likelihood
10 Method of least squares
11 Interval estimation, setting limits
12 Nuisance parameters, systematic uncertainties
13 Examples of Bayesian approach
14 tba

Interval estimation — introduction

In addition to a ‘point estimate’ of a parameter we should report an interval reflecting its statistical uncertainty. Desirable properties of such an interval may include: communicate objectively the result of the experiment; have a given probability of containing the true parameter; provide information needed to draw conclusions about the parameter, possibly incorporating stated prior beliefs.

Often use ± the estimated standard deviation of the estimator. In some cases, however, this is not adequate: estimate near a physical boundary, e.g., an observed event rate consistent with zero.

We will look briefly at Frequentist and Bayesian intervals.

Confidence interval from the confidence belt

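The region between u_α(θ) and v_β(θ) is called the confidence belt. Here u_α(θ) is the value that the estimator exceeds with probability α, and v_β(θ) the value it falls below with probability β, when the true parameter value is θ.

Find points where observed estimate intersects the confidence belt. This gives the confidence interval [a, b].

Confidence level = 1 − α − β = probability for the interval to cover true value of the parameter (holds for any possible true θ).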

Confidence intervals by inverting a test

Confidence intervals for a parameter θ can be found by defining a test of the hypothesized value θ (do this for all θ):

Specify values of the data that are ‘disfavoured’ by θ (critical region) such that P(data in critical region) ≤ γ for a prespecified γ, e.g., 0.05 or 0.1.

If data observed in the critical region, reject the value θ.

Now invert the test to define a confidence interval as: set of θ values that would not be rejected in a test of size γ (confidence level is 1 − γ).

The interval will cover the true value of θ with probability ≥ 1 − γ.

Equivalent to confidence belt construction; confidence belt is acceptance region of a test.
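The inversion can be sketched numerically. The example below is an illustration only, not taken from the lecture: it assumes a single Gaussian measurement x with known σ, a two-sided critical region |x − θ| > zσ, and a brute-force scan over a grid of θ values. The function name, grid range and observed value are hypothetical choices.

```python
from statistics import NormalDist


def interval_by_test_inversion(x_obs, sigma, gamma=0.05,
                               lo=-10.0, hi=10.0, steps=4001):
    """Scan hypothesised theta values and keep those that a test of
    size gamma would NOT reject (assumed model: x ~ Gaussian(theta, sigma))."""
    # Two-sided critical value: P(|x - theta| > z * sigma) = gamma
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    accepted = []
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        # The observed x lies outside the critical region for this theta
        if abs(x_obs - theta) <= z * sigma:
            accepted.append(theta)
    return min(accepted), max(accepted)


a, b = interval_by_test_inversion(x_obs=1.3, sigma=1.0)
print(a, b)
```

Up to the grid spacing, the accepted set reproduces the familiar interval x ± zσ, which is what the equivalence with the confidence belt construction would lead one to expect in this simple case.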

Confidence intervals in practice

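The recipe to find the interval [a, b] boils down to solving the following two equations, where θ̂ denotes the estimator, θ̂_obs its observed value and g(θ̂; θ) its sampling pdf:

$$\alpha = \int_{\hat{\theta}_{\rm obs}}^{\infty} g(\hat{\theta}; a)\, d\hat{\theta}, \qquad \beta = \int_{-\infty}^{\hat{\theta}_{\rm obs}} g(\hat{\theta}; b)\, d\hat{\theta}$$

→ a is hypothetical value of θ such that P(θ̂ > θ̂_obs) = α.

→ b is hypothetical value of θ such that P(θ̂ < θ̂_obs) = β.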
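As a rough illustration of solving for a and b, the sketch below assumes the estimator is an observed Poisson count n_obs, so the tail integrals become sums and the inequalities are taken as ≥ and ≤ to handle the discreteness. The helper names (`poisson_cdf`, `bisect`, `poisson_interval`) and the search brackets are hypothetical choices for this example.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for N ~ Poisson(mean)."""
    if n < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_interval(n_obs, alpha=0.05, beta=0.05):
    """Central interval [a, b] for a Poisson mean given n_obs."""
    # a: mean for which P(N >= n_obs; a) = alpha (taken as 0 when n_obs = 0)
    if n_obs == 0:
        a = 0.0
    else:
        a = bisect(lambda m: 1.0 - poisson_cdf(n_obs - 1, m) - alpha,
                   0.0, float(n_obs))
    # b: mean for which P(N <= n_obs; b) = beta
    upper_bracket = n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 10.0
    b = bisect(lambda m: poisson_cdf(n_obs, m) - beta,
               float(n_obs), upper_bracket)
    return a, b


print(poisson_interval(5))
```

Bisection is used only because it needs nothing beyond the standard library; any one-dimensional root finder would serve the same purpose.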

Meaning of a confidence interval

Intervals from the likelihood function

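In the large sample limit it can be shown for ML estimators that θ̂ follows an n-dimensional Gaussian with mean θ and covariance V. The quantity

$$Q(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^T V^{-1} (\hat{\theta} - \theta)$$

defines a hyper-ellipsoidal confidence region. If θ̂ is Gaussian distributed in this way, then Q(θ̂, θ) follows a χ² distribution with n degrees of freedom.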

Approximate confidence regions from L ( θ )

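So the recipe to find the confidence region with CL = 1 − γ is:

$$\ln L(\theta) = \ln L_{\max} - \frac{Q_\gamma}{2},$$

where Q_γ is the quantile of order 1 − γ of the χ² distribution with n degrees of freedom (n = number of parameters).

For finite samples, these are approximate confidence regions. Coverage probability not guaranteed to be equal to 1 − γ; no simple theorem to say by how far off it will be (use MC). Remember here the interval is random, not the parameter.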
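A minimal one-parameter sketch of this recipe follows. It assumes a Poisson likelihood for a single observed count (n_obs ≥ 1), which is an illustrative choice and not the lecture's example. For one parameter the χ² quantile equals z², with z the two-sided Gaussian quantile, so only the standard library is needed. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_likelihood(nu, n_obs):
    """Poisson log-likelihood up to a constant: n_obs * ln(nu) - nu."""
    return n_obs * math.log(nu) - nu


def likelihood_interval(n_obs, cl=0.683):
    """Approximate interval from ln L = ln L_max - Q/2 (requires n_obs >= 1)."""
    gamma = 1.0 - cl
    # chi-square quantile for 1 degree of freedom, written as z**2
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    q = z * z
    nu_hat = float(n_obs)  # ML estimate of the Poisson mean
    target = log_likelihood(nu_hat, n_obs) - q / 2.0

    def solve(lo, hi):
        # Bisection on g(nu) = ln L(nu) - target, which changes sign on [lo, hi]
        g_lo_positive = log_likelihood(lo, n_obs) - target > 0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (log_likelihood(mid, n_obs) - target > 0) == g_lo_positive:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = solve(1e-9, nu_hat)
    upper = solve(nu_hat, nu_hat + 10.0 * math.sqrt(nu_hat) + 10.0)
    return lower, upper


lower, upper = likelihood_interval(5)
print(lower, upper)
```

The resulting interval is asymmetric about the estimate, and its coverage for small counts is only approximately 68.3%, in line with the caveat about finite samples.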

Setting limits on Poisson parameter

Consider again the case of finding n = n_s + n_b events where

n_b events from known processes (background)
n_s events from a new process (signal)

are Poisson r.v.s with means b and s respectively, and thus n = n_s + n_b is also Poisson with mean = s + b. Assume b is known.

Suppose we are searching for evidence of the signal process, but the number of events found is roughly equal to the expected number of background events, e.g., b = 4.6 and we observe n_obs = 5 events. The evidence for the presence of signal events is not statistically significant, → set upper limit on the parameter s.

Upper limit for Poisson parameter

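Find the hypothetical value of s such that there is a given small probability, say, γ = 0.05, to find as few events as we did or less:

$$\gamma = P(n \le n_{\rm obs}; s, b) = \sum_{n=0}^{n_{\rm obs}} \frac{(s+b)^n}{n!} e^{-(s+b)}$$

Solve numerically for s = s_up; this gives an upper limit on s at a confidence level of 1 − γ.

Example: suppose b = 0 and we find n_obs = 0. For 1 − γ = 0.95, the condition reduces to γ = e^{−s}, so s_up = −ln γ ≈ 3.00.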
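The numerical solution can be sketched as follows, assuming the condition is that the Poisson probability of observing n_obs or fewer events with mean s + b equals γ. The function names and the search bracket are hypothetical; bisection is used here simply because it needs only the standard library.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def upper_limit(n_obs, b, gamma=0.05):
    """Solve P(N <= n_obs; s + b) = gamma for s (classical upper limit)."""
    # Bisect on the total mean s + b; the CDF decreases as the mean grows.
    lo = 0.0
    hi = n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - b


print(upper_limit(0, 0.0))   # the b = 0, n_obs = 0 example
print(upper_limit(5, 4.6))   # the b = 4.6, n_obs = 5 search example
```

Because the bisection runs on the total mean s + b and b is subtracted at the end, nothing prevents the returned limit from being negative when n_obs is small compared with b.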

Limits near a physical boundary

Suppose e.g. b = 2.5 and we observe n = 0. If we choose CL = 0.9, we find from the formula for s_up (which for n = 0 reduces to s_up = −ln γ − b): s_up = −ln(0.1) − 2.5 = −0.197.

Physicist: We already knew s ≥ 0 before we started; can’t use negative upper limit to report result of expensive experiment!

Statistician: The interval is designed to cover the true value only 90% of the time; this was clearly not one of those times.

Not uncommon dilemma when limit of parameter is close to a physical boundary.

Expected limit for s = 0

Physicist: I should have used CL = 0.95; then s_up = 0.496.

Even better: for CL = 0.917923 we get s_up = 10^−4!

Reality check: with b = 2.5, typical Poisson fluctuation in n is at least √2.5 = 1.6. How can the limit be so low?

Look at the mean limit for the no-signal hypothesis (s = 0) (sensitivity).

Distribution of 95% CL limits with b = 2.5, s = 0: mean upper limit = 4.44.
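The mean limit under the no-signal hypothesis can be estimated by weighting the upper limit for each possible n by its Poisson probability with mean b. The sketch below does this with a direct sum, not Monte Carlo, and truncates the sum at a hypothetical cutoff `n_max`; all function names are illustrative.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def upper_limit(n_obs, b, gamma=0.05):
    """Classical upper limit on s: solve P(N <= n_obs; s + b) = gamma."""
    lo = 0.0
    hi = n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - b


def mean_upper_limit(b, gamma=0.05, n_max=60):
    """Expected upper limit when s = 0, i.e. n ~ Poisson(b)."""
    total = 0.0
    prob = math.exp(-b)  # P(n = 0; b)
    for n in range(n_max + 1):
        total += prob * upper_limit(n, b, gamma)
        prob *= b / (n + 1)  # recurrence to P(n + 1; b)
    return total


print(mean_upper_limit(2.5))
```

Comparing an observed limit with this expected value gives a sense of whether the observed limit is unusually low because of a downward fluctuation in n.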

Bayesian prior for Poisson parameter

Include knowledge that s ≥ 0 by setting prior π(s) = 0 for s < 0.

Often try to reflect ‘prior ignorance’ with e.g. a flat prior: π(s) = 1 for s ≥ 0, and 0 otherwise.

Not normalized but this is OK as long as L(s) dies off for large s.

Not invariant under change of parameter: if we had used instead a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events.

Doesn’t really reflect a reasonable degree of belief, but often used as a point of reference; or viewed as a recipe for producing an interval whose frequentist properties can be studied (coverage will depend on true s).

Bayesian interval with flat prior for s

Solve numerically to find limit s_up.

For special case b = 0, Bayesian upper limit with flat prior numerically same as classical case (‘coincidence’).

Otherwise Bayesian limit is everywhere greater than classical (‘conservative’). Never goes negative. Doesn’t depend on b if n = 0.
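A sketch of the numerical solution, under the assumption that the posterior is proportional to (s + b)^n e^{−(s+b)} for s ≥ 0 (Poisson likelihood times flat prior). With that assumption the posterior probability above s_up equals the ratio of two Poisson cumulative sums, so the limit solves P(N ≤ n; s_up + b) = γ · P(N ≤ n; b). The function names and the search bracket are hypothetical.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def bayes_upper_limit(n_obs, b, gamma=0.05):
    """Flat-prior Bayesian upper limit on s at credibility 1 - gamma."""
    # Posterior tail above s_up: poisson_cdf(n, s_up + b) / poisson_cdf(n, b)
    target = gamma * poisson_cdf(n_obs, b)
    lo = b  # corresponds to s = 0, where the tail probability is 1
    hi = b + n_obs + 10.0 * math.sqrt(n_obs + 1.0) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - b


print(bayes_upper_limit(0, 0.0))  # b = 0: same value as the classical limit
print(bayes_upper_limit(0, 2.5))  # n = 0: independent of b
print(bayes_upper_limit(5, 4.6))
```

Since the search starts at s = 0, the returned limit cannot be negative, and for n = 0 the factor e^{−b} cancels on both sides, which is why b drops out in that case.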