Interval Estimation-Computing and Statistical Data Analysis-Lecture 12 Slides-Physics, Slides of Computational and Statistical Data Analysis

Confidence intervals for a parameter θ can be found by defining a test of the hypothesized value θ (do this for all θ). Keywords: interval estimation, frequentist confidence intervals, confidence belt, confidence interval, Poisson parameter, Bayesian, Feldman-Cousins, statistical data analysis, lecture slides. Glen Cowan, Physics Department, University of London, United Kingdom.

2011/2012

G. Cowan Lectures on Statistical Data Analysis Lecture 12 page 1
Statistical Data Analysis: Lecture 12
1 Probability, Bayes’ theorem
2 Random variables and probability densities
3 Expectation values, error propagation
4 Catalogue of pdfs
5 The Monte Carlo method
6 Statistical tests: general concepts
7 Test statistics, multivariate methods
8 Goodness-of-fit tests
9 Parameter estimation, maximum likelihood
10 More maximum likelihood
11 Method of least squares
12 Interval estimation, setting limits
13 Nuisance parameters, systematic uncertainties
14 Examples of Bayesian approach

Interval estimation — introduction

In addition to a ‘point estimate’ of a parameter we should report an interval reflecting its statistical uncertainty. Desirable properties of such an interval may include: communicate objectively the result of the experiment; have a given probability of containing the true parameter; provide information needed to draw conclusions about the parameter, possibly incorporating stated prior beliefs.

Often one uses ± the estimated standard deviation of the estimator. In some cases, however, this is not adequate, e.g., for an estimate near a physical boundary such as an observed event rate consistent with zero.

We will look briefly at frequentist and Bayesian intervals.

Confidence interval from the confidence belt

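For an estimator $\hat\theta$ with sampling pdf $g(\hat\theta;\theta)$, define the curves $u_\alpha(\theta)$ and $v_\beta(\theta)$ by $P(\hat\theta \ge u_\alpha(\theta)) = \alpha$ and $P(\hat\theta \le v_\beta(\theta)) = \beta$. The region between $u_\alpha(\theta)$ and $v_\beta(\theta)$ is called the confidence belt.

Find the points where the observed estimate $\hat\theta_{\rm obs}$ intersects the confidence belt. This gives the confidence interval $[a, b]$.

Confidence level $= 1 - \alpha - \beta$ = probability for the interval to cover the true value of the parameter (holds for any possible true $\theta$).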
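As a minimal sketch of the belt construction, assume (hypothetically) a Gaussian estimator, $\hat\theta \sim N(\theta, \sigma^2)$ with known σ. The code computes the belt edges from the tail conditions $P(\hat\theta \ge u_\alpha(\theta)) = \alpha$ and $P(\hat\theta \le v_\beta(\theta)) = \beta$, then finds by bisection where each edge passes through $\hat\theta_{\rm obs}$. The function names are illustrative, not from any library.

```python
from statistics import NormalDist


def belt_edges(theta, sigma, alpha, beta):
    """Return (v_beta, u_alpha) for a Gaussian estimator with mean theta.

    u_alpha satisfies P(theta_hat >= u_alpha) = alpha,
    v_beta  satisfies P(theta_hat <= v_beta)  = beta.
    """
    g = NormalDist(mu=theta, sigma=sigma)
    return g.inv_cdf(beta), g.inv_cdf(1.0 - alpha)


def bisect_increasing(f, lo, hi, iters=200):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def interval_from_belt(theta_hat_obs, sigma, alpha, beta, lo=-1e3, hi=1e3):
    """Intersect the line theta_hat = theta_hat_obs with the confidence belt."""
    # a: the upper edge u_alpha(theta) passes through theta_hat_obs
    a = bisect_increasing(
        lambda t: belt_edges(t, sigma, alpha, beta)[1] - theta_hat_obs, lo, hi)
    # b: the lower edge v_beta(theta) passes through theta_hat_obs
    b = bisect_increasing(
        lambda t: belt_edges(t, sigma, alpha, beta)[0] - theta_hat_obs, lo, hi)
    return a, b


a, b = interval_from_belt(theta_hat_obs=1.2, sigma=1.0, alpha=0.05, beta=0.05)
print(f"[a, b] = [{a:.4f}, {b:.4f}]")
```

For this Gaussian case both edges are straight lines, so the result is the familiar central interval $\hat\theta_{\rm obs} \mp 1.645\sigma$ for α = β = 0.05 (CL = 0.90); the bisection only matters when the edges have no closed form.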

Confidence intervals by inverting a test

Confidence intervals for a parameter θ can be found by defining a test of the hypothesized value θ (do this for all θ):

Specify values of the data that are ‘disfavoured’ by θ (critical region) such that P(data in critical region) ≤ γ for a prespecified γ, e.g., 0.05 or 0.1.

If data are observed in the critical region, reject the value θ.

Now invert the test to define a confidence interval as the set of θ values that would not be rejected in a test of size γ (confidence level is 1 − γ).

The interval will cover the true value of θ with probability ≥ 1 − γ.

Equivalent to the confidence belt construction; the confidence belt is the acceptance region of a test.
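The inversion can be sketched by brute force: scan a grid of hypothesized θ values, run a size-γ test of each, and keep those not rejected. For illustration this assumes a Gaussian estimator with known σ and takes the critical region to be the two tails of probability γ/2 each; other critical regions give other intervals. Names are hypothetical.

```python
from statistics import NormalDist


def not_rejected(theta, theta_hat_obs, sigma, gamma):
    """Size-gamma test of hypothesis theta with an equal-tailed critical region.

    Assumes theta_hat ~ N(theta, sigma^2). The critical region is theta_hat
    in the lower or upper tail of probability gamma/2 each.
    """
    p_below = NormalDist(mu=theta, sigma=sigma).cdf(theta_hat_obs)
    return gamma / 2.0 < p_below < 1.0 - gamma / 2.0


def invert_test(theta_hat_obs, sigma, gamma, grid):
    """Confidence interval = range of grid values that the test does not reject."""
    kept = [t for t in grid if not_rejected(t, theta_hat_obs, sigma, gamma)]
    return min(kept), max(kept)


grid = [i * 0.001 for i in range(-5000, 5001)]  # theta from -5 to 5
a, b = invert_test(theta_hat_obs=1.2, sigma=1.0, gamma=0.05, grid=grid)
print(f"[a, b] = [{a:.3f}, {b:.3f}]")
```

Up to the grid spacing this reproduces $\hat\theta_{\rm obs} \pm 1.96\sigma$, the same interval the belt construction gives with α = β = γ/2, which illustrates the stated equivalence.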

Confidence intervals in practice

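The recipe to find the interval $[a, b]$ boils down to solving

$\alpha = \int_{\hat\theta_{\rm obs}}^{\infty} g(\hat\theta; a)\, d\hat\theta = 1 - G(\hat\theta_{\rm obs}; a)$,

$\beta = \int_{-\infty}^{\hat\theta_{\rm obs}} g(\hat\theta; b)\, d\hat\theta = G(\hat\theta_{\rm obs}; b)$,

where $G$ is the cumulative distribution of $\hat\theta$.

→ $a$ is the hypothetical value of θ such that $P(\hat\theta > \hat\theta_{\rm obs}) = \alpha$.

→ $b$ is the hypothetical value of θ such that $P(\hat\theta < \hat\theta_{\rm obs}) = \beta$.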
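The two conditions $\alpha = 1 - G(\hat\theta_{\rm obs}; a)$ and $\beta = G(\hat\theta_{\rm obs}; b)$ can be solved with any one-dimensional root finder once $G$ is known. As a hypothetical example, take a single observation $t$ from an exponential pdf with mean τ and use $\hat\tau = t$, so $G(t; \tau) = 1 - e^{-t/\tau}$. This choice has closed-form answers, $a = t/\ln(1/\alpha)$ and $b = t/\ln(1/(1-\beta))$, against which the bisection can be checked.

```python
import math


def G(t, tau):
    """CDF of tau_hat = t for an exponential pdf with mean tau: P(tau_hat <= t; tau)."""
    return 1.0 - math.exp(-t / tau)


def solve_increasing(f, target, lo, hi, iters=200):
    """Bisection for f(x) = target, assuming f increases on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def central_interval(t_obs, alpha, beta, lo=1e-6, hi=1e6):
    # P(tau_hat > t_obs; tau) = 1 - G(t_obs; tau) increases with tau.
    upper_tail = lambda tau: 1.0 - G(t_obs, tau)
    a = solve_increasing(upper_tail, alpha, lo, hi)       # alpha = 1 - G(t_obs; a)
    b = solve_increasing(upper_tail, 1.0 - beta, lo, hi)  # beta  = G(t_obs; b)
    return a, b


a, b = central_interval(t_obs=1.0, alpha=0.05, beta=0.05)
print(f"[a, b] = [{a:.4f}, {b:.4f}]")
```

The interval is strongly asymmetric about $t_{\rm obs}$ because a single exponential observation carries little information about large τ.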

Meaning of a confidence interval

Intervals from the likelihood function

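In the large sample limit it can be shown for ML estimators that $\hat{\boldsymbol{\theta}} \sim N(\boldsymbol{\theta}, V)$ ($n$-dimensional Gaussian, covariance $V$), and that $\ln L(\boldsymbol{\theta}) \approx \ln L_{\max} - \tfrac{1}{2} Q(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$ with

$Q(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})^{T} V^{-1} (\hat{\boldsymbol{\theta}} - \boldsymbol{\theta})$.

Setting $Q(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta}) = Q_\gamma$ defines a hyper-ellipsoidal confidence region. If $\hat{\boldsymbol{\theta}}$ is Gaussian as above, then $Q$ follows a $\chi^2$ distribution for $n$ degrees of freedom, so choosing $Q_\gamma = F^{-1}_{\chi^2}(1-\gamma; n)$ gives $P(Q \le Q_\gamma) = 1 - \gamma$.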
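To make the cut value concrete: the region is $(\hat{\boldsymbol\theta} - \boldsymbol\theta)^T V^{-1} (\hat{\boldsymbol\theta} - \boldsymbol\theta) \le Q_\gamma$ with $Q_\gamma$ the $(1-\gamma)$ quantile of $\chi^2_n$. For n = 1 and n = 2 that quantile has closed forms (square of a standard Gaussian quantile, and $-2\ln\gamma$), which this sketch uses; general n would need a χ² quantile routine, not shown. A short Monte Carlo run then checks, for an assumed 2-dimensional Gaussian with made-up σ's and correlation, that the ellipse contains the true point with probability close to 1 − γ.

```python
import math
import random
from statistics import NormalDist


def q_gamma(n, gamma):
    """(1 - gamma) quantile of the chi-square distribution, closed forms for n = 1, 2."""
    if n == 1:
        z = NormalDist().inv_cdf(1.0 - gamma / 2.0)  # chi2(1) is the square of N(0, 1)
        return z * z
    if n == 2:
        return -2.0 * math.log(gamma)                 # chi2(2) CDF is 1 - exp(-x/2)
    raise ValueError("this sketch only covers n = 1 and n = 2")


def mc_coverage_2d(sig1, sig2, rho, gamma, n_trials=20000, seed=1):
    """Fraction of simulated estimates whose ellipse Q <= Q_gamma contains the true point."""
    rng = random.Random(seed)
    det = (sig1 * sig2) ** 2 * (1.0 - rho ** 2)
    # Elements of V^{-1} for V = [[sig1^2, rho sig1 sig2], [rho sig1 sig2, sig2^2]]
    w11, w22, w12 = sig2 ** 2 / det, sig1 ** 2 / det, -rho * sig1 * sig2 / det
    q_cut = q_gamma(2, gamma)
    hits = 0
    for _ in range(n_trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # d = theta_hat - theta, correlated via the Cholesky factor of V
        d1 = sig1 * z1
        d2 = sig2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        q = w11 * d1 * d1 + 2.0 * w12 * d1 * d2 + w22 * d2 * d2
        if q <= q_cut:
            hits += 1
    return hits / n_trials


print(f"Q_gamma(n=1, gamma=0.05) = {q_gamma(1, 0.05):.3f}")
print(f"Q_gamma(n=2, gamma=0.05) = {q_gamma(2, 0.05):.3f}")
print(f"MC coverage, n=2: {mc_coverage_2d(1.0, 2.0, 0.6, 0.05):.3f}")
```

The first two values are about 3.84 and 5.99; since the simulated estimator is exactly Gaussian here, the Monte Carlo coverage agrees with 0.95 up to statistical fluctuations.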

Approximate confidence regions from L ( θ )

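So the recipe to find the confidence region with CL = 1 − γ is:

$\ln L(\boldsymbol{\theta}) \ge \ln L_{\max} - Q_\gamma/2$, with $Q_\gamma = F^{-1}_{\chi^2}(1-\gamma; n)$ (e.g., $Q_\gamma = 1$ for $n = 1$ and $1 - \gamma = 0.683$).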
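A one-parameter sketch of the criterion $\ln L(\theta) \ge \ln L_{\max} - Q_\gamma/2$, assuming for illustration exponentially distributed data with unknown mean τ and a made-up sample. The ML estimate is the sample mean, and the interval endpoints are where ln L falls to $\ln L_{\max} - Q_\gamma/2$. With the default $Q_\gamma = 1$ this is the approximate 68.3% interval; $Q_\gamma = 3.84$ would give approximately 95%.

```python
import math


def log_l(tau, times):
    """Exponential log-likelihood ln L(tau), additive constants dropped."""
    return -len(times) * math.log(tau) - sum(times) / tau


def likelihood_interval(times, q_gamma=1.0):
    """Return (tau_hat, lo, hi) with ln L >= ln L_max - q_gamma/2 on [lo, hi]."""
    tau_hat = sum(times) / len(times)  # ML estimator of the mean
    target = log_l(tau_hat, times) - q_gamma / 2.0

    def edge(inside, outside):
        # Bisection between a point inside the region and one outside it.
        for _ in range(200):
            mid = 0.5 * (inside + outside)
            if log_l(mid, times) >= target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return tau_hat, edge(tau_hat, tau_hat * 1e-3), edge(tau_hat, tau_hat * 1e3)


# Hypothetical sample of decay times (made-up numbers, for illustration only).
times = [0.8, 2.3, 0.4, 1.9, 1.1, 0.6, 3.0, 1.4, 0.2, 1.3]
tau_hat, lo, hi = likelihood_interval(times)
print(f"tau_hat = {tau_hat:.3f}, approx. 68.3% interval [{lo:.3f}, {hi:.3f}]")
```

Because ln L is not parabolic in τ for a sample this small, the resulting interval is asymmetric about $\hat\tau$.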

For finite samples, these are approximate confidence regions.

Coverage probability not guaranteed to be equal to 1 − γ; no simple theorem to say by how far off it will be (use MC). Remember here the interval is random, not the parameter.
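Here is a sketch of such an MC coverage study, again for an exponential model chosen only for illustration. For that model $\ln L(\tau) - \ln L_{\max} = -N(\ln r + 1/r - 1)$ with $r = \tau/\hat\tau$, so the interval from $\ln L \ge \ln L_{\max} - Q_\gamma/2$ is $[r_{\rm lo}\hat\tau,\, r_{\rm hi}\hat\tau]$ with ratios that depend only on N and $Q_\gamma$. The code solves for those ratios once, then counts how often the interval contains the true τ in simulated experiments.

```python
import math
import random


def ratio_bounds(n, q_gamma=1.0):
    """Solve ln r + 1/r - 1 = q_gamma / (2n) on each side of r = 1.

    For exponential data, ln L(tau) - ln L_max = -n (ln r + 1/r - 1) with
    r = tau / tau_hat, so the likelihood interval is [r_lo, r_hi] * tau_hat.
    """
    h = lambda r: math.log(r) + 1.0 / r - 1.0 - q_gamma / (2.0 * n)

    def bisect(inside, outside):
        for _ in range(200):
            mid = 0.5 * (inside + outside)
            if h(mid) < 0.0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return bisect(1.0, 1e-6), bisect(1.0, 1e6)


def coverage(tau_true, n, q_gamma=1.0, trials=20000, seed=2):
    """MC estimate of P(tau_true inside the likelihood interval) for samples of size n."""
    rng = random.Random(seed)
    r_lo, r_hi = ratio_bounds(n, q_gamma)
    hits = 0
    for _ in range(trials):
        tau_hat = sum(rng.expovariate(1.0 / tau_true) for _ in range(n)) / n
        if r_lo * tau_hat <= tau_true <= r_hi * tau_hat:
            hits += 1
    return hits / trials


for n in (1, 5, 50):
    print(f"n = {n:2d}: coverage = {coverage(1.0, n):.3f} (nominal 0.683)")
```

For a single event the coverage comes out near 0.645 (an exact calculation for this case gives about the same), noticeably below the nominal 0.683; as N grows it moves toward the nominal value.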

Setting limits on Poisson parameter

Consider again the case of finding $n = n_s + n_b$ events, where $n_s$ events from a new process (signal) and $n_b$ events from known processes (background) are Poisson r.v.s with means $s$ and $b$, and thus $n = n_s + n_b$ is also Poisson with mean $s + b$. Assume $b$ is known.

Suppose we are searching for evidence of the signal process, but the number of events found is roughly equal to the expected number of background events, e.g., $b = 4.6$ and we observe $n_{\rm obs} = 5$ events.

The evidence for the presence of signal events is not statistically significant → set an upper limit on the parameter $s$.
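To put a number on ‘not statistically significant’, one can compute the probability, assuming no signal (s = 0), of finding at least as many events as observed. The helper below sums the Poisson probabilities term by term; it is a sketch, and the name `poisson_cdf` is ours, not a library function.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu), summed term by term."""
    term, total = math.exp(-mu), 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return total


b, n_obs = 4.6, 5
p_value = 1.0 - poisson_cdf(n_obs - 1, b)  # P(n >= n_obs) if s = 0
print(f"P(n >= {n_obs}; b = {b}) = {p_value:.3f}")
```

For b = 4.6 and $n_{\rm obs}$ = 5 this gives roughly 0.49: background alone produces this many events or more about half the time.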

Upper limit for Poisson parameter

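Find the hypothetical value of s such that there is a given small probability, say, γ = 0.05, to find as few events as we did or less:

$\gamma = P(n \le n_{\rm obs}; s, b) = \sum_{n=0}^{n_{\rm obs}} \frac{(s+b)^n}{n!} e^{-(s+b)}$.

Solve numerically for $s = s_{\rm up}$; this gives an upper limit on s at a confidence level of 1 − γ. Equivalently, using the relation between the Poisson and χ² cumulative distributions,

$s_{\rm up} = \tfrac{1}{2} F^{-1}_{\chi^2}(1-\gamma;\, 2(n_{\rm obs}+1)) - b$,

where $F^{-1}_{\chi^2}$ is the χ² quantile, here for $2(n_{\rm obs}+1)$ degrees of freedom.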

Example: suppose b = 0 and we find $n_{\rm obs} = 0$. Then $\gamma = e^{-s_{\rm up}}$, so for 1 − γ = 0.95, $s_{\rm up} = -\ln\gamma \approx 3.00$.
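The numerical solution of $\gamma = P(n \le n_{\rm obs}; s_{\rm up} + b)$ can be sketched with plain bisection, since the left-hand side decreases monotonically as s grows. The search starts at s = −b (Poisson mean zero) so that negative limits, as in the boundary case discussed below, can also appear. The function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu), summed term by term."""
    term, total = math.exp(-mu), 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return total


def poisson_upper_limit(n_obs, b, cl=0.95, s_max=1000.0):
    """Classical upper limit: solve P(n <= n_obs; s_up + b) = 1 - cl for s_up."""
    gamma = 1.0 - cl
    lo, hi = -b, s_max  # at s = -b the mean is zero and the CDF is 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > gamma:
            lo = mid  # observing <= n_obs is still too probable: limit is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(poisson_upper_limit(0, 0.0, 0.95))  # b = 0, n_obs = 0
print(poisson_upper_limit(5, 4.6, 0.95))  # b = 4.6, n_obs = 5
```

The first call reproduces $s_{\rm up} = -\ln 0.05 \approx 3.00$; the second gives the 95% CL limit for the b = 4.6, $n_{\rm obs}$ = 5 example, about 5.9.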

Limits near a physical boundary

Suppose e.g. b = 2.5 and we observe n = 0. If we choose CL = 0.9, we find from the formula for $s_{\rm up}$ (for n = 0 it reduces to $s_{\rm up} = -\ln\gamma - b$):

$s_{\rm up} = -0.197$ (CL = 0.90).

Physicist: We already knew s ≥ 0 before we started; can’t use a negative upper limit to report the result of an expensive experiment!

Statistician: The interval is designed to cover the true value only 90% of the time; this was clearly not one of those times.

Not an uncommon dilemma when the limit of a parameter is close to a physical boundary.
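The statistician's point can be checked exactly rather than by simulation: for a given true s, the coverage is the sum of the Poisson probabilities of all n whose upper limit lies at or above s. The sketch below (helper names hypothetical, limits found by bisection) does this for b = 2.5 and CL = 0.9.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    term, total = math.exp(-mu), 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return total


def poisson_pmf(n, mu):
    """P(N = n) for N ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** n / math.factorial(n)


def poisson_upper_limit(n_obs, b, cl, s_max=1000.0, iters=100):
    """Classical limit: P(n <= n_obs; s_up + b) = 1 - cl, negative values allowed."""
    lo, hi = -b, s_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage(s_true, b, cl, n_max=60):
    """Probability that the classical upper limit is >= s_true when s = s_true."""
    return sum(poisson_pmf(n, s_true + b) for n in range(n_max + 1)
               if poisson_upper_limit(n, b, cl) >= s_true)


b, cl = 2.5, 0.90
print(f"s_up(n = 0) = {poisson_upper_limit(0, b, cl):.3f}")
print(f"coverage for s = 0: {coverage(0.0, b, cl):.4f}")
print(f"coverage for s = 1: {coverage(1.0, b, cl):.4f}")
```

The limit is negative only for n = 0, which for s = 0 happens with probability $e^{-2.5} \approx 0.082$, so the coverage at s = 0 is about 0.918, above the nominal 0.90. Such over-coverage is typical for discrete data.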

Expected limit for s = 0

Physicist: I should have used CL = 0.95; then $s_{\rm up} = 0.496$.

Even better: for CL = 0.917923 we get $s_{\rm up} \approx 10^{-4}$!

Reality check: with b = 2.5, the typical Poisson fluctuation in n is at least √2.5 = 1.6. How can the limit be so low?

Look at the mean limit for the no-signal hypothesis (s = 0), i.e., the sensitivity.

[Figure: distribution of 95% CL limits with b = 2.5, s = 0.] Mean upper limit = 4.44.
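The sensitivity can also be computed without simulation, as the Poisson-weighted average of the classical 95% CL limit over all possible n when s = 0 (so n ~ Poisson(b)). A sketch, with a bisection solver for $s_{\rm up}$ written out so the block is self-contained; the names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    term, total = math.exp(-mu), 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return total


def poisson_upper_limit(n_obs, b, cl, s_max=1000.0, iters=100):
    """Classical limit: P(n <= n_obs; s_up + b) = 1 - cl."""
    lo, hi = -b, s_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_upper_limit(b, cl, n_max=60):
    """Expected classical upper limit if s = 0, i.e. averaging over n ~ Poisson(b)."""
    total = 0.0
    for n in range(n_max + 1):
        weight = math.exp(-b) * b ** n / math.factorial(n)
        total += weight * poisson_upper_limit(n, b, cl)
    return total


print(f"s_up(n = 0, CL = 0.95) = {poisson_upper_limit(0, 2.5, 0.95):.3f}")
print(f"mean 95% CL limit for s = 0: {mean_upper_limit(2.5, 0.95):.2f}")
```

The n = 0 limit of about 0.5 is far below the average of about 4.4: a lucky downward fluctuation of the background yields a limit much stronger than the experiment's typical sensitivity.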

Bayesian prior for Poisson parameter

Include the knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0.

Often one tries to reflect ‘prior ignorance’ with e.g. $\pi(s) = 1$ for $s \ge 0$ and $\pi(s) = 0$ otherwise. This is not normalized, but that is OK as long as L(s) dies off for large s. It is not invariant under a change of parameter: if we had used instead a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events. It doesn’t really reflect a reasonable degree of belief, but is often used as a point of reference, or viewed as a recipe for producing an interval whose frequentist properties can be studied (coverage will depend on the true s).

Bayesian interval with flat prior for s

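The posterior pdf for s is

$p(s \mid n) = \dfrac{L(n \mid s)\,\pi(s)}{\int_0^\infty L(n \mid s')\,\pi(s')\,ds'}$, with $L(n \mid s) = \dfrac{(s+b)^n}{n!}e^{-(s+b)}$.

Solve numerically to find the limit $s_{\rm up}$ from $1 - \gamma = \int_0^{s_{\rm up}} p(s \mid n)\,ds$.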

For special case b = 0, Bayesian upper limit with flat prior numerically same as classical case (‘coincidence’). Otherwise Bayesian limit is everywhere greater than classical (‘conservative’). Never goes negative. Doesn’t depend on b if n = 0.
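With the flat prior $\pi(s) = 1$ for $s \ge 0$, substituting μ = s + b in the posterior tail integral reduces the condition to a ratio of Poisson cumulative probabilities, $\gamma = P(n \le n_{\rm obs}; s_{\rm up}+b) / P(n \le n_{\rm obs}; b)$. The sketch below solves this by bisection and compares with the classical limit; the function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    term, total = math.exp(-mu), 0.0
    for k in range(n + 1):
        total += term
        term *= mu / (k + 1)
    return total


def _bisect_decreasing(f, target, lo, hi, iters=200):
    """Find x with f(x) = target for f decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def classical_upper_limit(n_obs, b, cl=0.95, s_max=1000.0):
    """Solve P(n <= n_obs; s_up + b) = 1 - cl (may be negative)."""
    return _bisect_decreasing(lambda s: poisson_cdf(n_obs, s + b), 1.0 - cl, -b, s_max)


def bayes_upper_limit(n_obs, b, cl=0.95, s_max=1000.0):
    """Flat prior pi(s) = 1 for s >= 0: solve
    P(n <= n_obs; s_up + b) / P(n <= n_obs; b) = 1 - cl (never negative)."""
    norm = poisson_cdf(n_obs, b)
    return _bisect_decreasing(lambda s: poisson_cdf(n_obs, s + b) / norm,
                              1.0 - cl, 0.0, s_max)


for n in range(4):
    print(n, round(classical_upper_limit(n, 2.5), 3), round(bayes_upper_limit(n, 2.5), 3))
```

The ratio form makes the listed properties easy to see: for b = 0 the denominator is 1 and the two limits coincide; for b > 0 the effective γ is smaller, so the Bayesian limit is larger; and for n = 0 the ratio is $e^{-s_{\rm up}}$, independent of b.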