Understanding Hypothesis Testing & Probability in Statistical Data Analysis, Slides of Computational and Statistical Data Analysis

Queen Mary, University of London (QMUL)Computational and Statistical Data Analysis

Part of a series of lectures on statistical data analysis by G. Cowan. The series covers hypotheses, probability, Bayes' theorem, random variables, pdfs, functions of random variables, expectation values, error propagation, the Monte Carlo method, statistical tests, test statistics, multivariate methods, and the Fisher discriminant.

Typology: Slides

2011/2012

Uploaded on 03/08/2012

G. Cowan Lectures on Statistical Data Analysis 1

Statistical Data Analysis: Lecture 5

1 Probability, Bayes’ theorem, random variables, pdfs

2 Functions of r.v.s, expectation values, error propagation

3 Catalogue of pdfs

4 The Monte Carlo method

5 Statistical tests: general concepts

6 Test statistics, multivariate methods

7 Goodness-of-fit tests

8 Parameter estimation, maximum likelihood

9 More maximum likelihood

10 Method of least squares

11 Interval estimation, setting limits

12 Nuisance parameters, systematic uncertainties

13 Examples of Bayesian approach

14 tba

Hypotheses

A hypothesis $H$ specifies the probability for the data, i.e., the outcome of the observation, here written symbolically as $x$.

$x$ could be uni- or multivariate, continuous or discrete. E.g. write $x \sim f(x|H)$.

$x$ could represent e.g. the observation of a single particle, a single event, or an entire "experiment".

The possible values of $x$ form the sample space $S$ (or "data space").

Simple (or "point") hypothesis: $f(x|H)$ is completely specified.

Composite hypothesis: $H$ contains unspecified parameter(s).

The probability for $x$ given $H$ is also called the likelihood of the hypothesis, written $L(x|H)$.

Definition of a test (2)

But in general there are an infinite number of possible critical regions that give the same significance level $\alpha$.

So the choice of the critical region for a test of $H_0$ needs to take into account the alternative hypothesis $H_1$.

Roughly speaking, place the critical region where the data have a low probability of being found if $H_0$ is true, but a high probability if $H_1$ is true.
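As a rough illustration of why the alternative matters, the sketch below compares two critical regions with the same significance level. The toy model (a unit-width Gaussian centred at 0 under $H_0$ and at 2 under $H_1$) and all names are assumptions made for this example, not part of the lecture.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Assumed toy model: x ~ N(0, 1) under H0 and x ~ N(2, 1) under H1.
MU_H0, MU_H1 = 0.0, 2.0
X_CUT = 1.6449  # upper 5% point of the standard Gaussian


def prob_upper_region(mu):
    """Probability to find x in the region x > X_CUT."""
    return 1.0 - normal_cdf(X_CUT, mu)


def prob_lower_region(mu):
    """Probability to find x in the region x < -X_CUT."""
    return normal_cdf(-X_CUT, mu)


# Both regions have the same size (significance level) under H0 ...
alpha_upper = prob_upper_region(MU_H0)
alpha_lower = prob_lower_region(MU_H0)

# ... but very different power, i.e. probability to reject H0 if H1 is true.
power_upper = prob_upper_region(MU_H1)
power_lower = prob_lower_region(MU_H1)

print(f"alpha: upper = {alpha_upper:.4f}, lower = {alpha_lower:.4f}")
print(f"power: upper = {power_upper:.4f}, lower = {power_lower:.6f}")
```

Both regions reject a true $H_0$ about 5% of the time, but only the upper region, which sits where $H_1$ predicts the data, has appreciable power.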

Rejecting a hypothesis

Note that rejecting $H_0$ is not necessarily equivalent to the statement that we believe it is false and $H_1$ true.

In frequentist statistics we only associate probability with outcomes of repeatable observations (the data).

In Bayesian statistics, the probability of the hypothesis (degree of belief) would be found using Bayes' theorem:

$$P(H|x) = \frac{P(x|H)\,\pi(H)}{\int P(x|H)\,\pi(H)\,dH}$$

which depends on the prior probability $\pi(H)$.

What makes a frequentist test useful is that we can compute the probability to accept/reject a hypothesis assuming that it is true, or assuming some alternative is true.
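To make the dependence on the prior concrete: for a discrete set of hypotheses, Bayes' theorem reads $P(H_i|x) = P(x|H_i)\,\pi(H_i) / \sum_j P(x|H_j)\,\pi(H_j)$. The sketch below evaluates this for two hypotheses. The likelihood values and priors are invented for illustration only.

```python
def posterior(likelihoods, priors):
    """Return P(H_i | x) for a discrete set of hypotheses.

    likelihoods[i] is P(x | H_i) and priors[i] is pi(H_i).
    """
    if len(likelihoods) != len(priors):
        raise ValueError("need one prior per hypothesis")
    weighted = [lik * pri for lik, pri in zip(likelihoods, priors)]
    norm = sum(weighted)
    if norm <= 0.0:
        raise ValueError("the data have zero probability under every hypothesis")
    return [w / norm for w in weighted]


# Hypothetical numbers: the observed x is unlikely under H0, likelier under H1.
likelihoods = [0.01, 0.20]  # P(x|H0), P(x|H1)

post_flat = posterior(likelihoods, [0.5, 0.5])         # no prior preference
post_skeptical = posterior(likelihoods, [0.99, 0.01])  # strong prior belief in H0

print(f"P(H0|x), flat prior:      {post_flat[0]:.4f}")
print(f"P(H0|x), skeptical prior: {post_skeptical[0]:.4f}")
```

The same data leave $H_0$ with a posterior probability of about 5% under the flat prior but about 83% under the skeptical one, which is why rejecting $H_0$ in a frequentist test is not a statement about belief in $H_0$.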

Example setting for statistical tests: the Large Hadron Collider

Counter-rotating proton beams in a 27 km circumference ring

pp centre-of-mass energy 14 TeV

Detectors at 4 pp collision points:

ATLAS (general purpose)

CMS (general purpose)

LHCb (b physics)

ALICE (heavy ion physics)

The ATLAS detector

2100 physicists

37 countries

167 universities/labs

25 m diameter

46 m length

7000 tonnes

~$10^8$ electronic channels

Background events

This event from Standard Model $t\bar{t}$ production also has high-$p_T$ jets and muons, and some missing transverse energy.

→ can easily mimic a SUSY event.

Statistical tests (in a particle physics context)

Suppose the result of a measurement for an individual event is a collection of numbers

$x_1$ = number of muons, $x_2$ = mean $p_T$ of jets, $x_3$ = missing energy, ...

$x = (x_1, \ldots, x_n)$ follows some $n$-dimensional joint pdf, which depends on the type of event produced, i.e., on which reaction took place.

For each reaction we consider we will have a hypothesis for the pdf of $x$, e.g., $f(x|H_0)$, $f(x|H_1)$, etc.

E.g. call $H_0$ the background hypothesis (the event type we want to reject); $H_1$ is the signal hypothesis (the type we want).

Other ways to select events

Or maybe use some other sort of decision boundary, linear or nonlinear:

[Figure: two panels showing a linear and a nonlinear decision boundary, with 'accept' regions labelled.]

How can we do this in an 'optimal' way?

Test statistics

The decision boundary can be defined by an equation of the form

$$t(x_1, \ldots, x_n) = t_{\mathrm{cut}}$$

where $t(x_1, \ldots, x_n)$ is a scalar test statistic.

We can work out the pdfs $g(t|H_0)$, $g(t|H_1)$, ...

The decision boundary is now a single 'cut' on $t$, which divides the space into the critical (rejection) region and the acceptance region.

This defines a test. If the data fall in the critical region, we reject $H_0$.
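A cut on a scalar test statistic can be tried out on simulated events. The sketch below is a minimal Monte Carlo estimate of the probability to land in the critical region under each hypothesis. The two-variable Gaussian toy model, the linear statistic `t_stat`, and the cut value are all assumptions chosen for illustration.

```python
import random


def t_stat(x1, x2):
    """Hypothetical scalar test statistic: a simple sum of the two inputs."""
    return x1 + x2


def generate_events(mean, n_events, rng):
    """Toy events: two independent unit-width Gaussians with a common mean."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n_events)]


def fraction_in_critical_region(events, t_cut):
    """Fraction of events with t > t_cut, i.e. inside the critical region."""
    n_pass = sum(1 for x1, x2 in events if t_stat(x1, x2) > t_cut)
    return n_pass / len(events)


rng = random.Random(12345)  # fixed seed so the result is reproducible
background = generate_events(0.0, 20000, rng)  # generated under H0
signal = generate_events(1.5, 20000, rng)      # generated under H1

T_CUT = 2.0
alpha = fraction_in_critical_region(background, T_CUT)  # estimated size of the test
power = fraction_in_critical_region(signal, T_CUT)      # estimated power

print(f"estimated significance level: {alpha:.3f}")
print(f"estimated power:              {power:.3f}")
```

Moving `T_CUT` trades one quantity against the other: a tighter cut lowers the significance level but also lowers the power.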

Purity of event selection

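Suppose there is only one background type b; the overall fractions of signal and background events are $\pi_s$ and $\pi_b$ (prior probabilities).

Suppose we select signal events with $t > t_{\mathrm{cut}}$. What is the 'purity' of our selected sample?

Here purity means the probability to be signal given that the event was accepted. Using Bayes' theorem we find:

$$P(s \,|\, t > t_{\mathrm{cut}}) = \frac{P(t > t_{\mathrm{cut}} \,|\, s)\,\pi_s}{P(t > t_{\mathrm{cut}} \,|\, s)\,\pi_s + P(t > t_{\mathrm{cut}} \,|\, b)\,\pi_b} = \frac{\varepsilon_s \pi_s}{\varepsilon_s \pi_s + \varepsilon_b \pi_b}$$

where $\varepsilon_s$ and $\varepsilon_b$ are the signal and background efficiencies, i.e., the probabilities for each event type to pass the cut.

So the purity depends on the prior probabilities as well as on the signal and background efficiencies.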
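The effect of the prior probabilities on the purity is easy to check numerically. The sketch below evaluates purity = $\varepsilon_s \pi_s / (\varepsilon_s \pi_s + \varepsilon_b \pi_b)$, where $\varepsilon_s$ and $\varepsilon_b$ denote the signal and background efficiencies. The efficiency and prior values are invented for illustration.

```python
def purity(eff_s, eff_b, prior_s, prior_b):
    """Probability that an accepted event is signal, from Bayes' theorem."""
    accepted_signal = eff_s * prior_s
    accepted_total = accepted_signal + eff_b * prior_b
    if accepted_total == 0.0:
        raise ValueError("no events pass the selection")
    return accepted_signal / accepted_total


# Hypothetical selection: 80% signal efficiency, 1% background efficiency.
EFF_S, EFF_B = 0.8, 0.01

# Equal amounts of signal and background before the cut.
purity_balanced = purity(EFF_S, EFF_B, prior_s=0.5, prior_b=0.5)

# Signal is rare: one event in a thousand.
purity_rare = purity(EFF_S, EFF_B, prior_s=0.001, prior_b=0.999)

print(f"purity, balanced sample: {purity_balanced:.4f}")
print(f"purity, rare signal:     {purity_rare:.4f}")
```

With the same cut, the purity drops from about 99% to about 7% once the signal fraction falls to one in a thousand, so good efficiencies alone do not guarantee a clean sample.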

Constructing a test statistic

How can we choose a test's critical region in an 'optimal' way?

The Neyman-Pearson lemma states: to get the highest power for a given significance level in a test of $H_0$ (background) versus $H_1$ (signal), the critical region should have

$$\frac{f(x|H_1)}{f(x|H_0)} > c$$

inside the region, and $\leq c$ outside, where $c$ is a constant which determines the power.

Equivalently, the optimal scalar test statistic is

$$t(x) = \frac{f(x|H_1)}{f(x|H_0)}$$

N.B. any monotonic function of this leads to the same test.
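A small numerical sketch of the likelihood-ratio statistic follows. It assumes a one-dimensional toy model with unit-width Gaussian pdfs centred at 0 (background, $H_0$) and 2 (signal, $H_1$). These pdfs and the threshold `C` are illustrative choices only.

```python
import math


def gauss_pdf(x, mu, sigma=1.0):
    """Gaussian probability density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


MU_B, MU_S = 0.0, 2.0  # assumed means under H0 (background) and H1 (signal)


def likelihood_ratio(x):
    """t(x) = f(x|H1) / f(x|H0)."""
    return gauss_pdf(x, MU_S) / gauss_pdf(x, MU_B)


xs = [-1.0, 0.0, 0.5, 1.5, 3.0]
ratios = [likelihood_ratio(x) for x in xs]
log_ratios = [math.log(r) for r in ratios]

# Critical region: likelihood ratio above a constant c.
C = 1.0
in_critical_region = [r > C for r in ratios]

# Cutting on the log of the ratio (a monotonic function) selects the same events.
in_critical_region_log = [lr > math.log(C) for lr in log_ratios]

for x, r, inside in zip(xs, ratios, in_critical_region):
    print(f"x = {x:5.1f}   ratio = {r:8.4f}   reject H0: {inside}")
```

For this toy model the ratio equals $e^{2x-2}$, which increases with $x$, so the optimal test reduces to a simple cut on $x$ itself.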

Multivariate methods

Many new (and some old) methods:

Fisher discriminant

Neural networks

Kernel density methods

Support Vector Machines

Decision trees

Boosting

Bagging

New software for HEP, e.g.,

TMVA, Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/

StatPatternRecognition, I. Narsky, physics/

Linear test statistic

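Ansatz:

$$t(x) = \sum_{i=1}^{n} a_i x_i$$

Choose the parameters $a_1, \ldots, a_n$ so that the pdfs of $t$ for signal (s) and background (b) have maximum 'separation'. We want a large distance between the mean values $\tau_s$, $\tau_b$ and small widths $\Sigma_s$, $\Sigma_b$.

→ Fisher: maximize

$$J(a) = \frac{(\tau_s - \tau_b)^2}{\Sigma_s^2 + \Sigma_b^2}$$

[Figure: the pdfs $g(t)$ for signal and background, with the means $\tau_s$, $\tau_b$ and widths $\Sigma_s$, $\Sigma_b$ indicated.]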
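The sketch below works through a linear test statistic in two dimensions. It uses the standard result that the separation $J(a) = (\tau_s - \tau_b)^2 / (\Sigma_s^2 + \Sigma_b^2)$ is maximized by coefficients proportional to $W^{-1}(\mu_s - \mu_b)$, where $W = V_s + V_b$ is the sum of the signal and background covariance matrices. The means and covariances are made-up inputs, and the helper names are hypothetical.

```python
def dot(u, v):
    """Scalar product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]


def mat_vec(m, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [dot(m[0], v), dot(m[1], v)]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def separation(a, mu_s, mu_b, cov_s, cov_b):
    """J(a): squared distance of the means of t over the sum of its variances."""
    tau_s, tau_b = dot(a, mu_s), dot(a, mu_b)
    var_s = dot(a, mat_vec(cov_s, a))
    var_b = dot(a, mat_vec(cov_b, a))
    return (tau_s - tau_b) ** 2 / (var_s + var_b)


def fisher_coefficients(mu_s, mu_b, cov_s, cov_b):
    """Coefficients a proportional to W^-1 (mu_s - mu_b), with W = V_s + V_b."""
    w = [[cov_s[i][j] + cov_b[i][j] for j in range(2)] for i in range(2)]
    diff = [mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]]
    return mat_vec(inverse_2x2(w), diff)


# Assumed inputs: the means differ only in x1, but x1 and x2 are correlated.
mu_s, mu_b = [1.0, 0.0], [0.0, 0.0]
cov_s = [[1.0, 0.8], [0.8, 1.0]]
cov_b = [[1.0, 0.8], [0.8, 1.0]]

a_fisher = fisher_coefficients(mu_s, mu_b, cov_s, cov_b)
j_fisher = separation(a_fisher, mu_s, mu_b, cov_s, cov_b)
j_naive = separation([1.0, 0.0], mu_s, mu_b, cov_s, cov_b)  # cut on x1 alone

print("Fisher coefficients:", a_fisher)
print(f"J(Fisher) = {j_fisher:.4f}, J(x1 only) = {j_naive:.4f}")
```

Although the means differ only in $x_1$, the Fisher direction gives $x_2$ a negative coefficient because the correlation lets $x_2$ cancel part of the spread in $x_1$. Only the direction of $a$ matters, since rescaling $a$ leaves $J$ unchanged.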
