Data Analysis Techniques for High Energy Particle Physics, Lecture notes of Particle Physics


This report discusses statistical analysis techniques for high energy particle physics data. It introduces new techniques and places familiar methods into the general framework of statistical data analysis. The report is concerned with discovering properties of particle interactions from the data and deducing statistically meaningful statements about those properties. It covers univariate and multivariate data analysis techniques. The emphasis is on ideas and concepts rather than specific details. The report is a lecture note presented at the CERN School of Computing in 1974.


DATA ANALYSIS TECHNIQUES FOR HIGH ENERGY PARTICLE PHYSICS*

JEROME H. FRIEDMAN

STANFORD LINEAR ACCELERATOR CENTER

STANFORD UNIVERSITY

Stanford, California 94305

PREPARED FOR THE U. S. ATOMIC ENERGY COMMISSION UNDER CONTRACT NO. AT(04-3)-515

September 1974

Printed in the United States of America. Available from National Technical Information Service, U. S. Department of Commerce, 5285 Port Royal Road, Springfield, Virginia 22151. Price: Printed Copy $5.45; Microfiche $1.45.

Lectures presented at the CERN School of Computing, Godoysund, Norway, August 11-24, 1974.

SLAC-176

UC-34d (E/I)


ABSTRACT

Useful techniques for the statistical analysis and presentation of high energy particle physics data are described and discussed.

6.3.2.1 The Mutual Information Measure for Pairwise Dependence

6.3.2.2 An Algorithm for the Direct Measure of Stochastic Independence

6.4 A Multivariate Goodness-of-fit Test

FOOTNOTES AND REFERENCES

1. INTRODUCTION

The purpose of this report is to acquaint high energy physicists with a variety of techniques for presenting and making statistical inferences from counted data. The attempt will be to introduce new techniques that are not commonly used in high energy particle physics as well as to place those methods that are familiar into the general framework of statistical data analysis. This report will not deal with the equally important problem of data reduction, that is, reducing the raw digitizations from particle detectors to more useful quantities such as particle momenta and angles. Although these calculations are often quite complex, they seldom require statistical inference. (A notable exception is hypothesis discrimination in kinematic fitting.) The computer codes that perform these computations can usually be thought of as computing engines that transform the data from the raw experimental variables to those that are more convenient for further calculations. This report is concerned with these further calculations; that is, how to discover properties of the particle interactions from the data, and deduce as well as present statistically meaningful statements about those properties. The methods discussed are general in the sense that they can be applied to data from any science that have similar properties to those encountered in particle physics. In fact, many of the techniques that are discussed, although new to particle physics, are commonly used in other sciences, especially pattern recognition and artificial intelligence. The emphasis, however, will be on those methods that can be most profitably applied to the types of data usually encountered in high energy particle physics experiments.

and $p(\mathbf{x}) > 0$ for all $\mathbf{x} \in R$, where $R$ is the total region of measurement space. It is clear that $p(\mathbf{x})$ contains all of the information of the experiment. The purpose of experimentation is to infer properties of $p(\mathbf{x})$ from the observed distributions of the measured counts. Conversely, it is the purpose of theory to calculate $p(\mathbf{x})$ from mathematical models and infer from it the results of experiments.

Data analysis is divided into two types, parametric and non-parametric. In parametric (or model dependent) analysis, $p(\mathbf{x})$ is assumed to be a member of a parameterized family of distributions

$$p(\mathbf{x}) = p(\mathbf{a}; \mathbf{x}), \qquad (2a)$$

where $\mathbf{a}$ is the set of parameters (either discrete or continuous or both) that specify the particular distribution from the family of possible distributions. The problem of determining the probability density function then becomes the problem of determining the appropriate values for the parameters $\mathbf{a}$. The particular parameterized family can come from the researcher's intuition, invariance principles (such as angular momentum conservation) or specific dynamical models. For example, the Lorentz invariant amplitude squared for a reaction is the probability density in the Lorentz invariant phase space.

In non-parametric (model independent) analysis no a priori information is assumed about the probability density function. In this case one infers the probability density function directly from the counted data, with very little or no information about what form it might take. Histogramming is an example of non-parametric (one-dimensional) density estimation.

There are relative advantages and disadvantages to both types of analysis. When it is properly applied, parametric analysis is usually statistically much more powerful than non-parametric analysis. This is due to the tremendous increase of information in restricting the set of all possible probability densities to those of a particular parameterized family. The results of the analysis, however, crucially depend upon the correctness of this assumption. If the probability density function that gives rise to the data is not a member of the supposed parameterized family, then at best the statistical power is reduced compared to non-parametric techniques, and at worst (usually the case) the results are meaningless. Non-parametric techniques have the advantage of being applicable to a wide range of problems since they require few assumptions concerning the data. It should be kept in mind, however, that even though non-parametric techniques are usually formulated independently of specific probability densities, their statistical performance usually varies with the actual probability density of the data.

Statistical theory is far more developed for parametric analysis than for non-parametric analysis. This is especially true for the family of normal or Gaussian distributions

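$$p(\boldsymbol{\mu}, \Sigma; \mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right] \qquad (3)$$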

where the parameters are the location vector $\boldsymbol{\mu}$ and the covariance matrix $\Sigma$, and $d$ is the dimension of $\mathbf{x}$. A great many of the statistical techniques in common use were designed to be optimal for normal distributions and are referred to as normal theory techniques. These techniques can lose considerable statistical power when applied to data with non-normal density distributions.
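As a concrete reading of Eq. 3, the Python sketch below evaluates the multivariate normal density using only the standard library. It factors $\Sigma = LL^T$ by a Cholesky decomposition (our choice of method, not something the text prescribes), so that the quadratic form becomes $|L^{-1}(\mathbf{x}-\boldsymbol{\mu})|^2$ and $|\Sigma|^{1/2}$ is the product of the diagonal elements of $L$. The function names `cholesky` and `normal_pdf` are illustrative.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix must be positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def normal_pdf(x, mu, sigma):
    """Multivariate normal density p(mu, Sigma; x) of Eq. 3."""
    d = len(x)
    L = cholesky(sigma)
    # Forward substitution solves L z = x - mu, so z.z = (x - mu)^T Sigma^{-1} (x - mu).
    r = [xi - mi for xi, mi in zip(x, mu)]
    z = [0.0] * d
    for i in range(d):
        z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    quad = sum(zi * zi for zi in z)
    # |Sigma|^{1/2} equals the product of the diagonal elements of L.
    sqrt_det = math.prod(L[i][i] for i in range(d))
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (d / 2) * sqrt_det)
```

For example, `normal_pdf([0.0], [0.0], [[1.0]])` reproduces the familiar one-dimensional value $1/\sqrt{2\pi}$.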

3. A MINI-INTRODUCTION TO ESTIMATION THEORY

This section introduces the few necessary concepts in Statistics that are required to understand the sections that follow. As noted above, the set of measurements $\{\mathbf{x}_i\}_1^N$ comprising an experiment can be thought of as random variables drawn from a probability density function $p(\mathbf{x})$. The purpose of data analysis is to make inferences concerning $p(\mathbf{x})$. In parametric analysis one usually wishes to infer likely possible values for the parameters. In non-parametric analysis the density itself is to be inferred. This process of statistical inference is called estimation. Particle physicists quite often (incorrectly) use the terms "measurement" or "determination" for statistical estimation.

Consider a parametric example. Suppose that the set of measurements $\{\mathbf{x}_i\}_1^N$ are known to be distributed according to $p(a; \mathbf{x})$ for some (unknown) value of $a$. The desire is to estimate the parameter, $a$, from the values of the measured random variables $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$. Any function of a set of random variables

$$Y = \phi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N) \qquad (4)$$

is itself a random variable with a probability density function $p_N(a; Y)$ that can (at least in principle) be calculated from $p(a; \mathbf{x})$. If one is sufficiently clever in choosing the function, $\phi$, then $p_N(a; Y)$ might be large only for those values of $Y$ near $Y = a$. That is, for any set of possible values for $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ drawn

estimate, $\hat{a}$, for the value of the parameter, $a$,

$$\hat{a} = F^{-1}\left[\phi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)\right] = F^{-1}\left[\frac{1}{N}\sum_{i=1}^{N} f(\mathbf{x}_i)\right].$$

Since the function, $f(\mathbf{x})$, was somewhat arbitrary, it is clear that this procedure can be used to construct a variety of statistics for estimating the parameter, $a$. However, some will be better than others. For example $\sigma_N$, which regulates the precision with which the parameter, $a$, is estimated, depends on $f(\mathbf{x})$ (for $N < \infty$) through Eq. 8. The field of Statistics is concerned to a great degree with finding good statistics for estimation and determining their properties. Statistics used for estimation (usually called estimators) are rated in terms of four basic properties of their probability density distributions $p_N(a; Y)$; these are consistency, efficiency, bias, and robustness.
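A minimal sketch of this procedure (average a chosen function $f$ over the events, then solve for the parameter), assuming purely for illustration an exponential model $p(a; x) = a e^{-ax}$ for $x \ge 0$; the model and all helper names are hypothetical, not from the text. Choosing $f(x) = x$ gives $E[f] = 1/a$ and hence $\hat a = 1/\bar x$; choosing $f(x) = x^2$ gives $E[f] = 2/a^2$ and hence $\hat a = \sqrt{2/\overline{x^2}}$. Both are consistent, but repeating many simulated experiments shows that the second has a visibly larger spread, which is the sense in which some choices are better than others.

```python
import math
import random
import statistics


def estimate_from_mean(xs):
    """f(x) = x: E[f] = 1/a for the assumed exponential model, so a_hat = 1 / mean(x)."""
    return 1.0 / statistics.fmean(xs)


def estimate_from_second_moment(xs):
    """f(x) = x**2: E[f] = 2/a**2 for the assumed model, so a_hat = sqrt(2 / mean(x**2))."""
    return math.sqrt(2.0 / statistics.fmean(x * x for x in xs))


def monte_carlo(a_true=2.0, n_events=200, n_experiments=2000, seed=1):
    """Repeat a simulated experiment many times and collect both estimates of a."""
    rng = random.Random(seed)
    est_mean, est_second = [], []
    for _ in range(n_experiments):
        xs = [rng.expovariate(a_true) for _ in range(n_events)]  # rate a_true, mean 1/a_true
        est_mean.append(estimate_from_mean(xs))
        est_second.append(estimate_from_second_moment(xs))
    return est_mean, est_second


if __name__ == "__main__":
    est1, est2 = monte_carlo()
    for name, est in (("f(x) = x", est1), ("f(x) = x^2", est2)):
        print(f"{name:11s} mean = {statistics.fmean(est):.3f}  std = {statistics.stdev(est):.3f}")
```

For this model a delta-method estimate gives large-$N$ variances of roughly $a^2/N$ and $5a^2/(4N)$ for the two choices, so the second should come out at about 80% of the precision-per-event of the first.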

3.1 Consistency

An estimator, $Y = T(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)$, is consistent if the following condition holds:

$$\lim_{N\to\infty} p_N(a; Y) = \delta(Y - a). \qquad (11)$$

That is, as the number of samples gets arbitrarily large, $p_N(a; Y)$ becomes an arbitrarily narrow function of $Y$ about $a$, and the estimator provides an arbitrarily precise estimate of the parameter, $a$. Note that Eqs. 7 and 8 show that the estimator defined by Eq. 6 is consistent. Consistency is nearly always required for an estimator to be considered useful.

3.2 Efficiency

Consistency is concerned with the precision of the estimator for infinite sample size. (In the field of Statistics, a result that holds in the limit of infinite sample sizes is called an asymptotic result.) Efficiency is concerned with the precision of the estimator for finite sample size $N$. An estimator is called efficient if the variance (mean squared error) of its probability density function

$$V_N = \int_R (Y - a)^2\, p_N(a; Y)\, dY \qquad (12)$$

is as small as possible³⁾ for a given $N$. The square root of the variance, $\sigma_N = \sqrt{V_N}$, is characteristic of the width of $p_N(a; Y)$ about $a$, and thus is directly related to the precision of the estimator. Therefore, an efficient estimator for a given $N$ is one that (loosely speaking) has maximal precision.

The relative efficiency between two estimators is the inverse ratio of the variances of their probability densities for a given sample size, $N$. The efficiency of an estimator is its relative efficiency to an efficient estimator (i.e., efficient estimators are said to have 100% efficiency). This definition of efficiency can be related to the intuitive meaning of the word in the following manner. For large sample size, $N$, the variance of most estimators decreases as $V_N \sim 1/N$ for increasing $N$ (i.e., $\sigma_N \sim 1/\sqrt{N}$). Then the efficiency of an estimator is the inverse ratio of the number of samples (events) it requires to the number an efficient estimator requires for the same precision. Clearly high efficiency is a desirable property for an estimator. However, an estimator with the highest efficiency is quite often not the most desirable. Sometimes the computational complexity of the most efficient estimator makes it more expensive for a given precision than a less efficient estimator, even though the less efficient estimator requires more events.

3.3 Bias

Like efficiency, bias refers to a property of estimators for finite sample size, $N$. Specifically, the bias of an estimator is defined as

$$b_N = \int_R Y\, p_N(a; Y)\, dY - a, \qquad (13a)$$

i.e.,

$$b_N = E_N[Y] - a. \qquad (13b)$$

A biased estimator is one with an expected value that is different from the true value of the parameter being estimated. The bias is just the difference between $E_N[Y]$ and the true value of the parameter. Note that, although it might appear to be contradictory, a biased estimator can also be consistent and, conversely, an unbiased estimator can be inconsistent. If a biased estimator is consistent, then from Eq. 11

$$\lim_{N\to\infty} b_N = 0.$$

It may at first seem that bias would be a very undesirable property for an estimator to have. This is generally not the case. It is only important that the bias be relatively small compared to the square root of the variance (Eq. 12), i.e., the standard deviation of the probability density function. Most of the commonly used estimators in particle physics are in fact biased. There are various techniques for reducing bias in estimators but they usually do this at the expense

weights each point by the square of its distance from the center. The percentile estimate, on the other hand, will be completely unaffected by the mismeasured point. For exploratory data analysis especially, robustness is essential. Robust estimators generally maintain from 60% to 90% efficiency over wide ranges of data distributions, while non-robust estimators tend to have near 100% efficiency when the data distribution exactly follows the predicted probability density function, and low efficiency when it does not.
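A small Monte Carlo sketch of both points, comparing the sample mean (the least-squares estimate of the center) with the sample median (the 50th-percentile estimate). The Gaussian test data, the sample sizes, and the function names are arbitrary choices for illustration. For normal data the efficiency of the median relative to the mean is known to approach $2/\pi \approx 64\%$ for large $N$; a single mismeasured point, on the other hand, drags the mean but leaves the median untouched.

```python
import random
import statistics


def relative_efficiency_median(n_events=101, n_experiments=4000, seed=2):
    """Monte Carlo estimate of V[mean] / V[median] for standard normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_experiments):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_events)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.variance(means) / statistics.variance(medians)


def shift_from_outlier(xs, bad_value):
    """Change in mean and median when the last point of xs is mismeasured as bad_value."""
    contaminated = list(xs[:-1]) + [bad_value]
    return (statistics.fmean(contaminated) - statistics.fmean(xs),
            statistics.median(contaminated) - statistics.median(xs))


if __name__ == "__main__":
    print("relative efficiency of the median:", round(relative_efficiency_median(), 3))
    print("shifts (mean, median):", shift_from_outlier([1.0, 2.0, 3.0, 4.0, 5.0], 500.0))
```

Here `shift_from_outlier([1.0, 2.0, 3.0, 4.0, 5.0], 500.0)` returns `(99.0, 0.0)`: the mean moves from 3 to 102 while the median stays at 3.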

4. ANALYSIS AND REPRESENTATION OF ONE-DIMENSIONAL DATA

With the preliminaries of the preceding section out of the way, we are ready to discuss and evaluate various techniques for analyzing and presenting data. We will start with univariate or one-dimensional data analysis, that is, when only one measured quantity is considered at a time. We will discuss multivariate analysis in the following sections. Univariate analysis techniques are far more developed than corresponding multivariate techniques. This is especially true for non-parametric methods. There are many large textbooks devoted to statistical techniques for univariate analysis, so there will be no attempt at completeness in this brief report. The purpose will be to introduce some techniques not commonly known to high energy particle physicists that could be valuable tools for analyzing particle physics data, and to relate them to the more commonly used techniques.
4.1 Non-Parametric Univariate Density Estimation

Let $\{x_i\}_{i=1}^{N}$ be a sequence of independent identically distributed random variables with some unknown probability density function $p(x)$. We wish to construct estimators $\hat{p}_N(x) = T_N(x_1, x_2, \ldots, x_N)$ for $p(x)$ that depend only on the observations, $\{x_i\}_{i=1}^{N}$.
4.1.1 The Histogram Approach

Histogramming is the most commonly used density estimation method in particle physics. In this method the real line is divided into $M$ regions, $r_i$ (bins, channels), and $p(x)$ is taken to be constant over each region $r_i$:

$$\hat{p}(x) = \hat{p}_i \quad \text{if } x \in r_i, \qquad i = 1, \ldots, M.$$

Let $g_i(x)$ be an indicator function for each region, i.e.,

$$g_i(x) = \begin{cases} 1 & \text{if } x \in r_i \\ 0 & \text{otherwise.} \end{cases}$$

Then we have for our estimator of $p(x)$

$$\hat{p}_N(x) = \frac{1}{N}\sum_{i=1}^{M}\sum_{j=1}^{N} g_i(x_j)\, g_i(x). \qquad (14)$$

From the central limit theorem one has

$$E[\hat{p}_i] = p_i = \int_{r_i} p(x)\, dx$$

and

$$p_N(\hat{p}_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{(\hat{p}_i - p_i)^2}{2\sigma_i^2}\right],$$

where $\sigma_i = \sigma_0/\sqrt{N}$, when $n_i = N p_i$ is large. A more careful analysis shows that for any $n_i$, the $\hat{n}_i = N\hat{p}_i$ are distributed according to a multinomial distribution

$$p_N(\hat{n}_1, \hat{n}_2, \ldots, \hat{n}_M) = N! \prod_{i=1}^{M} \frac{p_i^{\hat{n}_i}}{\hat{n}_i!} \qquad (17)$$

if the total number of events, $N$, is considered fixed.⁴⁾ Note from Eq. 17 that

$$E[\hat{n}_i] = n_i, \qquad (18)$$

so that the estimator is unbiased. The variance of $\hat{n}_i$ is

$$V[\hat{n}_i] = N p_i (1 - p_i), \qquad (19a)$$

so that

$$V[\hat{p}_i] = p_i (1 - p_i)/N. \qquad (19b)$$

Equation 19 shows that $\hat{p}_i$ is a consistent estimator of $p_i$. For $p_i \ll 1$ (a large number of bins, for example) Eq. 19a can be approximated by

$$V[\hat{n}_i] \simeq N p_i = n_i. \qquad (20)$$

Since $n_i$ is usually not known, it seems reasonable to make the further approximation

$$n_i \simeq \hat{n}_i. \qquad (21)$$
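The sketch below computes the histogram estimate $\hat p_i = \hat n_i/N$ (with $\hat p_i$ read as the bin probability, as in Eq. 17) together with the error from Eq. 19b, and checks Eq. 19a by repeating a simulated experiment with uniform data. The bin edges, the uniform test distribution, and the function names are assumptions made for the example.

```python
import bisect
import math
import random
import statistics


def histogram_estimate(xs, edges):
    """Counts n_hat_i, bin probabilities p_hat_i = n_hat_i / N, and their errors.

    Bin i is [edges[i], edges[i+1]); points outside all bins are not counted
    but still contribute to N.
    """
    counts = [0] * (len(edges) - 1)
    for x in xs:
        i = bisect.bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    n_total = len(xs)
    p_hat = [c / n_total for c in counts]
    # Eq. 19b with p_i replaced by its estimate (the step made in Eq. 21).
    errors = [math.sqrt(p * (1.0 - p) / n_total) for p in p_hat]
    return counts, p_hat, errors


def count_variance_check(n_events=400, n_bins=10, n_experiments=1000, seed=3):
    """Sample variance of n_hat_0 over repeated uniform experiments vs. N p (1 - p) (Eq. 19a)."""
    rng = random.Random(seed)
    edges = [i / n_bins for i in range(n_bins + 1)]
    first_bin = []
    for _ in range(n_experiments):
        counts, _, _ = histogram_estimate([rng.random() for _ in range(n_events)], edges)
        first_bin.append(counts[0])
    p = 1.0 / n_bins
    return statistics.variance(first_bin), n_events * p * (1.0 - p)
```

Replacing $p_i$ by $\hat p_i$ in the error is the same step as Eq. 21; for small $\hat p_i$ the error reduces to the familiar $\sqrt{\hat n_i}/N$.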

orthogonal functions defined on the real line,

$$\int_R \psi_i(x)\, \psi_j(x)\, dx = \delta_{ij},$$

and we wish to estimate $p(x)$ from the data points $\{x_j\}_{j=1}^{N}$ with an estimator of the form

$$\hat{p}_N(x) = \sum_{i=1}^{M} \hat{c}_i(N)\, \psi_i(x). \qquad (24)$$

If the actual probability density function, $p(x)$, were known, then it is easy to show that the variance of the density estimate, $V_N$ (Eq. 23), is minimal for

$$c_i = \int_R \psi_i(x)\, p(x)\, dx = E[\psi_i]. \qquad (25)$$

For non-parametric estimation $p(x)$ is not known, so we estimate the integral from the data sample:

$$\hat{c}_i = \frac{1}{N}\sum_{j=1}^{N} \psi_i(x_j). \qquad (26)$$

From the central limit theorem one has (for large $N$)

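$$p_N[\hat{c}_i(N)] = \frac{1}{\sqrt{2\pi}\,\sigma_i^{(N)}} \exp\left[-\frac{(\hat{c}_i - c_i)^2}{2\,(\sigma_i^{(N)})^2}\right], \qquad (27)$$

where

$$\left(\sigma_i^{(N)}\right)^2 = V(\psi_i)/N = E\left[(\psi_i - E[\psi_i])^2\right]/N.$$

Thus, $E[\hat{c}_i(N)] = c_i$, so that the estimate is unbiased, and $\lim_{N\to\infty} \sigma_i^{(N)} = 0$, so that it is consistent. Combining Eqs. 24 and 26 we have for our density estimate

$$\hat{p}_N(x) = \frac{1}{N}\sum_{i=1}^{M}\sum_{j=1}^{N} \psi_i(x_j)\, \psi_i(x). \qquad (28)$$

The average variance of the estimate, $V_N$ (Eq. 23), is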

$$V_N = E\left\{\int_R (p - \hat{p}_N)^2\, dx\right\} = E\left\{\int_R (p - \bar{p})^2\, dx\right\} + E\left\{\int_R (\bar{p} - \hat{p}_N)^2\, dx\right\}, \qquad (29)$$

where

$$\bar{p}(x) = \sum_{i=1}^{M} c_i\, \psi_i(x). \qquad (30)$$

The first term on the right hand side of Eq. 29 is a constant independent of the data, so that

$$V_N = \int_R p^2(x)\, dx - \int_R \bar{p}^2(x)\, dx + E\left[\int_R \left[\bar{p}(x) - \hat{p}_N(x)\right]^2 dx\right] = \int_R p^2(x)\, dx - \int_R \bar{p}^2(x)\, dx + \sum_{i=1}^{M} \left(\sigma_i^{(N)}\right)^2. \qquad (31)$$

Equation 31 shows that the variance of the estimate is composed of a constant systematic part and a statistical part that approaches zero as $N$ becomes infinite. Thus, like the histogramming approach, the orthogonal function density estimator is inconsistent (unless by some chance $\bar{p}(x) = p(x)$ for all $x$, i.e., either $M = \infty$, or for finite $M$, $p(x)$ can be expressed exactly by Eq. 30). It is no accident that the histogramming and orthogonal function estimators share this property of inconsistency. Inspecting Eq. 14, one sees that it is just a special case of Eq. 28 where the orthogonal functions are the indicator functions $g_i(x)$. Note that

$$\int_R g_i(x)\, g_j(x)\, dx = \delta_{ij}$$

and

$$\hat{c}_i(N) = \frac{1}{N}\sum_{j=1}^{N} g_i(x_j) = \frac{\hat{n}_i}{N}.$$
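To make the orthogonal function estimator concrete, here is a minimal sketch of Eqs. 24, 26 and 28. It assumes the data lie in $[0,1]$ and uses the cosine basis $\psi_0 = 1$, $\psi_k(x) = \sqrt{2}\cos(k\pi x)$, which is orthonormal on that interval; the basis, the interval, the test density $p(x) = 2x$, and the number of terms are illustrative choices, not the author's. Because the sum is truncated, the result carries the systematic part of Eq. 31 and is not guaranteed to stay non-negative.

```python
import math
import random


def psi(k, x):
    """Cosine basis, orthonormal on [0, 1]: psi_0 = 1, psi_k = sqrt(2) cos(k pi x)."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(k * math.pi * x)


def fit_coefficients(xs, n_terms):
    """Eq. 26: c_hat_k = (1/N) * sum_j psi_k(x_j), for k = 0 .. n_terms - 1."""
    n = len(xs)
    return [sum(psi(k, x) for x in xs) / n for k in range(n_terms)]


def series_density(x, coeffs):
    """Eqs. 24/28: p_hat(x) = sum_k c_hat_k psi_k(x); not guaranteed to be >= 0."""
    return sum(c * psi(k, x) for k, c in enumerate(coeffs))


if __name__ == "__main__":
    rng = random.Random(5)
    xs = [math.sqrt(rng.random()) for _ in range(20000)]  # samples from p(x) = 2x on [0, 1]
    coeffs = fit_coefficients(xs, 6)
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"x = {x:.2f}  p_hat = {series_density(x, coeffs):.3f}  true = {2 * x:.3f}")
```

Since $\psi_0 = 1$, $\hat c_0$ is exactly 1 and the estimate always integrates to one over $[0,1]$; the higher terms only redistribute probability. The index here runs from 0, so `n_terms` plays the role of $M$.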

The general orthogonal function approach suffers from generalized analogs of most of the problems discussed for histogramming. The problem of specific bin choice and number of bins becomes the problem of the number and specific choice of the orthogonal functions, $\{\psi_i(x)\}_{i=1}^{M}$. Also, it may happen that $\hat{p}_N(x)$ is negative for some value of $x$, rendering it inadmissible as a probability density function (although it still may be quite useful).

4.1.3 The Rosenblatt Estimator

We will now begin to consider some consistent estimators of univariate probability density. The first is the Rosenblatt or "naive pdf" (probability density

This estimator can be made consistent so long as $h$ tends to zero while the product $hN$ approaches infinity. Parameterizing the window size as

$$h = C N^{a} \qquad (39)$$

and choosing $a$ so as to minimize the dominant terms in Eq. 38, one obtains $a = -1/5$ as the value that causes the variance to decrease most rapidly with increasing $N$. A careful analysis shows that the constant should be

$$C = \left[\frac{9\,p(x)}{2\,|p''(x)|^2}\right]^{1/5}. \qquad (40)$$

The bias of this estimator (Eq. 37) is easy to understand. For finite window size the estimator $\hat{p}_N(x)$ (Eq. 32) is an unbiased estimator of the average of the probability density within the window,

$$\bar{p}(x) = \frac{1}{2h}\int_{x-h}^{x+h} p(x')\, dx'.$$
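The statement that the naive estimator targets the window average can be checked numerically. The sketch below takes $p(x)$ to be the standard normal density (an assumption for illustration), computes the window average exactly from the normal cumulative distribution via `math.erf`, and compares the offset from $p(x)$ with the standard leading-order expansion $\bar p(x) - p(x) \approx h^2 p''(x)/6$ for a flat window. That expansion is quoted here as a known result; Eq. 37 itself is not reproduced in this excerpt.

```python
import math


def normal_pdf_1d(x):
    """Standard normal density, used here as the assumed p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def window_average(x, h):
    """(1/2h) times the integral of p over [x - h, x + h], computed exactly from the CDF."""
    return (normal_cdf(x + h) - normal_cdf(x - h)) / (2.0 * h)


def leading_order_bias(x, h):
    """h^2 p''(x) / 6 for a flat window; for the standard normal p''(x) = (x^2 - 1) p(x)."""
    return h * h * (x * x - 1.0) * normal_pdf_1d(x) / 6.0


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        for h in (0.5, 0.1):
            exact = window_average(x, h) - normal_pdf_1d(x)
            print(f"x = {x}, h = {h}: exact offset = {exact:+.5f}, h^2 p''/6 = {leading_order_bias(x, h):+.5f}")
```

At the peak, where $p'' < 0$, the window average falls below $p(x)$; in the tails, where $p'' > 0$, it lies above, and both offsets shrink as $h^2$.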

If $p(x')$ is nonlinear within the window region, then this average will be different from the value of the probability density at the center of the window, $p(x)$. As the window size approaches zero, or as the probability density approaches linearity, this effect will disappear, as reflected by Eq. 37. The expression for the variance (Eq. 38) shows that, like histogramming, the variance of this estimate is proportional to the value of the probability density (standard deviation proportional to the square root of the probability density). Unlike histogramming, however, this probability density estimate is not piecewise constant over fixed intervals (bins) and does not suffer from the sharp discontinuities that histogramming produces at the boundaries of these intervals ("statistical fluctuations"). This estimator does, of course, suffer from statistical uncertainty as reflected by its variance (Eq. 38). However, the Rosenblatt estimator produces a relatively smooth probability density estimate which (at least in the limit of large sample size) can be shown to be more accurate than histogramming (see below for finite sample comparisons).

4.1.4 Parzen Estimators

The Rosenblatt estimator is a special case of a general class of density estimators known as Parzen estimators or Parzen windows.⁶⁾ Let $K(y)$ be a bounded, absolutely integrable function such that

$$\int_R K(y)\, dy = 1 \quad \text{and} \quad \lim_{|y|\to\infty} |y\, K(y)| = 0. \qquad (42)$$

Then the Parzen window estimators are defined as

$$\hat{p}_N(x) = \frac{1}{N\, h(N)} \sum_{i=1}^{N} K\left[\frac{x - x_i}{h(N)}\right]. \qquad (43)$$

The function $K(y)$ is called the kernel or window function. The notation $h(N)$ is used to indicate explicitly that the scale parameter for the kernel function depends upon the sample size, $N$. For the Rosenblatt estimator one has

$$K\left[\frac{x - x_i}{h(N)}\right] = \frac{\phi(x; x_i)}{2}, \qquad (44)$$

where $\phi(x; x_i)$ is defined in Eq. 34. Other possible kernels are: a) the double exponential function $e^{-|y|}$; b) the standard normal (Gaussian) function; c) the Cauchy function $1/(1+y^2)$; and d) $\sin^2 y/y^2$. Using procedures analogous to those for the Rosenblatt estimator, one can show that these estimators are biased, with the bias tending to zero quadratically as the scale parameter $h(N)$ approaches zero. Also, the variance of the estimate tends to zero as $1/[N h(N)]$ for increasing sample size, $N$. Thus, these estimators are consistent provided that $h(N) \to 0$ while $N h(N) \to \infty$.
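A minimal Python sketch of the Parzen estimator of Eq. 43 with two interchangeable kernels: the flat window $K(y) = 1/2$ for $|y| \le 1$, which reproduces the Rosenblatt estimator (assuming the usual definition that counts the points within $\pm h$ of $x$ and divides by $2Nh$), and the standard normal kernel. The helper `rosenblatt_optimal_h` evaluates Eqs. 39 and 40 with $a = -1/5$; it needs $p(x)$ and $p''(x)$, which in practice are unknown, so here they are taken from a standard normal test density. All function names are hypothetical.

```python
import math
import random


def uniform_kernel(y):
    """Flat window K(y) = 1/2 on |y| <= 1: with it, Eq. 43 is the Rosenblatt estimator."""
    return 0.5 if abs(y) <= 1.0 else 0.0


def gaussian_kernel(y):
    """Standard normal kernel."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def parzen_estimate(x, xs, h, kernel):
    """Eq. 43: p_hat_N(x) = 1/(N h) * sum_i K((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in xs) / (len(xs) * h)


def rosenblatt_optimal_h(p, p2, n):
    """Eqs. 39-40 with a = -1/5: h = [9 p / (2 p''^2)]^(1/5) * N^(-1/5).

    p and p2 are p(x) and p''(x) at the point of interest; p2 must be nonzero.
    """
    return (9.0 * p / (2.0 * p2 * p2)) ** 0.2 * n ** -0.2


if __name__ == "__main__":
    rng = random.Random(7)
    xs = [rng.gauss(0.0, 1.0) for _ in range(10000)]
    p0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal at x = 0, where p'' = -p0
    h = rosenblatt_optimal_h(p0, -p0, len(xs))
    print("h from Eq. 40:", round(h, 4))
    print("Rosenblatt:", round(parzen_estimate(0.0, xs, h, uniform_kernel), 4))
    print("Gaussian kernel:", round(parzen_estimate(0.0, xs, h, gaussian_kernel), 4))
    print("true p(0):", round(p0, 4))
```

Because Eq. 40 depends on the very density being estimated, it is perhaps most useful as a guide to how $h$ should scale with sample size ($h \propto N^{-1/5}$) rather than as a recipe for its exact value.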

4.1.5 k-th Nearest Neighbor Density Estimation

A disadvantage of the density estimators discussed so far is that there are few general guidelines for choosing the scale parameter (bin width for histogramming, window size $h(N)$ for Rosenblatt and Parzen estimators). For small variance (high statistical precision) the scale parameter should be as large as possible. For maximum sensitivity to $p(x)$, rapid convergence, and minimal bias (high systematic precision), the scale parameter should be as small as possible. The choice of scale parameter is therefore usually a compromise between these two competing effects. Ideally, the scale parameter should depend upon the data. That is, on the basis of a density estimate the scale parameter can be changed and the density re-estimated. Although quite reasonable, this procedure invalidates the analyses that give rise to the statistical results stated above concerning the bias, consistency and variance of these estimators, since the analyses all assume that $h(N)$ is a deterministic function independent of the data. Thus, the statistical properties of such a procedure are largely unknown. Even further, the scale parameter should probably change for different values of the variable, $x$. In denser regions, one can take advantage of the large number of counts to increase systematic precision by using smaller

