Understanding Gaussian Distributions & Error Propagation in Physics, Schemes and Mind Maps of Physics

An in-depth exploration of covariance and correlation in physics, focusing on the Gaussian distribution and its applications in understanding errors and uncertainties. Topics covered include the definition of covariance, the importance of Gaussian distributions, and the interpretation of multidimensional Gaussians. The document also discusses the relationship between free parameters and the covariance matrix, as well as the differences between maximum likelihood estimation and least squares methods.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/12/2022

christina
christina 🇺🇸

4.6

(23)

393 documents

1 / 55

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Physics 509 9
Covariance & Correlation
The covariance between two variables is defined by:
cov x , y= x
x
 y
y
=xy  x y
This is the most useful thing they never tell you in most lab
courses! Note that cov(x,x)=V(x).
The correlation coefficient is a unitless version of the same
thing:
=cov x , y
x
y
If x and y are independent variables (P(x,y) = P(x)P(y)), then
cov x , y=dx dy P x , y xy dx dy P x , y xdx dy P x , y y
=dx P xxdy P yy dx P xxdy P yy= 0
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37

Partial preview of the text

Download Understanding Gaussian Distributions & Error Propagation in Physics and more Schemes and Mind Maps Physics in PDF only on Docsity!

Physics 509

Covariance & Correlation

The covariance between two variables is defined by:

cov

x , y

=



x



x^

y



y^



=

 xy



^

x

 

y



This is the most useful thing they never tell you in most labcourses! Note that cov(x,x)=V(x).The correlation coefficient is a unitless version of the samething:



=

cov

x , y x

y

If x and y are independent variables

(P(x,y) = P(x)P(y))

, then

cov

x , y

=

dx dy P

x , y

xy

dx dy P

x , y

x

dx dy P

x , y

y

=

dx P

x

x

dy P

y

y

dx P

x

x

dy P

y

y

=

0

Physics 509

More on Covariance

Correlationcoefficients for somesimulated data sets.Note the bottomright---whileindependentvariables must havezero correlation, thereverse is not true!Correlation isimportant because itis part of the errorpropagationequation, as we'llsee.

Physics 509

Gaussian Distributions

By far the most useful distribution is the Gaussian (normal)

distribution:

P

x





2

e

1 2

x^



(^2)

68.27% of area within

±

1

σ

95.45% of area within

±

2

σ

99.73% of area within

±

3

σ

Mean =

μ

, Variance=

σ

2

Note that width scales with

σ

.

Area out on tails is important---uselookup tables or cumulativedistribution function.In plot to left, red area (>

σ

) is

2.3%. 90% of area within

±

σ

95% of area within

±

σ

99% of area within

±

σ

Physics 509

Why are Gaussian distributions so critical?



They occur very commonly---the reason is that the average ofseveral independent random variables often approaches aGaussian distribution in the limit of large N. 

Nice mathematical properties---infinitely differentiable,symmetric. Sum or difference of two Gaussian variables isalways itself Gaussian in its distribution.



Many complicated formulas simplify to linear algebra, oreven simpler, if all variables have Gaussian distributions.



Gaussian distribution is often used as a shorthand fordiscussing probabilities. A ì5 sigma resultî means a resultwith a chance probability that is the same as the tail area ofa unit Gaussian:

 ^5

dt P

 t

This way of speaking is used even for non-Gaussiandistributions!

Physics 509

Review of covariances of joint PDFs

Consider some multidimensional PDF

p(x

1

x

)n

. We define the

covariance between any two variables by:

cov

x

i^

, x

j



d

x p 

x

^

x

i^

x

i^

x

j

x

j

The set of all possible covariances defines a covariance matrix,

often denoted by V

. The diagonal elements of Vij

ij^

are the

variances of the individual variables, while the off-diagonalelements are related to the correlation coefficients:

V

ij

[^

2 1

12

1

2

1

n^

1

n

21

1

n^

2 2

2n

2

n

n

1

n^

n

2

n^

2 n

]

Physics 509

Properties of covariance matrices

Covariance matrices always: 

are symmetric and square 

are invertible (very important requirement!) The most common use of a covariance matrix is to invert it then

use it to calculate a

χ

2

2



i



j

y

i^

f^

x

i^

V

ij

1

y

j

f^

x

j

If the covariances are zero, then

V

=ij

σ ij

(^2) i , and this reduces to:

2



i

^

y

i^

f^

^

x

i^

2

2 i

W

a

r

n

in

g

:^

d

o

N

O

T

u

s

e

t

h

e

s

im

p

li

f

ie

d

f

o

r

m

u

la

i

f

d

a

t

a

p

o

in

t

s

a

r

e

c

o

r

re

la

t

e

d

Physics 509

Taylor Series expansion

Consider a Taylor series expansion of the logarithm of the

PDF around its maximum at (

x

0 ,y

0

log

P



x , y

=

P

 0

A

^

x

x

 0

B

^

y

y

 0

C



x

x

 0 2

D



y

y

 0 2

2

E

 x

x

 0

y

y

 0 ...

Since we are expanding around the peak, then the first

derivatives must equal zero, so A=B=0. The remainingterms can be written in matrix form:

log

P

 x , y



P

0



x ,



y



C 

E

E

D

 



x 

y



In order for (

x

0 ,y

0 ) to be a maximum of the PDF (and not a

minimum or saddle point), the above matrix must bepositive definite, and therefore invertible.

Physics 509

Taylor Series expansion

Let me now suggestively denote the inverse of the above

matrix by V

. It's a positive definite matrix with threeij

parameters. In fact, I might as well call these parameters σ

,x σ

, andy

Exponentiating, we see that around its peak the PDF can be

approximated by a multidimensional Gaussian. The fullformula, including normalization, is

log

P

 x , y



P

0



x ,



y



C 

E

E

D

 



x 

y



P



x , y



=

1

2



x^



y



1



2

exp

{

1

2



1



2 

[ ^

x

x

0



x^

(^2)  

^

y

y

0



y

(^2) 

2



^

x

x

0



x^

 

y

y

0



y

^ ] }

This is a good approximation as long as higher order terms inTaylor series are small.

Physics 509

P(x|y)^ P



x , y



=

1

2



x^



y



1



2

exp

{

1

2



1



2 

[ ^

x

x

0



x^

(^2)  

^

y

y

0



y

(^2) 

2



^

x

x

0



x^

 

y

y

0



y

^ ] }

Note: if you view y as a fixed parameter, then the PDF P(x|y) is aGaussian with width of:

x^



2

and a mean value of

x

0

^

x 

y



y

y

0

(It makes sense that the width of P(x|y) is always narrower thanthe width of the marginalized PDF P(x) (integrated over y). Ifyou know the actual value of y, you have additional informationand so a tighter constraint on x.

Physics 509

=2x

=1y

ρ

Red ellipse:contour withargument ofexponentialset to equal-1/2 Blue ellipse:contourcontaining68% of 2Dprobabilitycontent.

Physics 509

Probability content inside a contour ellipse^ For a 1D Gaussian exp(-x

2

σ

2 ), the ±

σ

limits occur when the

argument of the exponent equals -1/2. For a Gaussian there's a68% chance of the measurement falling within around the mean. But for a 2D Gaussian this is not the case. Easiest to see this for

the simple case of

σ

=x

σ

=1:y



dx dy

exp

[^

x

2

y

2

=]

r ^0 0

dr

exp

[^

r

2

= ]

Evaluating this integral and solving gives r

0 2 =2.3. So 68% of

probability content is contained within a radius of

σ

We call this the 2D contour. Note that it's bigger than the 1Dversion---if you pick points inside the 68% contour and plottheir x coordinates, they'll span a wider range than thosepicked from the 68% contour of the 1D marginalized PDF!

Physics 509

=2x

=1y

ρ

Red ellipse:contour withargument ofexponentialset to equal-1/2 Blue ellipse:contourcontaining68% ofprobabilitycontent.

Physics 509

Two marginalization procedures

Normal marginalization procedure: integrate over nuisance variables:

P

x



dy P

x , y

Alternate marginalization procedure: maximize the likelihood as a function of

the nuisance variables, and return the result:

P

x

max

y

P

x , y

(It is not necessarily the case that the resulting PDF is normalized.) I can prove for Gaussian distributions that these two marginalization

procedures are equivalent, but cannot prove it for the general case (In factthey give different results). Bayesians always follow the first prescription. Frequentists most often use

the second. Sometimes it will be computationally easier to apply one, sometimes the

other, even for PDFs that are approximately Gaussian.

Physics 509

Maximum likelihood estimators By far the most useful estimator is the maximum likelihoodmethod. Given your data set x

1

... x

N

and a set of unknown

parameters

α

, calculate the likelihood function

L
^

x

1

x

N



^ i

1 N

P

x

 i

It's more common (and easier) to calculate -ln

L

instead:

ln

L

x

1

x

N

^ i =

1 N

ln

P

x

 i

The maximum likelihood estimator is that value of

α

which

maximizes L as a function of

α

. It can be found by minimizing

-ln

L

over the unknown parameters.