















































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An in-depth exploration of covariance and correlation in physics, focusing on the Gaussian distribution and its applications in understanding errors and uncertainties. Topics covered include the definition of covariance, the importance of Gaussian distributions, and the interpretation of multidimensional Gaussians. The document also discusses the relationship between free parameters and the covariance matrix, as well as the differences between maximum likelihood estimation and least squares methods.
Typology: Schemes and Mind Maps
1 / 55
This page cannot be seen from the preview
Don't miss anything!
















































Physics 509
The covariance between two variables is defined by:
cov
x , y
=
x
x^
y
y^
=
xy
^
x
y
This is the most useful thing they never tell you in most labcourses! Note that cov(x,x)=V(x).The correlation coefficient is a unitless version of the samething:
=
cov
x , y x
y
If x and y are independent variables
(P(x,y) = P(x)P(y))
, then
cov
x , y
=
dx dy P
x , y
xy
dx dy P
x , y
x
dx dy P
x , y
y
=
dx P
x
x
dy P
y
y
dx P
x
x
dy P
y
y
=
0
Physics 509
Physics 509
By far the most useful distribution is the Gaussian (normal)
distribution:
x
2
e
1 2
x^
(^2)
68.27% of area within
±
1
σ
95.45% of area within
±
2
σ
99.73% of area within
±
3
σ
Mean =
μ
, Variance=
σ
2
Note that width scales with
σ
.
Area out on tails is important---uselookup tables or cumulativedistribution function.In plot to left, red area (>
σ
) is
2.3%. 90% of area within
±
σ
95% of area within
±
σ
99% of area within
±
σ
Physics 509
They occur very commonly---the reason is that the average ofseveral independent random variables often approaches aGaussian distribution in the limit of large N.
^5
dt P
t
Physics 509
Consider some multidimensional PDF
p(x
1
x
)n
. We define the
covariance between any two variables by:
cov
x
i^
, x
j
d
x p
x
x
i^
x
i^
x
j
x
j
The set of all possible covariances defines a covariance matrix,
often denoted by V
. The diagonal elements of Vij
ij^
are the
variances of the individual variables, while the off-diagonalelements are related to the correlation coefficients:
ij
2 1
12
1
2
1
n^
1
n
21
1
n^
2 2
2n
2
n
n
1
n^
n
2
n^
2 n
Physics 509
Covariance matrices always:
are symmetric and square
are invertible (very important requirement!) The most common use of a covariance matrix is to invert it then
use it to calculate a
χ
2
2
i
j
y
i^
f^
x
i^
ij
1
y
j
f^
x
j
If the covariances are zero, then
=ij
(^2) i , and this reduces to:
2
i
y
i^
f^
x
i^
2
2 i
Physics 509
Consider a Taylor series expansion of the logarithm of the
PDF around its maximum at (
x
0 ,y
0
log
P
x , y
=
P
0
A
^
x
x
0
B
^
y
y
0
C
x
x
0 2
D
y
y
0 2
2
E
x
x
0
y
y
0 ...
Since we are expanding around the peak, then the first
derivatives must equal zero, so A=B=0. The remainingterms can be written in matrix form:
log
P
x , y
P
0
x ,
y
C
E
E
D
x
y
In order for (
x
0 ,y
0 ) to be a maximum of the PDF (and not a
minimum or saddle point), the above matrix must bepositive definite, and therefore invertible.
Physics 509
Let me now suggestively denote the inverse of the above
matrix by V
. It's a positive definite matrix with threeij
parameters. In fact, I might as well call these parameters σ
,x σ
, andy
Exponentiating, we see that around its peak the PDF can be
approximated by a multidimensional Gaussian. The fullformula, including normalization, is
log
P
x , y
P
0
x ,
y
C
E
E
D
x
y
P
x , y
=
1
2
x^
y
1
2
exp
{
1
2
1
2
[ ^
x
x
0
x^
(^2)
^
y
y
0
y
(^2)
2
^
x
x
0
x^
y
y
0
y
^ ] }
This is a good approximation as long as higher order terms inTaylor series are small.
Physics 509
x , y
=
1
2
x^
y
1
2
exp
{
1
2
1
2
[ ^
x
x
0
x^
(^2)
^
y
y
0
y
(^2)
2
^
x
x
0
x^
y
y
0
y
^ ] }
Note: if you view y as a fixed parameter, then the PDF P(x|y) is aGaussian with width of:
x^
2
and a mean value of
x
0
^
x
y
y
y
0
(It makes sense that the width of P(x|y) is always narrower thanthe width of the marginalized PDF P(x) (integrated over y). Ifyou know the actual value of y, you have additional informationand so a tighter constraint on x.
Physics 509
ρ
Physics 509
2
σ
2 ), the ±
σ
limits occur when the
argument of the exponent equals -1/2. For a Gaussian there's a68% chance of the measurement falling within around the mean. But for a 2D Gaussian this is not the case. Easiest to see this for
the simple case of
σ
=x
σ
=1:y
dx dy
exp
[^
x
2
y
2
=]
r ^0 0
dr
exp
[^
r
2
= ]
Evaluating this integral and solving gives r
0 2 =2.3. So 68% of
probability content is contained within a radius of
σ
We call this the 2D contour. Note that it's bigger than the 1Dversion---if you pick points inside the 68% contour and plottheir x coordinates, they'll span a wider range than thosepicked from the 68% contour of the 1D marginalized PDF!
Physics 509
ρ
Physics 509
Normal marginalization procedure: integrate over nuisance variables:
x
dy P
x , y
Alternate marginalization procedure: maximize the likelihood as a function of
the nuisance variables, and return the result:
x
max
y
x , y
(It is not necessarily the case that the resulting PDF is normalized.) I can prove for Gaussian distributions that these two marginalization
procedures are equivalent, but cannot prove it for the general case (In factthey give different results). Bayesians always follow the first prescription. Frequentists most often use
the second. Sometimes it will be computationally easier to apply one, sometimes the
other, even for PDFs that are approximately Gaussian.
Physics 509
1
... x
N
and a set of unknown
parameters
α
, calculate the likelihood function
x
1
x
N
1 N
x
i
It's more common (and easier) to calculate -ln
instead:
ln
x
1
x
N
^ i =
1 N
ln
x
i
The maximum likelihood estimator is that value of
α
which
maximizes L as a function of
α
. It can be found by minimizing
-ln
over the unknown parameters.