









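Recall that in the 1-dimensional case, for a sample $x_1, \dots, x_n$, we can define
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
as the (unbiased) sample mean and
\[ s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
as the (unbiased) sample variance.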
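As a quick numerical illustration of the two definitions above, here is a minimal sketch. The sample values are hypothetical and chosen only for illustration. It computes the mean and the variance by hand and compares them with Python's `statistics` module, whose `variance` function also uses the $n - 1$ divisor.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
x = [2.0, 4.0, 4.0, 6.0]
n = len(x)

# Sample mean: (1/n) * sum of the observations.
x_bar = sum(x) / n

# Sample variance: divide the sum of squared deviations by n - 1, not n.
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# statistics.variance is the sample variance (n - 1 divisor).
assert abs(x_bar - statistics.mean(x)) < 1e-12
assert abs(s2 - statistics.variance(x)) < 1e-12
print(x_bar, s2)
```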
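p-dimensional case: Suppose we have $p$ variates $X_1, \dots, X_p$. For the vector of variates
\[ \vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix} \]
we have a $p$-variate sample of size $n$:
\[ \vec{x}_1, \dots, \vec{x}_n \in \mathbb{R}^p. \]
This sample of $n$ observations gives the following data matrix:
\[ X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1} \]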
Notice that here each column in the data matrix corresponds to a particular variate Xj.
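Sample mean: For each variate $X_j$, define the sample mean
\[ \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \quad j = 1, \dots, p. \]
Then the sample mean vector is
\[ \bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \frac{1}{n} \sum_{i=1}^{n} x_{i1} \\ \vdots \\ \frac{1}{n} \sum_{i=1}^{n} x_{ip} \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i. \]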
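The chain of equalities above says that averaging each column separately gives the same vector as averaging the observation vectors. A minimal sketch of that check follows; the 4 x 3 data matrix is hypothetical and used only for illustration.

```python
# Hypothetical 4 x 3 data matrix: rows are observations, columns are variates.
X = [
    [1.0, 2.0, 0.0],
    [3.0, 0.0, 1.0],
    [5.0, 4.0, 2.0],
    [7.0, 2.0, 5.0],
]
n, p = len(X), len(X[0])

# Componentwise definition: average each column j separately.
mean_by_column = [sum(X[i][j] for i in range(n)) / n for j in range(p)]

# Vector definition: add up the observation vectors, then divide by n.
total = [0.0] * p
for row in X:
    total = [t + v for t, v in zip(total, row)]
mean_by_rows = [t / n for t in total]

assert mean_by_column == mean_by_rows
print(mean_by_column)
```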
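Sample covariance matrix: For each variate $X_j$, $j = 1, \dots, p$, define its sample variance as
\[ s_{jj} = s_j^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \quad j = 1, \dots, p, \]
and the sample covariance between $X_j$ and $X_k$ as
\[ s_{jk} = s_{kj} := \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \quad 1 \le k, j \le p, \ j \ne k. \]
The sample covariance matrix is defined as
\[ S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix}. \]
Then
\[
\begin{aligned}
S &= \begin{bmatrix} \frac{1}{n-1} \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 & \cdots & \frac{1}{n-1} \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{ip} - \bar{x}_p) \\ \vdots & & \vdots \\ \frac{1}{n-1} \sum_{i=1}^{n} (x_{ip} - \bar{x}_p)(x_{i1} - \bar{x}_1) & \cdots & \frac{1}{n-1} \sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \begin{bmatrix} (x_{i1} - \bar{x}_1)^2 & \cdots & (x_{i1} - \bar{x}_1)(x_{ip} - \bar{x}_p) \\ \vdots & & \vdots \\ (x_{ip} - \bar{x}_p)(x_{i1} - \bar{x}_1) & \cdots & (x_{ip} - \bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \begin{bmatrix} x_{i1} - \bar{x}_1 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix} \begin{bmatrix} x_{i1} - \bar{x}_1 & \cdots & x_{ip} - \bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\end{aligned}
\]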
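The derivation above shows that the entrywise definition of $S$ agrees with the sum of outer products of centered observations. The sketch below computes $S$ both ways and compares them. The data matrix is a hypothetical example, and the names `S_entry` and `S_outer` are introduced here only for illustration.

```python
# Hypothetical 4 x 3 data matrix: rows are observations, columns are variates.
X = [
    [1.0, 2.0, 0.0],
    [3.0, 0.0, 1.0],
    [5.0, 4.0, 2.0],
    [7.0, 2.0, 5.0],
]
n, p = len(X), len(X[0])
x_bar = [sum(row[j] for row in X) / n for j in range(p)]

# Entrywise definition: s_jk = 1/(n-1) * sum_i (x_ij - xbar_j)(x_ik - xbar_k).
S_entry = [
    [sum((row[j] - x_bar[j]) * (row[k] - x_bar[k]) for row in X) / (n - 1)
     for k in range(p)]
    for j in range(p)
]

# Outer-product form: S = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)^T.
S_outer = [[0.0] * p for _ in range(p)]
for row in X:
    d = [row[j] - x_bar[j] for j in range(p)]  # centered observation
    for j in range(p):
        for k in range(p):
            S_outer[j][k] += d[j] * d[k] / (n - 1)

# The two constructions agree up to floating-point rounding.
max_diff = max(abs(S_entry[j][k] - S_outer[j][k])
               for j in range(p) for k in range(p))
assert max_diff < 1e-12
for r in S_entry:
    print(r)
```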
2 Linear transformation of observations
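Consider a sample of $\vec{X} = (X_1, \dots, X_p)^\top$ with size $n$:
\[ \vec{x}_1, \dots, \vec{x}_n. \]
The corresponding data matrix is represented as
\[ X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \]
For some $C \in \mathbb{R}^{q \times p}$ and $\vec{d} \in \mathbb{R}^q$, consider the linear transformation
\[ \vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C \vec{X} + \vec{d}. \]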
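For a transformed sample $\vec{y}_i = C \vec{x}_i + \vec{d}$, the standard identities are $\bar{\vec{y}} = C \bar{\vec{x}} + \vec{d}$ and $S_y = C S_x C^\top$. The second is the formula these notes invoke later for linear combinations of variates. The sketch below checks both numerically. The data matrix, `C`, and `d` are hypothetical values, and the helper functions are written here only for illustration.

```python
def mean_vector(X):
    """Column means of a data matrix given as a list of rows."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def cov_matrix(X):
    """Sample covariance matrix with the n - 1 divisor."""
    n = len(X)
    m = mean_vector(X)
    D = [[v - mj for v, mj in zip(row, m)] for row in X]  # centered rows
    p = len(m)
    return [[sum(d[j] * d[k] for d in D) / (n - 1) for k in range(p)]
            for j in range(p)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical data (n = 4, p = 3) and transformation (q = 2).
X = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [5.0, 4.0, 2.0], [7.0, 2.0, 5.0]]
C = [[1.0, 1.0, 0.0], [0.0, 2.0, -1.0]]
d = [10.0, -5.0]

# Transform every observation: y_i = C x_i + d.
Y = [[sum(c * v for c, v in zip(c_row, x)) + dk for c_row, dk in zip(C, d)]
     for x in X]

# Statistics computed directly from the transformed sample ...
y_bar = mean_vector(Y)
S_y = cov_matrix(Y)

# ... and the same statistics obtained from x_bar and S_x alone.
x_bar = mean_vector(X)
y_bar_formula = [sum(c * m for c, m in zip(c_row, x_bar)) + dk
                 for c_row, dk in zip(C, d)]
S_y_formula = matmul(matmul(C, cov_matrix(X)), transpose(C))

print(y_bar, y_bar_formula)
print(S_y, S_y_formula)
```

Note that the shift `d` changes the mean but drops out of the covariance, because centering removes it.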
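We have the partition of the sample mean and the sample covariance matrix as follows:
\[ \bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \bar{x}_{q+2} \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix}, \]
\[ S = \left[ \begin{array}{ccc|ccc} s_{11} & \cdots & s_{1q} & s_{1,q+1} & \cdots & s_{1,p} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{q1} & \cdots & s_{qq} & s_{q,q+1} & \cdots & s_{q,p} \\ \hline s_{q+1,1} & \cdots & s_{q+1,q} & s_{q+1,q+1} & \cdots & s_{q+1,p} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{p1} & \cdots & s_{pq} & s_{p,q+1} & \cdots & s_{p,p} \end{array} \right] = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}. \]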
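By definition, $S_{11}$ is the sample covariance of $\vec{X}^{(1)}$ and $S_{22}$ is the sample covariance of $\vec{X}^{(2)}$. Here $S_{12}$ is referred to as the sample cross-covariance matrix between $\vec{X}^{(1)}$ and $\vec{X}^{(2)}$. In fact, we can derive the following formula:
\[ S_{21} = S_{12}^\top = \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)} \right) \left( \vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)} \right)^\top. \]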
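The block structure can be illustrated by slicing a covariance matrix and recomputing the off-diagonal block directly from the centered sub-vectors. This is a minimal sketch under stated assumptions: the data matrix and the split point `q = 2` are hypothetical, with the first `q` columns playing the role of $\vec{X}^{(1)}$ and the remaining columns the role of $\vec{X}^{(2)}$.

```python
# Hypothetical data matrix (n = 4, p = 3), split after the first q = 2 variates.
X = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [5.0, 4.0, 2.0], [7.0, 2.0, 5.0]]
n, p, q = len(X), len(X[0]), 2
x_bar = [sum(col) / n for col in zip(*X)]

# Full sample covariance matrix S (n - 1 divisor).
S = [[sum((row[j] - x_bar[j]) * (row[k] - x_bar[k]) for row in X) / (n - 1)
      for k in range(p)]
     for j in range(p)]

# Blocks of the partition S = [[S11, S12], [S21, S22]].
S11 = [row[:q] for row in S[:q]]
S12 = [row[q:] for row in S[:q]]
S21 = [row[:q] for row in S[q:]]
S22 = [row[q:] for row in S[q:]]

# Direct formula: S21 = 1/(n-1) * sum_i (x_i^(2) - xbar^(2)) (x_i^(1) - xbar^(1))^T.
S21_direct = [
    [sum((row[q + a] - x_bar[q + a]) * (row[b] - x_bar[b]) for row in X) / (n - 1)
     for b in range(q)]
    for a in range(p - q)
]

S12_transposed = [list(col) for col in zip(*S12)]
print(S21, S12_transposed, S21_direct)
```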
4 Standardization and Sample Correlation Matrix
For the data matrix (1.1), denote the sample mean vector by $\bar{\vec{x}}$ and the sample covariance matrix by $S$. In particular, for $j = 1, \dots, p$, let $\bar{x}_j$ be the sample mean of the $j$-th variable and $\sqrt{s_{jj}}$ be its sample standard deviation.
For any entry $x_{ij}$ with $i = 1, \dots, n$ and $j = 1, \dots, p$, we get the standardized entry
\[ z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}. \]
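Then the data matrix $X$ is standardized to
\[ Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1p} \\ z_{21} & z_{22} & \cdots & z_{2p} \\ \vdots & \vdots & & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{np} \end{bmatrix} = \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}. \]
Denote by $R$ the sample covariance matrix of the sample $\vec{z}_1, \dots, \vec{z}_n$. What is the connection between $R$ and $S$?
The $i$-th row of $Z$, written as a column vector, is
\[ \vec{z}_i = \begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ip} \end{bmatrix} = \begin{bmatrix} (x_{i1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{i2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{ip} - \bar{x}_p)/\sqrt{s_{pp}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \begin{bmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix}. \]
Let
\[ V^{-\frac{1}{2}} = \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix}. \]
This transformation can be represented as
\[ \vec{z}_i = V^{-\frac{1}{2}} (\vec{x}_i - \bar{\vec{x}}) = V^{-\frac{1}{2}} \vec{x}_i - V^{-\frac{1}{2}} \bar{\vec{x}}, \quad i = 1, \dots, n. \]
This implies that the sample mean for the new data matrix is
\[ \bar{\vec{z}} = V^{-\frac{1}{2}} (\bar{\vec{x}} - \bar{\vec{x}}) = \vec{0}. \]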
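By the formula for the sample covariance of linear combinations of variates, the sample covariance matrix for the new data matrix $Z$ is
\[
\begin{aligned}
R &= V^{-\frac{1}{2}} S V^{-\frac{1}{2}} \\
&= \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \\
&= \begin{bmatrix} 1 & \frac{s_{12}}{\sqrt{s_{11} s_{22}}} & \cdots & \frac{s_{1p}}{\sqrt{s_{11} s_{pp}}} \\ \frac{s_{21}}{\sqrt{s_{22} s_{11}}} & 1 & \cdots & \frac{s_{2p}}{\sqrt{s_{22} s_{pp}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{s_{p1}}{\sqrt{s_{pp} s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp} s_{22}}} & \cdots & 1 \end{bmatrix} \\
&= \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{bmatrix}.
\end{aligned}
\]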
The matrix R is called the sample correlation matrix for the original data matrix X.
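The relation $R = V^{-1/2} S V^{-1/2}$ can be checked numerically by computing $R$ two ways: as the sample covariance of the standardized data, and entrywise as $r_{jk} = s_{jk} / \sqrt{s_{jj} s_{kk}}$. The sketch below does this on a hypothetical data matrix. The helper `cov_matrix` and the variable names are assumptions made for illustration.

```python
import math

def cov_matrix(X):
    """Sample covariance matrix (n - 1 divisor) of a list-of-rows data matrix."""
    n = len(X)
    m = [sum(col) / n for col in zip(*X)]
    D = [[v - mj for v, mj in zip(row, m)] for row in X]  # centered rows
    p = len(m)
    return [[sum(d[j] * d[k] for d in D) / (n - 1) for k in range(p)]
            for j in range(p)]

# Hypothetical data matrix (n = 4, p = 3).
X = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [5.0, 4.0, 2.0], [7.0, 2.0, 5.0]]
n, p = len(X), len(X[0])
x_bar = [sum(col) / n for col in zip(*X)]
S = cov_matrix(X)

# Route 1: standardize each entry, then take the sample covariance of Z.
Z = [[(row[j] - x_bar[j]) / math.sqrt(S[j][j]) for j in range(p)] for row in X]
R_from_Z = cov_matrix(Z)

# Route 2: scale S on both sides by the diagonal matrix V^{-1/2},
# which amounts to r_jk = s_jk / sqrt(s_jj * s_kk).
R_from_S = [[S[j][k] / math.sqrt(S[j][j] * S[k][k]) for k in range(p)]
            for j in range(p)]

for row in R_from_S:
    print(row)
```

Both routes give a matrix with ones on the diagonal, since each standardized variable has sample variance 1.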
5 Mahalanobis distance and mean-centered ellipse
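Recall that the sample covariance is
\[ S = \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top. \]
Is $S$ always positive semidefinite? Consider the spectral decomposition
\[ S = \sum_{j=1}^{p} \lambda_j \vec{u}_j \vec{u}_j^\top, \]
where $\vec{u}_1, \dots, \vec{u}_p$ are orthonormal eigenvectors of $S$.
Then $S \vec{u}_j = \lambda_j \vec{u}_j$, which implies that
\[ \vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top (\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j. \]
On the other hand,
\[
\begin{aligned}
\vec{u}_j^\top S \vec{u}_j &= \frac{1}{n-1} \vec{u}_j^\top \left( \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \right) \vec{u}_j \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \vec{u}_j \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left| \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}}) \right|^2 \ge 0.
\end{aligned}
\]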
This implies that all eigenvalues of S are nonnegative, so S is positive semidefinite.
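The key step in the argument, rewriting the quadratic form as a sum of squares, holds for any vector $\vec{a}$ and not only for eigenvectors: $\vec{a}^\top S \vec{a} = \frac{1}{n-1} \sum_{i} (\vec{a}^\top (\vec{x}_i - \bar{\vec{x}}))^2 \ge 0$. The sketch below checks this identity on randomly drawn vectors, which avoids computing eigenvectors. The data matrix and the function names `quad_form` and `sum_of_squares` are hypothetical, introduced only for illustration.

```python
import random

# Hypothetical data matrix (n = 4, p = 3).
X = [[1.0, 2.0, 0.0], [3.0, 0.0, 1.0], [5.0, 4.0, 2.0], [7.0, 2.0, 5.0]]
n, p = len(X), len(X[0])
x_bar = [sum(col) / n for col in zip(*X)]
D = [[v - m for v, m in zip(row, x_bar)] for row in X]  # centered observations
S = [[sum(d[j] * d[k] for d in D) / (n - 1) for k in range(p)] for j in range(p)]

def quad_form(a):
    """a^T S a, computed from the covariance matrix."""
    return sum(a[j] * S[j][k] * a[k] for j in range(p) for k in range(p))

def sum_of_squares(a):
    """1/(n-1) * sum_i (a^T (x_i - xbar))^2, computed from the data."""
    return sum(sum(aj * dj for aj, dj in zip(a, d)) ** 2 for d in D) / (n - 1)

random.seed(0)  # fixed seed so the run is reproducible
for _ in range(1000):
    a = [random.uniform(-1.0, 1.0) for _ in range(p)]
    # The two expressions agree, and the second is a sum of squares.
    assert abs(quad_form(a) - sum_of_squares(a)) < 1e-9
    assert sum_of_squares(a) >= 0.0

print(quad_form([1.0, -1.0, 0.0]), sum_of_squares([1.0, -1.0, 0.0]))
```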
In this course, we always assume that $n > p$ and that $S$ is positive definite, which implies that the inverse sample covariance matrix $S^{-1}$ is also positive definite.
6 Examples
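Consider a 2-variate data matrix
\[ X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix} \]
with sample mean vector $\bar{\vec{x}}$ and sample covariance matrix $S_x$.
Define the new sample
\[ y_1 = x_{11} + x_{12}, \quad y_2 = x_{21} + x_{22}, \quad \dots, \quad y_n = x_{n1} + x_{n2}. \]
Can we compute its sample mean and sample variance directly through $\bar{\vec{x}}$ and $S_x$?
Denote $C = [1, 1]$. Then
\[ y_i = x_{i1} + x_{i2} = [1, 1] \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C \vec{x}_i. \]
The sample mean of $y_1, \dots, y_n$ can be represented as
\[
\begin{aligned}
\bar{y} &= \frac{1}{n} \left[ (x_{11} + x_{12}) + \dots + (x_{n1} + x_{n2}) \right] \\
&= \frac{1}{n} \left[ x_{11} + \dots + x_{n1} \right] + \frac{1}{n} \left[ x_{12} + \dots + x_{n2} \right] \\
&= \bar{x}_1 + \bar{x}_2 \\
&= C \bar{\vec{x}}.
\end{aligned}
\]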
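Represent the sample variance of $y_1, \dots, y_n$ by $s_y^2$. Then
\[
\begin{aligned}
(n-1) s_y^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} \left( (x_{i1} + x_{i2}) - (\bar{x}_1 + \bar{x}_2) \right)^2 \\
&= \sum_{i=1}^{n} \left( (x_{i1} - \bar{x}_1) + (x_{i2} - \bar{x}_2) \right)^2 \\
&= \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + 2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2 \\
&= (n-1) s_{11} + 2 (n-1) s_{12} + (n-1) s_{22}.
\end{aligned}
\]
Then
\[ s_y^2 = s_{11} + 2 s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22} = [1, 1] \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = C S_x C^\top. \]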
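The result of this example can be checked on concrete numbers. The sketch below uses a hypothetical 2-variate sample, computes $\bar{y}$ and $s_y^2$ directly from the sums $y_i = x_{i1} + x_{i2}$, and compares them with $\bar{x}_1 + \bar{x}_2$ and $s_{11} + 2 s_{12} + s_{22}$.

```python
import statistics

# Hypothetical 2-variate sample (n = 4); each row is (x_i1, x_i2).
X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]]
n = len(X)

# New sample y_i = x_i1 + x_i2, i.e. y_i = C x_i with C = [1, 1].
y = [x1 + x2 for x1, x2 in X]

# Direct computation from the y sample.
y_bar = statistics.mean(y)
s2_y = statistics.variance(y)  # n - 1 divisor

# Computation through the sample mean vector and covariance matrix of x.
x1_bar = sum(row[0] for row in X) / n
x2_bar = sum(row[1] for row in X) / n
s11 = sum((row[0] - x1_bar) ** 2 for row in X) / (n - 1)
s22 = sum((row[1] - x2_bar) ** 2 for row in X) / (n - 1)
s12 = sum((row[0] - x1_bar) * (row[1] - x2_bar) for row in X) / (n - 1)

y_bar_formula = x1_bar + x2_bar        # C x_bar
s2_y_formula = s11 + 2 * s12 + s22     # C S_x C^T

print(y_bar, y_bar_formula)
print(s2_y, s2_y_formula)
```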
Suppose $X \in \mathbb{R}^{n \times 4}$ is a data matrix for the variables $\vec{X} = (X_1, X_2, X_3, X_4)^\top$, with the following sample covariance
$S_x =$
What is the sample cross-covariance matrix between
and
Solution. Since
we know its sample covariance matrix is
\[ S_y = C S_x C^\top \]
From the partition
we have the partition
Sy =
Then the sample cross-covariance matrix between
and
is
. This result can be verified
entrywise.