Sample Mean Vector and Sample Covariance Matrix, Exams of Signals and Systems Theory

Typology: Exams

2022/2023

Uploaded on 02/28/2023

anandit
STA135 Lecture 2: Sample Mean Vector and Sample Covariance Matrix
Xiaodong Li UC Davis
1 Sample mean and sample covariance
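Recall that in the 1-dimensional case, for a sample $x_1, \dots, x_n$, we can define
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
as the (unbiased) sample mean, and
$$s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
as the (unbiased) sample variance.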
p-dimensional case: Suppose we have $p$ variates $X_1, \dots, X_p$. For the vector of variates
$$\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix},$$
we have a $p$-variate sample of size $n$:
$$\vec{x}_1, \dots, \vec{x}_n \in \mathbb{R}^p.$$
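This sample of $n$ observations gives the following data matrix:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1}$$
Notice that each column of the data matrix corresponds to a particular variate $X_j$, and each row to one observation.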
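Sample mean: For each variate $X_j$, define the sample mean:
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \quad j = 1, \dots, p.$$
Then the sample mean vector is
$$\bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \frac{1}{n} \sum_{i=1}^{n} x_{i1} \\ \vdots \\ \frac{1}{n} \sum_{i=1}^{n} x_{ip} \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i.$$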
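As a quick numerical illustration, the sketch below checks that the vector of variate-wise (column) means coincides with the average of the observation vectors. The data values and the variable names are made up for this example and do not come from the notes.

```python
import numpy as np

# Hypothetical data matrix: n = 4 observations (rows) of p = 3 variates (columns).
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [4.0, 3.0, 2.5],
    [5.0, 6.0, 3.5],
])
n, p = X.shape

# Variate-wise means: one mean per column of the data matrix.
xbar_by_column = X.mean(axis=0)

# Average of the observation vectors x_1, ..., x_n (the rows of X).
xbar_by_rows = sum(X[i] for i in range(n)) / n

print(xbar_by_column)
print(np.allclose(xbar_by_column, xbar_by_rows))
```

Both computations give the same vector in $\mathbb{R}^p$, which is the content of the last equality in the display above.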

Sample covariance matrix: For each variate $X_j$, $j = 1, \dots, p$, define its sample variance as
$$s_{jj} = s_j^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \quad j = 1, \dots, p,$$
and the sample covariance between $X_j$ and $X_k$ as
$$s_{jk} = s_{kj} := \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \quad 1 \le k, j \le p, \ j \ne k.$$

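The sample covariance matrix is defined as
$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix}.$$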

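Then
$$\begin{aligned}
S &= \begin{bmatrix} \frac{1}{n-1} \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 & \cdots & \frac{1}{n-1} \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{ip} - \bar{x}_p) \\ \vdots & \ddots & \vdots \\ \frac{1}{n-1} \sum_{i=1}^{n} (x_{ip} - \bar{x}_p)(x_{i1} - \bar{x}_1) & \cdots & \frac{1}{n-1} \sum_{i=1}^{n} (x_{ip} - \bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \begin{bmatrix} (x_{i1} - \bar{x}_1)^2 & \cdots & (x_{i1} - \bar{x}_1)(x_{ip} - \bar{x}_p) \\ \vdots & \ddots & \vdots \\ (x_{ip} - \bar{x}_p)(x_{i1} - \bar{x}_1) & \cdots & (x_{ip} - \bar{x}_p)^2 \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \begin{bmatrix} x_{i1} - \bar{x}_1 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix} \begin{bmatrix} x_{i1} - \bar{x}_1 & \cdots & x_{ip} - \bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.
\end{aligned}$$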
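The outer-product form of $S$ can be checked numerically. The following is a minimal sketch on randomly generated (hypothetical) data; it compares the explicit sum of outer products with the centered matrix product and with NumPy's `np.cov`, whose default normalization is also $n-1$.

```python
import numpy as np

# Hypothetical data: n = 6 observations of p = 3 variates.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
n, p = X.shape
xbar = X.mean(axis=0)

# Sum of outer products (x_i - xbar)(x_i - xbar)^T, divided by n - 1.
S_outer = np.zeros((p, p))
for i in range(n):
    d = X[i] - xbar
    S_outer += np.outer(d, d)
S_outer /= n - 1

# Same quantity written as one matrix product on the centered data matrix.
Xc = X - xbar
S_matrix = Xc.T @ Xc / (n - 1)

# NumPy's estimator; rowvar=False means columns are the variates.
S_numpy = np.cov(X, rowvar=False)

print(np.allclose(S_outer, S_matrix), np.allclose(S_outer, S_numpy))
```

The matrix-product form `Xc.T @ Xc / (n - 1)` is usually the more convenient one in practice, since it avoids the explicit loop over observations.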

2 Linear transformation of observations

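Consider a sample of $\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}$ of size $n$:
$$\vec{x}_1, \dots, \vec{x}_n.$$
The corresponding data matrix is represented as
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}.$$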

For some $C \in \mathbb{R}^{q \times p}$ and $\vec{d} \in \mathbb{R}^q$, consider the linear transformation
$$\vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C\vec{X} + \vec{d}.$$

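With the variates split as $\vec{X}^{(1)} = [X_1, \dots, X_q]^\top$ and $\vec{X}^{(2)} = [X_{q+1}, \dots, X_p]^\top$, we have the partition of the sample mean and the sample covariance matrix as follows:
$$\bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \bar{x}_{q+2} \\ \vdots \\ \bar{x}_p \end{bmatrix} = \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix},$$
$$S = \begin{bmatrix} s_{11} & \cdots & s_{1q} & s_{1,q+1} & \cdots & s_{1,p} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{q1} & \cdots & s_{qq} & s_{q,q+1} & \cdots & s_{q,p} \\ s_{q+1,1} & \cdots & s_{q+1,q} & s_{q+1,q+1} & \cdots & s_{q+1,p} \\ \vdots & & \vdots & \vdots & & \vdots \\ s_{p1} & \cdots & s_{pq} & s_{p,q+1} & \cdots & s_{p,p} \end{bmatrix} = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}.$$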

By definition, $S_{11}$ is the sample covariance of $\vec{X}^{(1)}$ and $S_{22}$ is the sample covariance of $\vec{X}^{(2)}$. Here $S_{12}$ is referred to as the sample cross-covariance matrix between $\vec{X}^{(1)}$ and $\vec{X}^{(2)}$. In fact, we can derive the following formula:
$$S_{21} = S_{12}^\top = \frac{1}{n-1} \sum_{i=1}^{n} \left(\vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)}\right) \left(\vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)}\right)^\top.$$
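The block structure can be illustrated numerically. The sketch below uses hypothetical random data with $p = 5$ and $q = 2$ (both choices are assumptions made only for this example), computes $S_{21}$ from the formula above, and compares it with the corresponding block of the full sample covariance matrix.

```python
import numpy as np

# Hypothetical data: n = 8 observations, p = 5 variates, first block of size q = 2.
rng = np.random.default_rng(1)
n, p, q = 8, 5, 2
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)

# Split the columns into the two groups of variates.
X1, X2 = X[:, :q], X[:, q:]
xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)

# Cross-covariance from the formula: (1/(n-1)) sum_i (x_i^(2) - xbar^(2))(x_i^(1) - xbar^(1))^T.
S21 = (X2 - xbar2).T @ (X1 - xbar1) / (n - 1)

# Blocks read off directly from the full sample covariance matrix.
S11, S12 = S[:q, :q], S[:q, q:]
S21_block, S22 = S[q:, :q], S[q:, q:]

print(np.allclose(S21, S21_block), np.allclose(S21, S12.T))
```

Note that $S_{21}$ has shape $(p-q) \times q$, so it is generally not square.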

4 Standardization and Sample Correlation Matrix

Consider the data matrix (1.1), with sample mean vector $\bar{\vec{x}}$ and sample covariance matrix $S$. In particular, for $j = 1, \dots, p$, let $\bar{x}_j$ be the sample mean of the $j$-th variable and $\sqrt{s_{jj}}$ be its sample standard deviation.

For any entry $x_{ij}$ with $i = 1, \dots, n$ and $j = 1, \dots, p$, we get the standardized entry
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}.$$

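Then the data matrix $X$ is standardized to
$$Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1p} \\ z_{21} & z_{22} & \cdots & z_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{np} \end{bmatrix} = \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}.$$
Denote by $R$ the sample covariance for the sample $\vec{z}_1, \dots, \vec{z}_n$. What is the connection between $R$ and $S$?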

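The $i$-th row of $Z$, written as a column vector, is
$$\vec{z}_i = \begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ip} \end{bmatrix} = \begin{bmatrix} (x_{i1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{i2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{ip} - \bar{x}_p)/\sqrt{s_{pp}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \begin{bmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix}.$$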

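Let
$$V^{-\frac{1}{2}} = \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix}.$$
This transformation can be represented as
$$\vec{z}_i = V^{-\frac{1}{2}} (\vec{x}_i - \bar{\vec{x}}) = V^{-\frac{1}{2}} \vec{x}_i - V^{-\frac{1}{2}} \bar{\vec{x}}, \quad i = 1, \dots, n.$$
This implies that the sample mean for the new data matrix is
$$\bar{\vec{z}} = V^{-\frac{1}{2}} (\bar{\vec{x}} - \bar{\vec{x}}) = \vec{0}.$$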

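By the formula for the sample covariance of linear combinations of variates, the sample covariance matrix for the new data matrix $Z$ is
$$\begin{aligned}
R &= V^{-\frac{1}{2}} S V^{-\frac{1}{2}} \\
&= \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{s_{11}}} & & & \\ & \frac{1}{\sqrt{s_{22}}} & & \\ & & \ddots & \\ & & & \frac{1}{\sqrt{s_{pp}}} \end{bmatrix} \\
&= \begin{bmatrix} 1 & \frac{s_{12}}{\sqrt{s_{11} s_{22}}} & \cdots & \frac{s_{1p}}{\sqrt{s_{11} s_{pp}}} \\ \frac{s_{21}}{\sqrt{s_{22} s_{11}}} & 1 & \cdots & \frac{s_{2p}}{\sqrt{s_{22} s_{pp}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{s_{p1}}{\sqrt{s_{pp} s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp} s_{22}}} & \cdots & 1 \end{bmatrix} \\
&= \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{bmatrix}.
\end{aligned}$$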

The matrix R is called the sample correlation matrix for the original data matrix X.
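The relation $R = V^{-1/2} S V^{-1/2}$ can be checked numerically. The sketch below uses hypothetical random data with deliberately different column scales; it standardizes the data, takes the sample covariance of the standardized sample, and compares the result with the matrix product and with NumPy's `np.corrcoef`.

```python
import numpy as np

# Hypothetical data: n = 10 observations of p = 3 variates on very different scales.
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3)) * np.array([1.0, 5.0, 0.1])
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# V^{-1/2}: diagonal matrix of reciprocal sample standard deviations.
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))

# Standardized data matrix: row i is z_i^T = (V^{-1/2} (x_i - xbar))^T.
Z = (X - xbar) @ V_inv_sqrt

R_from_Z = np.cov(Z, rowvar=False)        # sample covariance of z_1, ..., z_n
R_from_S = V_inv_sqrt @ S @ V_inv_sqrt    # formula from the notes

print(np.allclose(R_from_Z, R_from_S))
print(np.allclose(R_from_S, np.corrcoef(X, rowvar=False)))
```

The diagonal entries of $R$ equal 1 and the standardized sample has mean zero, in line with the derivation above.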

5 Mahalanobis distance and mean-centered ellipse

Sample covariance is p.s.d.

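Recall that the sample covariance is
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top.$$
Is $S$ always positive semidefinite? Consider the spectral decomposition
$$S = \sum_{j=1}^{p} \lambda_j \vec{u}_j \vec{u}_j^\top,$$
where $\vec{u}_1, \dots, \vec{u}_p$ are orthonormal eigenvectors. Then $S\vec{u}_j = \lambda_j \vec{u}_j$, which implies that
$$\vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top (\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j.$$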

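On the other hand,
$$\begin{aligned}
\vec{u}_j^\top S \vec{u}_j &= \frac{1}{n-1} \vec{u}_j^\top \left( \sum_{i=1}^{n} (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \right) \vec{u}_j \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}})(\vec{x}_i - \bar{\vec{x}})^\top \vec{u}_j \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left| \vec{u}_j^\top (\vec{x}_i - \bar{\vec{x}}) \right|^2 \ge 0.
\end{aligned}$$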

This implies that all eigenvalues of S are nonnegative, so S is positive semidefinite.

In this course, we always assume that $n > p$ and that $S$ is positive definite, which implies that the inverse sample covariance matrix $S^{-1}$ is also positive definite.
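The argument can be illustrated numerically. The sketch below uses hypothetical random data with $n > p$; it checks that each eigenvalue of $S$ equals the averaged squared projection $\frac{1}{n-1}\sum_i |\vec{u}_j^\top(\vec{x}_i - \bar{\vec{x}})|^2$, and that the eigenvalues of $S^{-1}$ are the reciprocals of those of $S$. For continuous random data of this kind, $S$ is positive definite with probability one, which is why strict positivity is expected here.

```python
import numpy as np

# Hypothetical data with n = 12 > p = 4.
rng = np.random.default_rng(3)
n, p = 12, 4
X = rng.normal(size=(n, p))
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices; columns of U are orthonormal eigenvectors.
eigvals, U = np.linalg.eigh(S)

# proj[i, j] = u_j^T (x_i - xbar); averaging the squares gives u_j^T S u_j.
proj = (X - xbar) @ U
quad = (proj ** 2).sum(axis=0) / (n - 1)

# Eigenvalues of the inverse are the reciprocals 1 / lambda_j.
inv_eigvals = np.linalg.eigvalsh(np.linalg.inv(S))

print(np.allclose(quad, eigvals), eigvals.min() > 0, inv_eigvals.min() > 0)
```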

6 Examples

Example 1

Consider a 2-variate data matrix
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{n1} & x_{n2} \end{bmatrix}$$
with sample mean vector $\bar{\vec{x}}$ and sample covariance matrix $S_x$.

Define the new sample
$$y_1 = x_{11} + x_{12}, \quad y_2 = x_{21} + x_{22}, \quad \dots, \quad y_n = x_{n1} + x_{n2}.$$
Can we compute its sample mean and sample variance directly through $\bar{\vec{x}}$ and $S_x$?

Denote $C = [1, 1]$. Then
$$y_i = x_{i1} + x_{i2} = [1, 1] \begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C\vec{x}_i.$$

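The sample mean of $y_1, \dots, y_n$ can be represented as
$$\begin{aligned}
\bar{y} &= \frac{1}{n} \left[ (x_{11} + x_{12}) + \dots + (x_{n1} + x_{n2}) \right] \\
&= \frac{1}{n} \left[ x_{11} + \dots + x_{n1} \right] + \frac{1}{n} \left[ x_{12} + \dots + x_{n2} \right] \\
&= \bar{x}_1 + \bar{x}_2 \\
&= C\bar{\vec{x}}.
\end{aligned}$$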

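Represent the sample variance of $y_1, \dots, y_n$ by $s_y^2$. Then
$$\begin{aligned}
(n-1) s_y^2 &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
&= \sum_{i=1}^{n} \left( (x_{i1} + x_{i2}) - (\bar{x}_1 + \bar{x}_2) \right)^2 \\
&= \sum_{i=1}^{n} \left( (x_{i1} - \bar{x}_1) + (x_{i2} - \bar{x}_2) \right)^2 \\
&= \sum_{i=1}^{n} \left[ (x_{i1} - \bar{x}_1)^2 + 2 (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + (x_{i2} - \bar{x}_2)^2 \right] \\
&= \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + 2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2 \\
&= (n-1) s_{11} + 2(n-1) s_{12} + (n-1) s_{22}.
\end{aligned}$$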

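Then
$$\begin{aligned}
s_y^2 &= s_{11} + 2 s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22} \\
&= [1, 1] \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= C S_x C^\top.
\end{aligned}$$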
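Example 1 can be verified on a small data set. The numbers below are made up for illustration; the sketch computes the mean and variance of $y_i = x_{i1} + x_{i2}$ directly and compares them with $C\bar{\vec{x}}$ and $C S_x C^\top$.

```python
import numpy as np

# Hypothetical 2-variate data matrix with n = 4 observations.
X = np.array([
    [1.0, 2.0],
    [3.0, 1.0],
    [4.0, 6.0],
    [8.0, 3.0],
])
C = np.array([[1.0, 1.0]])      # 1 x 2 row vector

y = (X @ C.T).ravel()           # y_i = x_i1 + x_i2
xbar = X.mean(axis=0)
S_x = np.cov(X, rowvar=False)

ybar_direct = y.mean()
ybar_formula = (C @ xbar).item()        # C xbar

s2y_direct = y.var(ddof=1)              # divide by n - 1
s2y_formula = (C @ S_x @ C.T).item()    # C S_x C^T

print(ybar_direct, ybar_formula)
print(s2y_direct, s2y_formula)
```

For this data, $s_{11} = 26/3$, $s_{12} = 5/3$ and $s_{22} = 14/3$, so both routes give $s_y^2 = 50/3$ and $\bar{y} = 7$.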

Example 2

Suppose $X \in \mathbb{R}^{n \times 4}$ is a data matrix for the variables $\vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}$, with the following sample covariance

Sx =

What is the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$?

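Solution: Since
$$\vec{Y} := \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} =: C\vec{X},$$
we know its sample covariance matrix is
$$S_y = C S_x C^\top.$$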

From the partition
$$\vec{Y} = \begin{bmatrix} X_1 \\ X_3 \\ \hline X_2 \\ X_4 \end{bmatrix}$$
we have the partition

Sy =

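Then the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$ is the upper-right $2 \times 2$ block of $S_y$, namely
$$\begin{bmatrix} s_{12} & s_{14} \\ s_{32} & s_{34} \end{bmatrix}$$
in terms of the entries $s_{jk}$ of $S_x$. This result can be verified entrywise.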
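The permutation argument of Example 2 can be sketched in code. Because the numerical entries of $S_x$ are not available here, the data below are randomly generated and purely hypothetical; only the structure of the computation is meant to match the example.

```python
import numpy as np

# Hypothetical data: n = 20 observations of the four variates X1, X2, X3, X4.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 4))
S_x = np.cov(X, rowvar=False)

# Permutation matrix C reordering (X1, X2, X3, X4) into (X1, X3, X2, X4).
C = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
])

S_y = C @ S_x @ C.T

# Upper-right 2 x 2 block: cross-covariance between (X1, X3) and (X2, X4).
cross = S_y[:2, 2:]

# Entrywise check against S_x (0-based indices: s_12, s_14, s_32, s_34).
entrywise = np.array([
    [S_x[0, 1], S_x[0, 3]],
    [S_x[2, 1], S_x[2, 3]],
])

print(np.allclose(cross, entrywise))
```

Since $C$ only reorders the variates, $C S_x C^\top$ simply rearranges the rows and columns of $S_x$, and no entry changes its value.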