Multivariate Data Display - Basic Statistics for Behavioral Sciences - Lecture Notes, Study notes of Statistics for Psychologists

Multivariate Data Display, Mean Vectors, Sample Mean Vector, Computation of a Mean Vector, Transpose of the Data Matrix, Population Mean Vector, and Covariance Matrices are the learning points of this lecture.

Typology: Study notes

2011/2012

Uploaded on 11/21/2012

ashakiran

Ch. 3: Multivariate Data Display

I. Mean vectors

A. In a setting with n individuals (subjects) and p variables, we have n observation vectors, each containing p variables:

$$\mathbf{y}_i = \begin{pmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{ip} \end{pmatrix}, \qquad i = 1, 2, \ldots, n$$

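B. The sample mean vector over all subjects for the p variables can be expressed as:

$$\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{y}_i = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_p \end{pmatrix}$$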

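C. The data matrix for all n individuals and p variables can be expressed as:

$$\mathbf{Y}_{n \times p} = \begin{pmatrix} \mathbf{y}_1' \\ \mathbf{y}_2' \\ \vdots \\ \mathbf{y}_n' \end{pmatrix} = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix}$$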

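D. Computation of a mean vector from the data matrix Y.

$$\bar{\mathbf{y}} = \frac{1}{n}\mathbf{Y}'\mathbf{j}$$

where

Y' = transpose of the data matrix Y,

j = vector of 1s (n × 1).

$$\bar{\mathbf{y}} = \frac{1}{n}\begin{pmatrix} y_{11} & y_{21} & \cdots & y_{n1} \\ y_{12} & y_{22} & \cdots & y_{n2} \\ \vdots & \vdots & & \vdots \\ y_{1p} & y_{2p} & \cdots & y_{np} \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \frac{1}{n}\begin{pmatrix} \sum_{i=1}^{n} y_{i1} \\ \sum_{i=1}^{n} y_{i2} \\ \vdots \\ \sum_{i=1}^{n} y_{ip} \end{pmatrix}$$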

e.g. Y =

y =
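As a small numerical check of the formula ȳ = (1/n)Y'j, the sketch below uses NumPy on a made-up data matrix. The numbers are hypothetical (they are not the lecture's example) and the variable names are chosen only for illustration.

```python
import numpy as np

# Hypothetical data: n = 4 subjects (rows), p = 3 variables (columns).
Y = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 6.0, 5.0],
    [6.0, 5.0, 3.0],
    [8.0, 9.0, 7.0],
])
n, p = Y.shape

j = np.ones(n)           # the n x 1 vector of 1s
y_bar = (Y.T @ j) / n    # (1/n) Y'j

# Averaging each column directly gives the same vector.
assert np.allclose(y_bar, Y.mean(axis=0))
print(y_bar)             # [5. 6. 4.]
```

Multiplying Y' by j simply adds up each variable over the n subjects, so dividing by n gives the p variable means.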

E. Population mean vector (Expected value)

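$$E(\mathbf{y}) = E\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \boldsymbol{\mu}$$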

and, $E(\bar{\mathbf{y}}) = \boldsymbol{\mu}$.

II. Covariance matrices

B. Matrix expression of variance

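  1. From observation vectors

$$\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})' = \frac{1}{n-1}\left(\sum_{i=1}^{n}\mathbf{y}_i\mathbf{y}_i' - n\,\bar{\mathbf{y}}\bar{\mathbf{y}}'\right)$$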

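  2. From data matrix

$$\mathbf{S} = \frac{1}{n-1}\left[\mathbf{Y}'\mathbf{Y} - \mathbf{Y}'\left(\frac{1}{n}\mathbf{J}\right)\mathbf{Y}\right]$$

where J = jj' is the n × n matrix of 1s, and

$$\sum_{i=1}^{n}\mathbf{y}_i\mathbf{y}_i' = \mathbf{Y}'\mathbf{Y},$$

$$\bar{\mathbf{y}} = \frac{1}{n}\mathbf{Y}'\mathbf{j} = \mathbf{Y}'(\mathbf{j}/n),$$

$$n\bar{\mathbf{y}} = \mathbf{Y}'\mathbf{j}, \qquad \bar{\mathbf{y}}' = \left(\mathbf{Y}'(\mathbf{j}/n)\right)' = (\mathbf{j}'/n)\mathbf{Y}$$

$$\therefore\; n\,\bar{\mathbf{y}}\bar{\mathbf{y}}' = \mathbf{Y}'(\mathbf{j}\mathbf{j}'/n)\mathbf{Y} = \mathbf{Y}'\left(\frac{1}{n}\mathbf{J}\right)\mathbf{Y}$$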
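The two expressions for S can be compared numerically. The following is a minimal sketch assuming NumPy and a hypothetical 4 × 3 data matrix (not taken from the lecture); it also checks both results against `np.cov`, which uses the same n − 1 divisor by default.

```python
import numpy as np

# Hypothetical data matrix: n = 4 subjects, p = 3 variables.
Y = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 6.0, 5.0],
    [6.0, 5.0, 3.0],
    [8.0, 9.0, 7.0],
])
n = Y.shape[0]
j = np.ones(n)            # vector of 1s
J = np.outer(j, j)        # J = jj', the n x n matrix of 1s
y_bar = (Y.T @ j) / n     # sample mean vector

# 1. From observation vectors: sum of outer products of deviations.
S_obs = sum(np.outer(y_i - y_bar, y_i - y_bar) for y_i in Y) / (n - 1)

# 2. From the data matrix: [Y'Y - Y'(J/n)Y] / (n - 1).
S_mat = (Y.T @ Y - Y.T @ (J / n) @ Y) / (n - 1)

assert np.allclose(S_obs, S_mat)
assert np.allclose(S_mat, np.cov(Y, rowvar=False))
```

The data-matrix form avoids an explicit loop over subjects, which is why it is the usual choice in matrix-oriented software.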

C. Population covariance matrix

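  1. Definition:

$$\boldsymbol{\Sigma} = \operatorname{cov}(\mathbf{y}) = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] = E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}'$$

  2. Proof:

$$\begin{aligned} \boldsymbol{\Sigma} &= E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] \\ &= E(\mathbf{y}\mathbf{y}' - \mathbf{y}\boldsymbol{\mu}' - \boldsymbol{\mu}\mathbf{y}' + \boldsymbol{\mu}\boldsymbol{\mu}') \\ &= E(\mathbf{y}\mathbf{y}') - E(\mathbf{y})\boldsymbol{\mu}' - \boldsymbol{\mu}E(\mathbf{y}') + \boldsymbol{\mu}\boldsymbol{\mu}' \\ &= E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}' - \boldsymbol{\mu}\boldsymbol{\mu}' + \boldsymbol{\mu}\boldsymbol{\mu}' \\ &= E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}' \end{aligned}$$

  3. Hence $\boldsymbol{\Sigma} = E(\mathbf{y}\mathbf{y}') - \boldsymbol{\mu}\boldsymbol{\mu}'$, and $E(\mathbf{S}) = \boldsymbol{\Sigma}$.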

III. Correlation matrices

A. Sample correlation matrix, R

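$$\mathbf{R} = (r_{ij}) = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}, \qquad r_{ij} = \frac{s_{ij}}{s_i s_j} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}$$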

B. Computation of R and S

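  1. Let

$$\mathbf{D}_s = \operatorname{Diag}(\sqrt{s_{11}}, \sqrt{s_{22}}, \ldots, \sqrt{s_{pp}}) = \operatorname{Diag}(s_1, s_2, \ldots, s_p) = \begin{pmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & s_p \end{pmatrix}$$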

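  2. Then,

$$\mathbf{R} = \mathbf{D}_s^{-1}\,\mathbf{S}\,\mathbf{D}_s^{-1}$$

(c.f. $r_{ij} = s_{ij}/(s_i s_j)$, and the product DAD from p. 8)

$$\mathbf{R} = \begin{pmatrix} 1/s_1 & 0 & \cdots & 0 \\ 0 & 1/s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/s_p \end{pmatrix}\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}\begin{pmatrix} 1/s_1 & 0 & \cdots & 0 \\ 0 & 1/s_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/s_p \end{pmatrix}$$

And $\mathbf{S} = \mathbf{D}_s\mathbf{R}\mathbf{D}_s$. What if $\mathbf{D}_s = \mathbf{I}$? Then S = R.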
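The relations between R and S above can be sketched in a few lines of NumPy. The data matrix is hypothetical and the names `D_s` and `D_inv` are illustrative choices, not notation from the lecture.

```python
import numpy as np

# Hypothetical data matrix: n = 4 subjects, p = 3 variables.
Y = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 6.0, 5.0],
    [6.0, 5.0, 3.0],
    [8.0, 9.0, 7.0],
])
S = np.cov(Y, rowvar=False)            # sample covariance matrix

D_s = np.diag(np.sqrt(np.diag(S)))     # Diag(s_1, ..., s_p), the standard deviations
D_inv = np.linalg.inv(D_s)             # Diag(1/s_1, ..., 1/s_p)

R = D_inv @ S @ D_inv                  # R = D_s^{-1} S D_s^{-1}

# R agrees with NumPy's correlation matrix, and S is recovered as D_s R D_s.
assert np.allclose(R, np.corrcoef(Y, rowvar=False))
assert np.allclose(D_s @ R @ D_s, S)
```

Pre- and post-multiplying by the diagonal matrix divides row i and column j of S by s_i and s_j, which is exactly the elementwise definition of r_ij.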

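  3. Example, for an observation vector partitioned into p variables y and q variables x:

    1. Mean vector

$$\mathbf{m}_{(p+q) \times 1} = \begin{pmatrix} \bar{\mathbf{y}} \\ \bar{\mathbf{x}} \end{pmatrix} = \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ \vdots \\ \bar{y}_p \\ \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_q \end{pmatrix}$$

    2. Covariance matrix

$$\mathbf{S}_{(p+q) \times (p+q)} = \begin{pmatrix} \mathbf{S}_{yy} & \mathbf{S}_{yx} \\ \mathbf{S}_{xy} & \mathbf{S}_{xx} \end{pmatrix} = \begin{pmatrix} s_{y_1y_1} & s_{y_1y_2} & \cdots & s_{y_1y_p} & s_{y_1x_1} & s_{y_1x_2} & \cdots & s_{y_1x_q} \\ s_{y_2y_1} & s_{y_2y_2} & \cdots & s_{y_2y_p} & s_{y_2x_1} & s_{y_2x_2} & \cdots & s_{y_2x_q} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ s_{y_py_1} & s_{y_py_2} & \cdots & s_{y_py_p} & s_{y_px_1} & s_{y_px_2} & \cdots & s_{y_px_q} \\ s_{x_1y_1} & s_{x_1y_2} & \cdots & s_{x_1y_p} & s_{x_1x_1} & s_{x_1x_2} & \cdots & s_{x_1x_q} \\ s_{x_2y_1} & s_{x_2y_2} & \cdots & s_{x_2y_p} & s_{x_2x_1} & s_{x_2x_2} & \cdots & s_{x_2x_q} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ s_{x_qy_1} & s_{x_qy_2} & \cdots & s_{x_qy_p} & s_{x_qx_1} & s_{x_qx_2} & \cdots & s_{x_qx_q} \end{pmatrix}$$

$$\mathbf{S}_{xy} = \mathbf{S}_{yx}'$$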
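The partitioned mean vector and covariance matrix can be illustrated by slicing. This is a sketch on hypothetical data, assuming the first p columns hold the y variables and the last q columns hold the x variables; the array name `W` is made up for the example.

```python
import numpy as np

# Hypothetical data: n = 5 subjects, p = 2 y-variables followed by q = 2 x-variables.
W = np.array([
    [3.0, 10.0, 1.0,  7.0],
    [5.0, 12.0, 2.0,  9.0],
    [4.0,  9.0, 4.0,  6.0],
    [8.0, 15.0, 3.0, 12.0],
    [6.0, 11.0, 5.0,  8.0],
])
p, q = 2, 2

m = W.mean(axis=0)               # (p+q) x 1 mean vector
y_bar, x_bar = m[:p], m[p:]      # partition into y-part and x-part

S = np.cov(W, rowvar=False)      # (p+q) x (p+q) covariance matrix
S_yy, S_yx = S[:p, :p], S[:p, p:]
S_xy, S_xx = S[p:, :p], S[p:, p:]

# The off-diagonal blocks are transposes of each other.
assert np.allclose(S_xy, S_yx.T)
# Reassembling the four blocks recovers S.
assert np.allclose(np.block([[S_yy, S_yx], [S_xy, S_xx]]), S)
```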

IV. Linear combination

A. Sample

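  1. $z = a_1y_1 + a_2y_2 + \cdots + a_py_p = \mathbf{a}'\mathbf{y}$

(c.f. $\hat{y} = b_1x_1 + b_2x_2 + \cdots + b_px_p$), where

$a_i$ = coefficient, $y_i$ = random variable.

$$\mathbf{a}'\mathbf{y} = [a_1, a_2, \ldots, a_p]\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$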

  2. If the same vector a of p coefficients is applied to the observation vector of each of the n subjects, then

$$z_i = a_1y_{i1} + a_2y_{i2} + \cdots + a_py_{ip} = \mathbf{a}'\mathbf{y}_i, \qquad i = 1, 2, \ldots, n$$

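  3. The mean of z ($\bar{z}$) can be expressed as:

$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = \mathbf{a}'\bar{\mathbf{y}},$$

where

$\bar{\mathbf{y}}$ = sample mean vector of $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$.

And the variance of z ($s_z^2$) can be expressed as:

$$s_z^2 = \frac{\sum_{i=1}^{n}(z_i - \bar{z})^2}{n-1} = \mathbf{a}'\mathbf{S}\mathbf{a} \ge 0$$

where,

S = sample covariance matrix for y (positive semidefinite).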

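  4. If we have a second vector of coefficients b for y, such that

$$w = b_1y_1 + b_2y_2 + \cdots + b_py_p = \mathbf{b}'\mathbf{y},$$

then the sample covariance between z and w is $s_{zw} = \mathbf{a}'\mathbf{S}\mathbf{b}$, and the correlation between z and w is

$$r_{zw} = \frac{s_{zw}}{\sqrt{s_z^2\,s_w^2}} = \frac{\mathbf{a}'\mathbf{S}\mathbf{b}}{\sqrt{(\mathbf{a}'\mathbf{S}\mathbf{a})(\mathbf{b}'\mathbf{S}\mathbf{b})}}$$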

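  5. In general, with $\mathbf{a} = \mathbf{a}_1$ and $\mathbf{b} = \mathbf{a}_2$, if $\mathbf{A} = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{pmatrix}$, then

$$\mathbf{z} = \begin{pmatrix} \mathbf{a}_1'\mathbf{y} \\ \mathbf{a}_2'\mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{pmatrix}\mathbf{y} = \mathbf{A}\mathbf{y},$$

$$\bar{\mathbf{z}} = \begin{pmatrix} \bar{z}_1 \\ \bar{z}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1'\bar{\mathbf{y}} \\ \mathbf{a}_2'\bar{\mathbf{y}} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{pmatrix}\bar{\mathbf{y}} = \mathbf{A}\bar{\mathbf{y}}, \text{ and}$$

$$\mathbf{S}_z = \begin{pmatrix} s_{z_1}^2 & s_{z_1z_2} \\ s_{z_2z_1} & s_{z_2}^2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1'\mathbf{S}\mathbf{a}_1 & \mathbf{a}_1'\mathbf{S}\mathbf{a}_2 \\ \mathbf{a}_2'\mathbf{S}\mathbf{a}_1 & \mathbf{a}_2'\mathbf{S}\mathbf{a}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \end{pmatrix}\mathbf{S}\,[\mathbf{a}_1\;\mathbf{a}_2] = \mathbf{A}\mathbf{S}\mathbf{A}'$$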

where
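The sample results for linear combinations in items 1 to 5 can be verified numerically. The sketch below assumes NumPy; the data matrix and the coefficient vectors `a` and `b` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical data matrix (n = 4, p = 3) and coefficient vectors.
Y = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 6.0, 5.0],
    [6.0, 5.0, 3.0],
    [8.0, 9.0, 7.0],
])
a = np.array([1.0, -1.0, 2.0])
b = np.array([0.5, 1.0, 0.0])

y_bar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)

z = Y @ a      # z_i = a'y_i for every subject
w = Y @ b      # w_i = b'y_i for every subject

# Mean and variance of z.
assert np.isclose(z.mean(), a @ y_bar)         # z_bar = a' y_bar
assert np.isclose(z.var(ddof=1), a @ S @ a)    # s_z^2 = a'Sa

# Covariance and correlation between z and w.
s_zw = a @ S @ b
r_zw = s_zw / np.sqrt((a @ S @ a) * (b @ S @ b))
assert np.isclose(np.cov(z, w)[0, 1], s_zw)
assert np.isclose(np.corrcoef(z, w)[0, 1], r_zw)

# Several combinations at once: rows of A are a' and b'.
A = np.vstack([a, b])
Z = Y @ A.T                                    # each row is (A y_i)'
z_bar = A @ y_bar
S_z = A @ S @ A.T
assert np.allclose(Z.mean(axis=0), z_bar)
assert np.allclose(np.cov(Z, rowvar=False), S_z)
```

The point of the matrix forms is that the means, variances, and covariances of any set of linear combinations follow from ȳ and S alone, without recomputing from the raw scores.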

VI. Distance between vectors

A. Univariate distance

Squared standard distance

$$d^2 = \frac{(y_1 - y_2)^2}{\sigma^2} \qquad \text{or} \qquad d^2 = \frac{(y - \mu)^2}{\sigma^2}$$

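B. Multivariate distance (Mahalanobis distance)

$$d^2 = (\mathbf{y}_1 - \mathbf{y}_2)'\,\mathbf{S}^{-1}(\mathbf{y}_1 - \mathbf{y}_2) \qquad \text{(sample distribution)}$$

$$D^2 = (\bar{\mathbf{y}} - \boldsymbol{\mu})'\,\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \qquad \text{(sampling distribution)}$$

$$\Delta^2 = (\bar{\mathbf{y}} - \boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}) \qquad \text{(population mean distance)}$$

$$\Delta^2 = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\,\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \qquad \text{(two population mean difference)}$$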
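A minimal sketch of the two sample-based Mahalanobis distances, assuming NumPy. The data matrix, the reference mean `mu0`, and the rescaling factors are hypothetical; `np.linalg.solve` is used here in place of forming the inverse of S explicitly, which gives the same quadratic form.

```python
import numpy as np

# Hypothetical data matrix: n = 5 subjects, p = 3 variables.
Y = np.array([
    [2.0, 4.0, 1.0],
    [4.0, 6.0, 5.0],
    [6.0, 5.0, 3.0],
    [8.0, 9.0, 7.0],
    [5.0, 7.0, 2.0],
])
S = np.cov(Y, rowvar=False)
y_bar = Y.mean(axis=0)

# d^2 between the first two observation vectors: (y1 - y2)' S^{-1} (y1 - y2).
diff = Y[0] - Y[1]
d2 = diff @ np.linalg.solve(S, diff)
assert np.isclose(d2, diff @ np.linalg.inv(S) @ diff)

# D^2 between the sample mean and a hypothesized mean vector mu0 (assumed values).
mu0 = np.array([5.0, 6.0, 4.0])
dev = y_bar - mu0
D2 = dev @ np.linalg.solve(S, dev)

# Unlike Euclidean distance, d^2 does not change when variables are rescaled.
scale = np.array([10.0, 0.1, 3.0])
S_scaled = np.cov(Y * scale, rowvar=False)
diff_scaled = diff * scale
d2_scaled = diff_scaled @ np.linalg.solve(S_scaled, diff_scaled)
assert np.isclose(d2, d2_scaled)
```

Standardizing by the inverse covariance matrix accounts for both the variances of the variables and their covariances, which is why the distance is unaffected by the units of measurement.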