Multivariate Analysis - Lecture Notes | STAT 8320, Exams of Statistics

Material Type: Exam; Class: Data Analysis II; Subject: Statistics; University: University of Missouri - Columbia; Term: Unknown 1989;

Uploaded on 08/19/2009

Multivariate Analysis

Introduction:

Consider an experiment in swine breeding where we will measure several traits:

  • Size of litter.
  • Weight of largest piglet.
  • Weight of smallest piglet.
  • Number of days to weaning.
  • Mother’s weight at piglet’s birth.
  • Mother’s weight at piglet’s weaning.

Suppose that we have three genetically different types of pigs and we will measure the above variables on each of n = 30 mother pigs from each breed.

Question: How can the joint variability of the p = 6 traits be described?

MANOVA:

  • Are the sets of means for the p = 6 traits the same for all varieties?
  • Which varieties are different or better?
  • Do varieties differ for all traits or only for some traits?

PRINCIPAL COMPONENTS:

  • Which linear combination of the measured traits has the largest variance?
  • Which linear combination has the smallest variance?
  • Can most of the variation in the p = 6 traits be accounted for by the variation in 2 linear combinations of the traits? (Data reduction and description)

FACTOR ANALYSIS:

  • Can correlation among traits be attributed to variation in one or more unobservable factors?

DISCRIMINANT ANALYSIS:

  • Can you classify an observation as belonging to one of the several possible populations?

CLUSTER ANALYSIS:

  • If you did not know the varieties, could you group the data into homogeneous classes?

Properties of Covariance Matrices

  1. Symmetric: $\Sigma' = \Sigma$.
  2. Non-negative definite: $a'\Sigma a \ge 0$ for any $a \in \mathbb{R}^p$. This is equivalent to the statement that the eigenvalues of $\Sigma$ satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.
  3. $|\Sigma| = \lambda_1 \lambda_2 \cdots \lambda_p \ge 0$ (generalized variance).
  4. $\mathrm{trace}(\Sigma) = \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}$ = sum of variances (total variance).

Note: We will usually require that $\Sigma$ be positive definite. This implies that all eigenvalues are positive, and $\Sigma$ has an inverse $\Sigma^{-1}$ such that $\Sigma^{-1}\Sigma = I_{p \times p} = \Sigma\Sigma^{-1}$.
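As a quick numerical check of these four properties, here is a minimal NumPy sketch. The 3 × 3 matrix is a made-up example and does not come from the swine data.

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix, chosen only for illustration.
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 9.0, 1.5],
                  [0.6, 1.5, 1.0]])

# eigvalsh is intended for symmetric matrices and returns eigenvalues in
# ascending order, so reverse them to get lambda_1 >= ... >= lambda_p.
eigvals = np.linalg.eigvalsh(Sigma)[::-1]

is_symmetric = np.allclose(Sigma, Sigma.T)             # property 1
is_nonneg_definite = bool(np.all(eigvals >= -1e-12))   # property 2
generalized_variance = np.linalg.det(Sigma)            # property 3
total_variance = np.trace(Sigma)                       # property 4

print("symmetric:", is_symmetric)
print("non-negative definite:", is_nonneg_definite)
print("determinant vs. product of eigenvalues:",
      generalized_variance, np.prod(eigvals))
print("trace vs. sum of eigenvalues:", total_variance, eigvals.sum())
```

Since every eigenvalue of this particular matrix is strictly positive, it is also positive definite and therefore invertible.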

Correlations:

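Define the correlation $\rho_{ij}$ and the correlation matrix $R$ by

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\sigma_{jj}}} \quad\text{and}\quad R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix}.$$

Notice that we can also write the correlation matrix as $R = [\mathrm{diag}(\Sigma)]^{-1/2}\,\Sigma\,[\mathrm{diag}(\Sigma)]^{-1/2}$, where $\mathrm{diag}(\Sigma)$ is just the matrix which has the $\sigma_{ii}$’s on the diagonal and 0’s elsewhere. Additionally, the square root of a symmetric matrix $A$, denoted $A^{1/2}$, is a symmetric matrix such that $A = A^{1/2}A^{1/2}$.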
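The two ways of writing R can be compared numerically. This is a small sketch using a made-up covariance matrix, not data from the notes.

```python
import numpy as np

# Hypothetical covariance matrix, for illustration only.
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 9.0, 1.5],
                  [0.6, 1.5, 1.0]])

variances = np.diag(Sigma)                       # sigma_11, ..., sigma_pp

# Matrix form: R = [diag(Sigma)]^(-1/2) Sigma [diag(Sigma)]^(-1/2).
D_inv_sqrt = np.diag(1.0 / np.sqrt(variances))
R = D_inv_sqrt @ Sigma @ D_inv_sqrt

# Elementwise form: rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj).
R_elementwise = Sigma / np.sqrt(np.outer(variances, variances))

print(np.round(R, 4))
print("forms agree:", np.allclose(R, R_elementwise))
```

Because diag(Σ) is diagonal, its inverse square root is obtained by taking 1/√σii entry by entry; no general matrix square root routine is needed here.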

Some useful equalities:

Let x and y be random vectors with means μx and μy and variance-covariance matrices Σx and Σy. Let A and B be matrices of constants and c and d be vectors of constants. Then

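  • $E(Ay + c) = A\mu_y + c$.
  • $\mathrm{var}(Ay + c) = A\,\mathrm{var}(y)\,A' = A\Sigma_y A'$.
  • $\mathrm{cov}(Ay + c, By + d) = A\Sigma_y B'$.
  • $E(Ay + Bx + c) = A\mu_y + B\mu_x + c$.
  • $\mathrm{var}(Ay + Bx + c) = A\Sigma_y A' + B\Sigma_x B' + A\Sigma_{yx}B' + B\Sigma_{yx}'A'$, where $\Sigma_{yx} = \mathrm{cov}(y, x)$.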

Let’s show the third of these below:
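One way to write the argument, using only the definition cov(u, v) = E[(u − E u)(v − E v)′] and the first equality in the list:

```latex
\begin{align*}
\operatorname{cov}(Ay + c,\; By + d)
  &= E\!\left[\{(Ay + c) - (A\mu_y + c)\}\,\{(By + d) - (B\mu_y + d)\}'\right] \\
  &= E\!\left[A(y - \mu_y)(y - \mu_y)'B'\right] \\
  &= A\,E\!\left[(y - \mu_y)(y - \mu_y)'\right]B' \\
  &= A\,\Sigma_y\,B'.
\end{align*}
```

The constant vectors c and d cancel in the first line, and A and B can be moved outside the expectation because they are matrices of constants.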

  • Individual components $y_1, \ldots, y_p$ are all normally distributed, $y_i \sim N_1(\mu_i, \sigma_{ii})$. This follows immediately from the third bullet point. (Why?)
  • The conditional distribution of $y_1$ given $y_2$ is also normal.

$$y_1 \mid y_2 \sim N_{p_1}\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$

Note: there are analogous results for $y_2 \mid y_1$. Additionally, when $y_1$ and $y_2$ are jointly normal, they are independently distributed if and only if $\Sigma_{12} = 0$.

  • If $y \sim N_p(\mu, \Sigma)$ and $\Sigma$ is positive definite, then

$$(y - \mu)'\Sigma^{-1}(y - \mu) \sim \chi^2(p).$$

We will call this quantity the squared Mahalanobis distance between $y$ and $\mu$.

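  • If $y_i$, $i = 1, \ldots, k$, are independent $N_p(\mu_i, \Sigma_i)$ rv’s, then for fixed $m \times p$ matrices $A_i$,

$$\sum_{i=1}^{k} A_i y_i \sim N_m\!\left(\sum_{i=1}^{k} A_i\mu_i,\; \sum_{i=1}^{k} A_i\Sigma_i A_i'\right).$$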

Suppose that $(y_1, y_2)' \sim N_2(\ldots)$.

If $y_2 = 3$, what is the conditional distribution of $y_1$?
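The conditional mean and covariance formulas for a partitioned normal vector can be sketched in a few lines of NumPy. The function name `conditional_mvn` and the numerical mean vector and covariance matrix below are hypothetical stand-ins chosen for illustration; they are not the values of the exercise.

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx1, idx2, y2):
    """Mean and covariance of y1 | y2 for a jointly normal (y1, y2).

    idx1 and idx2 list the positions of y1 and y2 inside the full vector.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    y2 = np.asarray(y2, dtype=float)

    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]

    # Solve linear systems with S22 instead of forming its inverse explicitly.
    cond_mean = mu1 + S12 @ np.linalg.solve(S22, y2 - mu2)
    cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return cond_mean, cond_cov

# Hypothetical bivariate parameters (NOT the ones from the exercise).
mu = [1.0, 2.0]
Sigma = [[4.0, 1.5],
         [1.5, 1.0]]

cond_mean, cond_cov = conditional_mvn(mu, Sigma, [0], [1], [3.0])
print("E(y1 | y2 = 3)   =", cond_mean)
print("var(y1 | y2 = 3) =", cond_cov)
```

With these made-up numbers the conditional mean is 1 + 1.5(3 − 2) = 2.5 and the conditional variance is 4 − 1.5² = 1.75. Notice that the conditional variance does not depend on the observed value of y2.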

Sampling from Multivariate Normal Populations

Suppose we have a random sample of size $n$, $y_1, \ldots, y_n$, from $N_p(\mu, \Sigma)$. Then,

  • Since $y_1, \ldots, y_n$ are iid, their sample mean satisfies $\bar{y} = \sum_{i=1}^{n} y_i / n \sim N_p(\mu, \Sigma/n)$. In particular, $\bar{y}$ is an unbiased estimator of $\mu$.
  • The $p \times p$ sample variance-covariance matrix, $S$, is

$$S = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(y_i - \bar{y})' = \frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i y_i' - n\bar{y}\bar{y}'\right\}.$$

Note that $S$ is symmetric and contains $p(p + 1)/2$ different random variables. Further, $S$ is an unbiased estimator of $\Sigma$.

  • $(n - 1)S \sim W_p(n - 1, \Sigma)$, a Wishart distribution with $n - 1$ degrees of freedom and expectation $(n - 1)\Sigma$. The Wishart distribution is just a multivariate extension of the Chi-squared distribution.
  • $\bar{y}$ and $S$ are independent.
  • $\bar{y}$ and $S$ are sufficient statistics. (All of the information in the data about $\mu$ and $\Sigma$ is contained in $\bar{y}$ and $S$, regardless of sample size.)
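The two expressions for S can be checked against each other on simulated data. In this sketch the population parameters, the seed, and the sample size are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters with p = 3.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 9.0, 1.5],
                  [0.6, 1.5, 1.0]])
n = 30

# Each row of Y is one observation y_i'.
Y = rng.multivariate_normal(mu, Sigma, size=n)
p = Y.shape[1]

ybar = Y.mean(axis=0)
centered = Y - ybar

# First form: sum of (y_i - ybar)(y_i - ybar)' divided by n - 1.
S_centered = centered.T @ centered / (n - 1)

# Second form: (sum of y_i y_i' - n ybar ybar') divided by n - 1.
S_shortcut = (Y.T @ Y - n * np.outer(ybar, ybar)) / (n - 1)

# np.cov uses the divisor n - 1 by default.
S_numpy = np.cov(Y, rowvar=False)

n_distinct = len(np.triu_indices(p)[0])   # p(p + 1)/2 distinct entries

print("forms agree:", np.allclose(S_centered, S_shortcut))
print("matches np.cov:", np.allclose(S_centered, S_numpy))
print("distinct entries in S:", n_distinct)
```

The centered form is usually the safer one numerically; the shortcut form subtracts two large quantities when the means are far from zero.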

Maximum Likelihood Estimation for MVN

Suppose that we have an independent random sample $y_1, \ldots, y_n \sim N_p(\mu, \Sigma)$. Then we can write the likelihood function for the data as

$$L(\mu, \Sigma) = \prod_{j=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(y_j - \mu)'\Sigma^{-1}(y_j - \mu)} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{j=1}^{n}(y_j - \mu)'\Sigma^{-1}(y_j - \mu)}.$$

Then, we can write the m.l.e’s as

$$\hat{\mu} = \bar{y} \quad\text{and}\quad \hat{\Sigma} = \frac{n-1}{n}S.$$

We can show this by taking the log of the likelihood function and taking derivatives with respect to $\mu$ and $\Sigma$.
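A numerical sanity check (not a proof) is to evaluate the log-likelihood at the m.l.e. and at nearby alternatives. The helper name `mvn_loglik` and the simulated data are assumptions of this sketch.

```python
import numpy as np

def mvn_loglik(Y, mu, Sigma):
    """Log of L(mu, Sigma) for the rows of Y treated as iid N_p(mu, Sigma)."""
    n, p = Y.shape
    dev = Y - mu
    _, logdet = np.linalg.slogdet(Sigma)
    # quad = sum over j of (y_j - mu)' Sigma^{-1} (y_j - mu)
    quad = np.einsum("ij,ij->", dev @ np.linalg.inv(Sigma), dev)
    return -0.5 * (n * p * np.log(2.0 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(42)
Y = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=25)
n, p = Y.shape

ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)          # divisor n - 1
Sigma_hat = (n - 1) / n * S          # m.l.e., divisor n

ll_mle = mvn_loglik(Y, ybar, Sigma_hat)
ll_with_S = mvn_loglik(Y, ybar, S)               # unbiased S instead of the m.l.e.
ll_shifted = mvn_loglik(Y, ybar + 0.1, Sigma_hat)  # mean moved away from ybar

print(ll_mle, ll_with_S, ll_shifted)
```

On any data set the first value should be the largest of the three. The gap between using Σ̂ and using S shrinks as n grows, since (n − 1)/n approaches 1.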

Properties of MLEs

  • INVARIANCE: If ˆθ is the mle of θ, then the mle of h(θ) is h(ˆθ), for any function h(·).
  • CONSISTENCY: mle’s are consistent estimators, but they are usually biased.
  • EFFICIENCY: mle’s are efficient estimators (no other estimator has a smaller variance for large samples).

Properties of MLEs, cont.

  • ASYMPTOTIC NORMALITY: Suppose that $\hat{\theta}_n$ is the mle for $\theta$ based upon $n$ independent observations. Then $\hat{\theta}_n \mathrel{\dot\sim} N(\theta, H^{-1})$.
    • $H$ is called the Fisher Information Matrix. It is built from the expected values of the second partial derivatives of the log-likelihood function $\ell(\theta)$. The $(i, j)$th element of $H$ is

$$-E\left[\frac{\partial^2 \ell(\theta)}{\partial\theta_i\,\partial\theta_j}\right].$$

    • We can estimate $H$ by finding the form determined above, and evaluating it at $\theta = \hat{\theta}_n$.
  • LIKELIHOOD RATIO TESTING: For some null hypothesis, $H_0$, we can form a likelihood ratio test.
    • The statistic used is

$$\Lambda = \frac{\max_{H_0} L(\mu, \Sigma \mid Y)}{\max L(\mu, \Sigma \mid Y)}.$$

    • For large $n$, $-2\log\Lambda \mathrel{\dot\sim} \chi^2(\nu)$, where $\nu$ is the number of parameters in the unrestricted space minus the number of parameters under $H_0$.
    • We reject $H_0$ if $\Lambda$ is too small. Why?

Matrix Differentiation, Cont.

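  • If $X$ is symmetric and $A$ is a matrix which does not depend upon $X$, then

$$\frac{\partial\,\mathrm{tr}(XA)}{\partial X} = A + A' - \mathrm{Diag}(A).$$

  • If $X$ is symmetric and we let $J_{ij}$ be a matrix which has a 1 in the $(i, j)$th position and 0s elsewhere, then

$$\frac{\partial X^{-1}}{\partial x_{ij}} = \begin{cases} -X^{-1}J_{ii}X^{-1}, & i = j \\ -X^{-1}(J_{ij} + J_{ji})X^{-1}, & i \ne j. \end{cases}$$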
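Both rules for symmetric X can be checked with central finite differences. The key point is that changing x_ij with i ≠ j also changes x_ji, so the perturbation direction is J_ij + J_ji. The matrices, seed, and step size below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.normal(size=(p, p))          # arbitrary matrix, not symmetric
B = rng.normal(size=(p, p))
X = B @ B.T + p * np.eye(p)          # symmetric and positive definite

def J(i, j):
    """Matrix with a 1 in position (i, j) and 0s elsewhere."""
    M = np.zeros((p, p))
    M[i, j] = 1.0
    return M

def direction(i, j):
    """How X moves when x_ij changes and symmetry is preserved."""
    return J(i, i) if i == j else J(i, j) + J(j, i)

h = 1e-6

# Rule 1: d tr(XA) / dX = A + A' - Diag(A).
grad_trace = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        E = direction(i, j)
        grad_trace[i, j] = (np.trace((X + h * E) @ A)
                            - np.trace((X - h * E) @ A)) / (2 * h)
rule_trace = A + A.T - np.diag(np.diag(A))

# Rule 2: d X^{-1} / d x_ij for one off-diagonal element.
i, j = 0, 2
E = direction(i, j)
num_dinv = (np.linalg.inv(X + h * E) - np.linalg.inv(X - h * E)) / (2 * h)
X_inv = np.linalg.inv(X)
rule_dinv = -X_inv @ E @ X_inv

print("trace rule holds:", np.allclose(grad_trace, rule_trace, atol=1e-6))
print("inverse rule holds:", np.allclose(num_dinv, rule_dinv, atol=1e-6))
```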

Use these rules to find the derivative $\partial(x'Aa)/\partial x$.

Now, find the derivative $\partial\,\mathrm{tr}(X)/\partial X$.

Example:

Suppose that we collect the following sets of two numbers based upon some process: (18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7), (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2), (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), and (22.6, 21.8). Find $\bar{y}$.

Next, find $S$. For simplicity, we will just find the covariance here in class. (So that you can check them, $s_{11} = 2.96$ and $s_{22} = 6.04$.)

What is the mle of Σ? of μ?
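One way to check the hand calculations is a short NumPy script over the twelve pairs listed above. The variable names are our own.

```python
import numpy as np

# The 12 observed pairs from the example; each row is one observation.
Y = np.array([[18.4, 25.4], [20.8, 22.1], [21.8, 27.8], [19.3, 23.7],
              [18.9, 27.2], [19.0, 26.6], [21.7, 29.2], [22.7, 24.2],
              [20.6, 24.8], [18.0, 21.9], [18.4, 26.7], [22.6, 21.8]])
n = Y.shape[0]

ybar = Y.mean(axis=0)                 # sample mean vector
S = np.cov(Y, rowvar=False)           # sample covariance matrix, divisor n - 1

mu_hat = ybar                         # m.l.e. of mu
Sigma_hat = (n - 1) / n * S           # m.l.e. of Sigma, divisor n

print("ybar      =", np.round(ybar, 4))
print("S         =\n", np.round(S, 4))
print("Sigma_hat =\n", np.round(Sigma_hat, 4))
```

The diagonal of S should reproduce the check values s11 = 2.96 and s22 = 6.04 after rounding to two decimals.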

Example, cont.

Finally, suppose that we wish to test the null hypothesis that $\mu_1 = \mu_2 = 22.5$. NOTE: Under restrictions, the mle of the mean and variance-covariance matrix may not have the same forms that we have discussed previously. First, what will be the value of $\nu$?

Next, suppose that the mles under the restriction are $\hat{\sigma}_{11} = 8.82$, $\hat{\sigma}_{12} = -6.73$, and $\hat{\sigma}_{22} = 13.51$. What is the form of the numerator for the likelihood ratio test?

What is the form of the denominator for this test?

Note: it would not be appropriate to assume that 12 observations constitute a large sample. This is for demonstration purposes only!
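The mechanics of the test can be sketched as follows. This sketch makes one assumption that should be stated loudly: the restricted estimate of Σ is computed with divisor n, the usual maximum likelihood form when μ is fixed at μ0. On these data its entries come out smaller than the restricted values quoted above, which appear to correspond to divisor n − 1, so the numbers here illustrate the procedure rather than reproduce the in-class answer.

```python
import numpy as np

Y = np.array([[18.4, 25.4], [20.8, 22.1], [21.8, 27.8], [19.3, 23.7],
              [18.9, 27.2], [19.0, 26.6], [21.7, 29.2], [22.7, 24.2],
              [20.6, 24.8], [18.0, 21.9], [18.4, 26.7], [22.6, 21.8]])
n, p = Y.shape
mu0 = np.array([22.5, 22.5])          # mean vector under H0

# Unrestricted m.l.e. of Sigma (mean estimated by ybar).
ybar = Y.mean(axis=0)
Sigma_hat = (Y - ybar).T @ (Y - ybar) / n

# Assumed restricted m.l.e. of Sigma (mean fixed at mu0), divisor n.
Sigma_hat0 = (Y - mu0).T @ (Y - mu0) / n

# At each maximum the exponent of the likelihood equals -np/2, so it cancels
# in the ratio and Lambda = (|Sigma_hat| / |Sigma_hat0|)^(n/2).
_, logdet = np.linalg.slogdet(Sigma_hat)
_, logdet0 = np.linalg.slogdet(Sigma_hat0)
log_lambda = 0.5 * n * (logdet - logdet0)
lrt_stat = -2.0 * log_lambda

# Parameter counts: p means plus p(p + 1)/2 covariances without restriction,
# only the p(p + 1)/2 covariances under H0.
n_cov = p * (p + 1) // 2
nu = (p + n_cov) - n_cov

print("Lambda      =", np.exp(log_lambda))
print("-2 log Lambda =", lrt_stat, "with nu =", nu)
```

The statistic would then be compared with an upper quantile of the χ²(ν) distribution, keeping in mind the caveat about the small sample size.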

Tests of Multivariate Normality

We will discuss three different options for assessing multivariate normality. When analyzing data we may wish to look at one or more of these possible methods.

  • Check univariate normality for each trait separately.
    • This can be done using methods from Data Analysis I; for example, the Shapiro-Wilk test.
    • Notice that marginal normality of all traits does not imply joint MVN.
  • Mardia’s tests for multivariate normality.
    • Multivariate skewness can be defined as

$$\beta_{1,p} = E\left[\left\{(y - \mu)'\Sigma^{-1}(x - \mu)\right\}^3\right],$$

where $x$ and $y$ are independent, but have the same distribution.

    • Multivariate kurtosis is defined as

$$\beta_{2,p} = E\left[\left\{(y - \mu)'\Sigma^{-1}(y - \mu)\right\}^2\right].$$

    • For the MVN distribution, we have that $\beta_{1,p} = 0$ and $\beta_{2,p} = p(p + 2)$.
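Sample versions of the two measures replace μ and Σ with estimates and the expectations with averages. The sketch below follows the usual convention of using the divisor-n covariance estimate; the function name `mardia_statistics` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def mardia_statistics(Y):
    """Sample multivariate skewness b1 and kurtosis b2 for the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    centered = Y - Y.mean(axis=0)
    Sigma_hat = centered.T @ centered / n          # divisor n

    # G[i, j] = (y_i - ybar)' Sigma_hat^{-1} (y_j - ybar)
    G = centered @ np.linalg.solve(Sigma_hat, centered.T)

    b1 = (G ** 3).sum() / n ** 2                   # average of g_ij^3 over all pairs
    b2 = (np.diag(G) ** 2).sum() / n               # average of g_ii^2
    return b1, b2

# Simulated N_3(0, I) data: b1 should be near 0 and b2 near p(p + 2) = 15.
rng = np.random.default_rng(7)
p = 3
Y = rng.standard_normal(size=(500, p))

b1, b2 = mardia_statistics(Y)
print("skewness b1 =", b1)
print("kurtosis b2 =", b2, " target p(p + 2) =", p * (p + 2))
```

Formal tests compare these statistics with their large-sample reference distributions. The n × n matrix G makes this approach memory-hungry for very large n.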

Tests of MVN, Cont.

  • Chi-Square Q-Q Plot
    • Let $y_i$, $i = 1, \ldots, n$, be a random sample from $N_p(\mu, \Sigma)$.
    • Then $z_i = \Sigma^{-1/2}(y_i - \mu)$, $i = 1, \ldots, n$, are iid $N_p(0, I)$. Thus, $\delta_i^2 = z_i'z_i$, $i = 1, \ldots, n$, follows a $\chi^2_p$ distribution.
    • We can plot the ordered $d_i^2$ values, where $d_i^2$ is $\delta_i^2$ with $\mu$ and $\Sigma$ replaced by their estimates $\bar{y}$ and $S$, against the quantiles of the $\chi^2_p$ distribution. When normality holds, the plot should approximately resemble a straight line passing through the origin at a 45° angle.

[Figure: chi-square Q-Q plot of Mahalanobis Distance against Chi-Square Quantiles.]

  • If the data are not normal, we can (1) ignore this, (2) use nonparametric methods, (3) use models based upon an appropriate distribution, or (4) try performing a transformation.
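The coordinates of a chi-square Q-Q plot can be computed without any plotting library. This sketch uses the twelve pairs from the earlier example (p = 2) and makes two assumptions: the plotting positions (i − 0.5)/n, which are one common choice among several, and the closed-form quantile −2 log(1 − q) that holds for a chi-square distribution with exactly 2 degrees of freedom.

```python
import numpy as np

Y = np.array([[18.4, 25.4], [20.8, 22.1], [21.8, 27.8], [19.3, 23.7],
              [18.9, 27.2], [19.0, 26.6], [21.7, 29.2], [22.7, 24.2],
              [20.6, 24.8], [18.0, 21.9], [18.4, 26.7], [22.6, 21.8]])
n, p = Y.shape

ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)
centered = Y - ybar

# Squared Mahalanobis distances d_i^2 = (y_i - ybar)' S^{-1} (y_i - ybar).
d2 = np.einsum("ij,ij->i", centered @ np.linalg.inv(S), centered)
d2_sorted = np.sort(d2)

# Plotting positions and chi-square(2) quantiles (valid only because p = 2).
probs = (np.arange(1, n + 1) - 0.5) / n
quantiles = -2.0 * np.log(1.0 - probs)

for q, d in zip(quantiles, d2_sorted):
    print(f"{q:8.3f}  {d:8.3f}")
```

Plotting the second column against the first and comparing with the line through the origin with slope 1 gives the diagnostic described above. For general p the quantiles would come from a chi-square quantile function, and 12 points are far too few to judge normality with any confidence.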

Inference about a Mean Vector:

Recall that for a univariate normal distribution, we could test the hypothesis $H_0: \mu = \mu_0$ by using the test statistic

$$T = \frac{\bar{y} - \mu_0}{s/\sqrt{n}},$$

which has a $t_{n-1}$ distribution under the null hypothesis. We would reject this null hypothesis if $|T|$ is large relative to $t_{(1-\alpha/2,\,n-1)}$ because this indicates that seeing a value as large as ours is rare if the null is true.

This is equivalent to rejecting the null hypothesis if

$$T^2 = \frac{(\bar{y} - \mu_0)^2}{s^2/n} = n(\bar{y} - \mu_0)(s^2)^{-1}(\bar{y} - \mu_0)$$

is large. Note that the $T^2$ statistic has an $F_{(1,\,n-1)}$ distribution under the null hypothesis.

Natural Multivariate Generalization

Suppose that we wish to test $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$. Define Hotelling’s $T^2$ by

$$T^2 = n(\bar{y} - \mu_0)'S^{-1}(\bar{y} - \mu_0).$$

Hotelling’s $T^2$ can be viewed as a generalized distance between $\bar{y}$ and $\mu_0$.
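A minimal sketch of the computation, using the twelve pairs from the earlier example and the hypothesized mean (22.5, 22.5). The function name `hotelling_t2` is our own. The sketch also checks that for p = 1 the statistic reduces to the square of the univariate t statistic.

```python
import numpy as np

def hotelling_t2(Y, mu0):
    """Hotelling's T^2 = n (ybar - mu0)' S^{-1} (ybar - mu0)."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:                      # treat a vector as n observations, p = 1
        Y = Y[:, None]
    n = Y.shape[0]
    diff = Y.mean(axis=0) - np.atleast_1d(mu0)
    S = np.atleast_2d(np.cov(Y, rowvar=False))   # divisor n - 1
    return n * (diff @ np.linalg.solve(S, diff))

Y = np.array([[18.4, 25.4], [20.8, 22.1], [21.8, 27.8], [19.3, 23.7],
              [18.9, 27.2], [19.0, 26.6], [21.7, 29.2], [22.7, 24.2],
              [20.6, 24.8], [18.0, 21.9], [18.4, 26.7], [22.6, 21.8]])
n = Y.shape[0]

T2 = hotelling_t2(Y, [22.5, 22.5])
print("Hotelling T^2 (p = 2):", T2)

# Univariate special case: first trait only, compared with t^2.
y1 = Y[:, 0]
t_stat = (y1.mean() - 22.5) / (y1.std(ddof=1) / np.sqrt(n))
T2_univariate = hotelling_t2(y1, 22.5)
print("T^2 (p = 1):", T2_univariate, " t^2:", t_stat ** 2)
```

Solving the linear system with S avoids forming S⁻¹ explicitly, which is a common numerical habit rather than a requirement of the method.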