Multivariate Analysis - Lecture Notes | STAT 8320, Exams of Statistics

University of Missouri (MU) - Columbia Statistics

Material Type: Exam; Class: Data Analysis II; Subject: Statistics; University: University of Missouri - Columbia; Term: Unknown 1989;

Typology: Exams

Pre 2010

Uploaded on 08/19/2009

koofers-user-vpn 🇺🇸


Multivariate Analysis

Introduction:

Consider an experiment in swine breeding where we will measure several traits:

• Size of litter.
• Weight of largest piglet.
• Weight of smallest piglet.
• Number of days to weaning.
• Mother’s weight at piglet’s birth.
• Mother’s weight at piglet’s weaning.

Suppose that we have three genetically different types of pigs and we will measure the above variables on each of n = 30 mother pigs from each breed.

Question: How can the joint variability of the p = 6 traits be described?

MANOVA:

• Are the sets of means for the p = 6 traits the same for all varieties?
• Which varieties are different or better?
• Do varieties differ for all traits or only for some traits?

PRINCIPAL COMPONENTS:

• Which linear combination of the measured traits has the largest variance?
• Which linear combination has the smallest variance?
• Can most of the variation in the p = 6 traits be accounted for by the variation in 2 linear combinations of the traits? (Data reduction and description)

FACTOR ANALYSIS:

• Can correlation among traits be attributed to variation in one or more unobservable factors?

DISCRIMINANT ANALYSIS:

• Can you classify an observation as belonging to one of the several possible populations?

CLUSTER ANALYSIS:

• If you did not know the varieties, could you group the data into homogeneous classes?

Properties of Covariance Matrices

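• Symmetric: $\Sigma' = \Sigma$.
• Non-negative definite: $a'\Sigma a \ge 0$ for any $a \in \mathbb{R}^p$. This is equivalent to the statement that the eigenvalues of $\Sigma$ satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.
• $|\Sigma| = \lambda_1 \lambda_2 \cdots \lambda_p \ge 0$ (generalized variance).
• $\mathrm{trace}(\Sigma) = \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp}$ = sum of variances (total variance).

Note: We will usually require that $\Sigma$ be positive definite. This implies that all eigenvalues are positive, and $\Sigma$ has an inverse $\Sigma^{-1}$ such that $\Sigma^{-1}\Sigma = I_{p \times p} = \Sigma\Sigma^{-1}$.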
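The properties above can be checked numerically on a small case. The following is a minimal sketch in plain Python; the 2 × 2 matrix `sigma` and the helper `eigenvalues_sym_2x2` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 2 x 2 covariance matrix, chosen only for illustration.
sigma = [[4.0, 1.2],
         [1.2, 2.0]]


def eigenvalues_sym_2x2(m):
    """Eigenvalues of a symmetric 2 x 2 matrix, largest first (closed form)."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    center = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return center + radius, center - radius


lam1, lam2 = eigenvalues_sym_2x2(sigma)
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
trace = sigma[0][0] + sigma[1][1]

is_symmetric = sigma[0][1] == sigma[1][0]
is_positive_definite = lam2 > 0  # smallest eigenvalue strictly positive

print("symmetric:", is_symmetric)
print("eigenvalues:", lam1, lam2)
print("generalized variance:", det, "product of eigenvalues:", lam1 * lam2)
print("total variance:", trace, "sum of eigenvalues:", lam1 + lam2)
```

For a larger $p$ the same checks would normally be done with a numerical linear algebra library rather than a closed form.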

Correlations:

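Define the correlation $\rho_{ij}$ and the correlation matrix $R$ by

$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}} \quad \text{and} \quad R = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & \rho_{pp} \end{pmatrix}.$$

Notice that we can also write the correlation matrix as $R = [\mathrm{diag}(\Sigma)]^{-1/2}\,\Sigma\,[\mathrm{diag}(\Sigma)]^{-1/2}$, where $\mathrm{diag}(\Sigma)$ is just the matrix which has the $\sigma_{ii}$’s on the diagonal and 0’s elsewhere. Additionally, the square root of a symmetric matrix $A$, denoted $A^{1/2}$, is a symmetric matrix such that $A = A^{1/2}A^{1/2}$.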
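As an illustration of the definition of $\rho_{ij}$, here is a small sketch that converts a covariance matrix into a correlation matrix element by element. The function name `correlation_from_covariance` and the 3 × 3 matrix are hypothetical.

```python
import math


def correlation_from_covariance(sigma):
    """Return R with entries rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    p = len(sigma)
    sd = [math.sqrt(sigma[i][i]) for i in range(p)]  # standard deviations
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(p)]
            for i in range(p)]


# Hypothetical positive definite covariance matrix (illustration only).
sigma = [[4.0, 1.2, -0.8],
         [1.2, 9.0, 1.5],
         [-0.8, 1.5, 1.0]]

R = correlation_from_covariance(sigma)
for row in R:
    print([round(r, 4) for r in row])
```

Dividing by the standard deviations row-wise and column-wise is the same operation as pre- and post-multiplying $\Sigma$ by $[\mathrm{diag}(\Sigma)]^{-1/2}$.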

Some useful equalities:

Let $x$ and $y$ be random vectors with means $\mu_x$ and $\mu_y$ and variance-covariance matrices $\Sigma_x$ and $\Sigma_y$. Let $A$ and $B$ be matrices of constants and $c$ and $d$ be vectors of constants. Then

• $E(Ay + c) = A\mu_y + c$.
• $\mathrm{var}(Ay + c) = A\,\mathrm{var}(y)\,A' = A\Sigma_y A'$.
• $\mathrm{cov}(Ay + c, By + d) = A\Sigma_y B'$.
• $E(Ay + Bx + c) = A\mu_y + B\mu_x + c$.
• $\mathrm{var}(Ay + Bx + c) = A\Sigma_y A' + B\Sigma_x B' + A\Sigma_{yx}B' + B\Sigma_{yx}'A'$.

Let’s show the third of these below:
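A sketch of the standard argument for the third equality, using only the definitions above (the steps are filled in here as an assumption about what was worked in class):

```latex
\begin{aligned}
\mathrm{cov}(Ay + c,\; By + d)
  &= E\left\{ \left[Ay + c - (A\mu_y + c)\right]
              \left[By + d - (B\mu_y + d)\right]' \right\} \\
  &= E\left\{ A (y - \mu_y)(y - \mu_y)' B' \right\} \\
  &= A \, E\left\{ (y - \mu_y)(y - \mu_y)' \right\} B' \\
  &= A \Sigma_y B'.
\end{aligned}
```

The constants $c$ and $d$ cancel in the first step, and $A$ and $B'$ move outside the expectation because they are not random.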

• Individual components $y_1, \ldots, y_p$ are all normally distributed, $y_i \sim N_1(\mu_i, \sigma_{ii})$. This follows immediately from the third bullet point. (Why?)
• The conditional distribution of $y_1$ given $y_2$ is also normal:

$$y_1 \mid y_2 \sim N_{p_1}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).$$

Note: there are analogous results for $y_2 \mid y_1$. Additionally, $y_1$ and $y_2$ are independently distributed if and only if $\Sigma_{12} = 0$.

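• If $y \sim N(\mu, \Sigma)$ and $\Sigma$ is positive definite, then

$$(y - \mu)'\Sigma^{-1}(y - \mu) \sim \chi^2(p).$$

We will call this quantity the squared Mahalanobis distance between $y$ and $\mu$.

• If the $y_i$ are independent $N(\mu_i, \Sigma_i)$ random vectors, then for fixed matrices $A_i$ $(m \times p)$,

$$\sum_{i=1}^{k} A_i y_i \sim N_m\left(\sum_{i=1}^{k} A_i \mu_i,\; \sum_{i=1}^{k} A_i \Sigma_i A_i'\right).$$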

Suppose that

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \sim N_2(\,\ldots\,).$$

If $y_2 = 3$, what is the conditional distribution of $y_1$?
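The conditional-distribution result can be sketched for the bivariate case as follows. The mean vector and covariance matrix used here are hypothetical stand-ins, since the values for this exercise are not available in the text above; only the conditioning value $y_2 = 3$ comes from the question.

```python
def conditional_normal_bivariate(mu, sigma, y2):
    """Mean and variance of y1 | y2 for a bivariate normal.

    mu    : [mu1, mu2]
    sigma : [[s11, s12], [s12, s22]]
    """
    mu1, mu2 = mu
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    cond_mean = mu1 + s12 / s22 * (y2 - mu2)  # mu1 + S12 S22^{-1} (y2 - mu2)
    cond_var = s11 - s12 * s12 / s22          # S11 - S12 S22^{-1} S21
    return cond_mean, cond_var


# Hypothetical parameters (NOT the values from the original exercise).
mu = [1.0, 2.0]
sigma = [[4.0, 1.0],
         [1.0, 2.0]]

cond_mean, cond_var = conditional_normal_bivariate(mu, sigma, y2=3.0)
print("y1 | y2 = 3 is normal with mean", cond_mean, "and variance", cond_var)
```

Note that the conditional variance does not depend on the observed value of $y_2$, only the conditional mean does.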

Sampling from Multivariate Normal Populations

Suppose we have a random sample of size $n$, $y_1, \ldots, y_n$, from $N_p(\mu, \Sigma)$. Then,

• Since $y_1, \ldots, y_n$ are iid, their sample mean, $\bar{y} = \sum_{i=1}^{n} y_i / n \sim N_p(\mu, \Sigma/n)$. That is, $\bar{y}$ is an unbiased estimator of $\mu$.
• The $p \times p$ sample variance-covariance matrix, $S$, is

$$S = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})' = \frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i y_i' - n\,\bar{y}\bar{y}'\right\}.$$

Note that $S$ is symmetric and contains $p(p+1)/2$ different random variables. Further, $S$ is an unbiased estimator of $\Sigma$.

• $(n-1)S \sim W_p(n-1, \Sigma)$, a Wishart distribution with $n-1$ degrees of freedom and expectation $(n-1)\Sigma$. The Wishart distribution is just a multivariate extension of the Chi-squared distribution.
• $\bar{y}$ and $S$ are independent.
• $\bar{y}$ and $S$ are sufficient statistics. (All of the information in the data about $\mu$ and $\Sigma$ is contained in $\bar{y}$ and $S$, regardless of sample size.)

Maximum Likelihood Estimation for MVN

Suppose that we have an independent random sample $y_1, \ldots, y_n \sim N_p(\mu, \Sigma)$. Then we can write the likelihood function for the data as

$$L(\mu, \Sigma) = \prod_{j=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(y_j - \mu)'\Sigma^{-1}(y_j - \mu)} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{j=1}^{n}(y_j - \mu)'\Sigma^{-1}(y_j - \mu)}.$$

Then, we can write the MLEs as

$$\hat{\mu} = \bar{y} \quad \text{and} \quad \hat{\Sigma} = \frac{n-1}{n}\,S.$$

We can show this by taking the log of the likelihood function and taking derivatives with respect to μ and Σ.
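A sketch of that calculation follows. It treats the elements of $\Sigma^{-1}$ as unconstrained when differentiating, which is a simplifying assumption; accounting for symmetry leads to the same stationary point.

```latex
\begin{aligned}
\ell(\mu, \Sigma) &= \log L(\mu, \Sigma)
  = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma|
    - \frac{1}{2}\sum_{j=1}^{n} (y_j - \mu)'\Sigma^{-1}(y_j - \mu), \\[4pt]
\frac{\partial \ell}{\partial \mu}
  &= \Sigma^{-1}\sum_{j=1}^{n} (y_j - \mu) = 0
  \;\Longrightarrow\; \hat{\mu} = \bar{y}, \\[4pt]
% with mu = ybar, the quadratic forms sum to tr{ Sigma^{-1} (n-1) S }
\ell(\bar{y}, \Sigma)
  &= -\frac{np}{2}\log(2\pi) + \frac{n}{2}\log|\Sigma^{-1}|
    - \frac{n-1}{2}\,\mathrm{tr}\left(\Sigma^{-1} S\right), \\[4pt]
\frac{\partial \ell}{\partial \Sigma^{-1}}
  &= \frac{n}{2}\,\Sigma - \frac{n-1}{2}\,S = 0
  \;\Longrightarrow\; \hat{\Sigma} = \frac{n-1}{n}\,S .
\end{aligned}
```

The third line uses $\sum_{j}(y_j - \bar{y})(y_j - \bar{y})' = (n-1)S$ together with the identity $x'Ax = \mathrm{tr}(Axx')$.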

Properties of MLEs

• INVARIANCE: If $\hat{\theta}$ is the MLE of $\theta$, then the MLE of $h(\theta)$ is $h(\hat{\theta})$, for any function $h(\cdot)$.
• CONSISTENCY: MLEs are consistent estimators, but they are often biased in finite samples.
• EFFICIENCY: MLEs are asymptotically efficient estimators (for large samples, no other estimator has a smaller variance).

Properties of MLEs, cont.

• ASYMPTOTIC NORMALITY: Suppose that $\hat{\theta}_n$ is the MLE for $\theta$ based upon $n$ independent observations. Then $\hat{\theta}_n \mathrel{\dot\sim} N(\theta, H^{-1})$.
- $H$ is called the Fisher Information Matrix. It contains the negative expected values of the second partial derivatives of the log-likelihood function $\ell(\theta)$. The $(i, j)$th element of $H$ is

$$-E\left[\frac{\partial^2 \ell(\theta)}{\partial\theta_i\,\partial\theta_j}\right].$$

- We can estimate $H$ by finding the form determined above and evaluating it at $\theta = \hat{\theta}_n$.
• LIKELIHOOD RATIO TESTING: For some null hypothesis, $H_0$, we can form a likelihood ratio test.
- The statistic used is

$$\Lambda = \frac{\max_{H_0} L(\mu, \Sigma \mid Y)}{\max L(\mu, \Sigma \mid Y)}.$$

- For large $n$, $-2\log\Lambda \sim \chi^2(\nu)$, where $\nu$ is the number of parameters in the unrestricted space minus the number of parameters under $H_0$.
- We reject $H_0$ if $\Lambda$ is too small. Why?

Matrix Differentiation, Cont.

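• If $X$ is symmetric and $A$ is a matrix which does not depend upon $X$, then

$$\frac{\partial\,\mathrm{tr}(XA)}{\partial X} = A + A' - \mathrm{Diag}(A).$$

• If $X$ is symmetric and we let $J_{ij}$ be a matrix which has a 1 in the $(i, j)$th position and 0s elsewhere, then

$$\frac{\partial X^{-1}}{\partial x_{ij}} = \begin{cases} -X^{-1}J_{ii}X^{-1}, & i = j \\ -X^{-1}(J_{ij} + J_{ji})X^{-1}, & i \ne j. \end{cases}$$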
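The trace rule for symmetric $X$ can be checked numerically. The sketch below perturbs $x_{ij}$ and $x_{ji}$ together, since they are the same free element of a symmetric matrix, and compares a central difference with $A + A' - \mathrm{Diag}(A)$. The matrices `X` and `A` and all function names are hypothetical.

```python
def trace_of_product(x, a):
    """tr(XA) for square matrices stored as lists of lists."""
    n = len(x)
    return sum(x[i][k] * a[k][i] for i in range(n) for k in range(n))


def symmetric_derivative_numeric(x, a, h=1e-6):
    """Central-difference derivative of tr(XA) with respect to each free
    element x_ij of a symmetric X (x_ij and x_ji move together)."""
    n = len(x)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            plus = [row[:] for row in x]
            minus = [row[:] for row in x]
            plus[i][j] += h
            minus[i][j] -= h
            if i != j:  # keep the perturbed matrices symmetric
                plus[j][i] += h
                minus[j][i] -= h
            diff = trace_of_product(plus, a) - trace_of_product(minus, a)
            grad[i][j] = diff / (2.0 * h)
    return grad


def symmetric_derivative_formula(a):
    """A + A' - Diag(A)."""
    n = len(a)
    return [[a[i][j] + a[j][i] - (a[i][i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]


# Hypothetical symmetric X and non-symmetric A (illustration only).
X = [[2.0, 0.5, 0.1],
     [0.5, 3.0, -0.4],
     [0.1, -0.4, 1.5]]
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

numeric = symmetric_derivative_numeric(X, A)
formula = symmetric_derivative_formula(A)
max_gap = max(abs(numeric[i][j] - formula[i][j])
              for i in range(3) for j in range(3))
print("largest difference between numeric and formula:", max_gap)
```

Because $\mathrm{tr}(XA)$ is linear in $X$, the central difference is exact up to rounding error, so the two matrices should agree closely.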

Use these rules to find the derivative $\partial (x'Aa)/\partial x$.

Now, find the derivative $\partial\,\mathrm{tr}(X)/\partial X$.

Example:

Suppose that we collect the following sets of two numbers based upon some process: (18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7), (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2), (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), and (22.6, 21.8). Find $\bar{y}$.

Next, find $S$. For simplicity, we will just find the covariance here in class. (So that you can check them, $s_{11} = 2.96$ and $s_{22} = 6.04$.)

What is the mle of Σ? of μ?
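One way to check hand calculations for this example is a short script. The sketch below uses the twelve pairs listed above and the formulas for $\bar{y}$, $S$, and $\hat{\Sigma} = \frac{n-1}{n}S$ from the earlier slides; the function names are hypothetical.

```python
data = [(18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7),
        (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2),
        (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), (22.6, 21.8)]


def sample_mean(rows):
    """Component-wise mean vector ybar."""
    n, p = len(rows), len(rows[0])
    return [sum(r[k] for r in rows) / n for k in range(p)]


def sample_covariance(rows):
    """Sample variance-covariance matrix S with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    ybar = sample_mean(rows)
    return [[sum((r[a] - ybar[a]) * (r[b] - ybar[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


n = len(data)
ybar = sample_mean(data)
S = sample_covariance(data)
# MLE of Sigma rescales S by (n - 1) / n.
sigma_mle = [[(n - 1) / n * s for s in row] for row in S]

print("ybar      =", [round(v, 4) for v in ybar])
print("S         =", [[round(v, 4) for v in row] for row in S])
print("Sigma_hat =", [[round(v, 4) for v in row] for row in sigma_mle])
```

The diagonal of `S` can be compared with the check values $s_{11} = 2.96$ and $s_{22} = 6.04$ given in the notes.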

Example, cont.

Finally, suppose that we wish to test the null hypothesis that $\mu_1 = \mu_2 = 22.5$. NOTE: Under restrictions, the MLEs of the mean and variance-covariance matrix may not have the same forms that we have discussed previously. First, what will be the value of $\nu$?

Next, suppose that the MLEs under the restriction are $\hat{\sigma}_{11} = 8.82$, $\hat{\sigma}_{12} = -6.73$, and $\hat{\sigma}_{22} = 13.51$. What is the form of the numerator for the likelihood ratio test?

What is the form of the denominator for this test?

Note: it would not be appropriate to assume that 12 observations constitute a large sample. This is for demonstration purposes only!
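A sketch of how the likelihood ratio statistic could be computed for this example. It relies on the standard fact that, for a multivariate normal likelihood maximized over $\Sigma$, the maximized value is proportional to $|\hat{\Sigma}|^{-n/2}$, so that $-2\log\Lambda = n\log(|\hat{\Sigma}_0| / |\hat{\Sigma}|)$. Working with the cross-product matrices directly means the divisor cancels in the ratio. The function names are hypothetical, and the chi-square reference distribution is used only for demonstration, as the note above warns.

```python
import math

data = [(18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7),
        (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2),
        (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), (22.6, 21.8)]
mu0 = (22.5, 22.5)  # hypothesized mean vector from the example


def cross_products(rows, center):
    """2 x 2 matrix sum_j (y_j - center)(y_j - center)'."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - center[0], r[1] - center[1])
        for a in range(2):
            for b in range(2):
                w[a][b] += d[a] * d[b]
    return w


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


n = len(data)
ybar = (sum(r[0] for r in data) / n, sum(r[1] for r in data) / n)
w_full = cross_products(data, ybar)  # unrestricted: centered at ybar
w_null = cross_products(data, mu0)   # restricted: centered at mu0

neg2_log_lambda = n * math.log(det2(w_null) / det2(w_full))

# Unrestricted: 2 means + 3 covariance parameters; under H0: 3 covariance
# parameters only, so nu = 2.
nu = 2
# For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-neg2_log_lambda / 2.0)

print("-2 log Lambda =", round(neg2_log_lambda, 3), "with nu =", nu)
print("approximate p-value =", p_value)
```

The closed-form tail probability only applies because $\nu = 2$ here; other degrees of freedom would need a chi-square routine from a statistics library.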

Tests of Multivariate Normality

We will discuss three different options for assessing multivariate normality. When analyzing data we may wish to look at one or more of these possible methods.

• Check univariate normality for each trait separately.
- This can be done using methods from Data Analysis I; for example, the Shapiro-Wilk test.
- Notice that marginal normality of all traits does not imply joint MVN.
• Mardia’s tests for multivariate normality.
- Multivariate skewness can be defined as

$$\beta_{1,p} = E\left[\left\{(y - \mu)'\Sigma^{-1}(x - \mu)\right\}^3\right],$$

where $x$ and $y$ are independent, but have the same distribution.

- Multivariate kurtosis is defined as

$$\beta_{2,p} = E\left[\left\{(y - \mu)'\Sigma^{-1}(y - \mu)\right\}^2\right].$$

- For the MVN distribution, we have that $\beta_{1,p} = 0$ and $\beta_{2,p} = p(p + 2)$.
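Sample versions of these two quantities can be sketched as follows for $p = 2$. The estimators used here, $b_{1,p} = n^{-2}\sum_i\sum_j g_{ij}^3$ and $b_{2,p} = n^{-1}\sum_i g_{ii}^2$ with $g_{ij} = (y_i - \bar{y})'\hat{\Sigma}^{-1}(y_j - \bar{y})$ and divisor $n$ in $\hat{\Sigma}$, are the usual sample analogues and are an assumption about what the course uses. The function name is hypothetical, and the data are the twelve pairs from the earlier example.

```python
def mardia_statistics(rows):
    """Sample multivariate skewness and kurtosis for bivariate data."""
    n = len(rows)
    ybar = (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)
    dev = [(r[0] - ybar[0], r[1] - ybar[1]) for r in rows]

    # Covariance estimate with divisor n (the MLE form).
    s11 = sum(d[0] * d[0] for d in dev) / n
    s12 = sum(d[0] * d[1] for d in dev) / n
    s22 = sum(d[1] * d[1] for d in dev) / n
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det],
           [-s12 / det, s11 / det]]

    def g(u, v):
        """Bilinear form u' Sigma_hat^{-1} v."""
        return (u[0] * (inv[0][0] * v[0] + inv[0][1] * v[1])
                + u[1] * (inv[1][0] * v[0] + inv[1][1] * v[1]))

    skewness = sum(g(u, v) ** 3 for u in dev for v in dev) / n ** 2
    kurtosis = sum(g(u, u) ** 2 for u in dev) / n
    return skewness, kurtosis


data = [(18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7),
        (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2),
        (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), (22.6, 21.8)]

b1, b2 = mardia_statistics(data)
print("sample skewness b1 =", b1, "(MVN value: 0)")
print("sample kurtosis b2 =", b2, "(MVN value: p(p + 2) = 8)")
```

With only 12 observations the sample values can be far from the MVN reference values even when the data are normal, so this is a descriptive check rather than a formal test.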

Tests of MVN, Cont.

• Chi-Square Q-Q Plot
- Let $y_i$, $i = 1, \ldots, n$, be a random sample from $N_p(\mu, \Sigma)$.
- Then $z_i = \Sigma^{-1/2}(y_i - \mu)$, $i = 1, \ldots, n$, are iid $N_p(0, I)$. Thus, $\delta_i^2 = z_i'z_i$, $i = 1, \ldots, n$, follows a $\chi^2_p$ distribution.
- We can plot the ordered $d_i^2$ values (the sample versions of $\delta_i^2$, computed with $\bar{y}$ and $S$ in place of $\mu$ and $\Sigma$) against the quantiles of the $\chi^2_p$ distribution. When normality holds, the plot should approximately resemble a straight line passing through the origin at a 45° angle.

[Figure: chi-square Q-Q plot. Axes: Chi-Square Quantiles and Mahalanobis Distance.]
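The points for such a plot can be computed as in the sketch below, which uses the twelve pairs from the earlier example. Two choices here are assumptions: the plotting positions $(i - 0.5)/n$, which are one common convention, and the closed-form quantile $-2\log(1 - q)$, which is valid only for the $\chi^2_2$ distribution ($p = 2$). The function name is hypothetical.

```python
import math

data = [(18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7),
        (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2),
        (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), (22.6, 21.8)]


def squared_mahalanobis(rows):
    """d_i^2 = (y_i - ybar)' S^{-1} (y_i - ybar) for bivariate data."""
    n = len(rows)
    ybar = (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)
    dev = [(r[0] - ybar[0], r[1] - ybar[1]) for r in rows]
    s11 = sum(d[0] * d[0] for d in dev) / (n - 1)
    s12 = sum(d[0] * d[1] for d in dev) / (n - 1)
    s22 = sum(d[1] * d[1] for d in dev) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Quadratic form with the explicit inverse of the 2 x 2 matrix S.
    return [(s22 * d[0] ** 2 - 2.0 * s12 * d[0] * d[1] + s11 * d[1] ** 2) / det
            for d in dev]


d2 = sorted(squared_mahalanobis(data))
n = len(d2)
# Chi-square(2) quantile at probability q is -2 * log(1 - q).
quantiles = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]

pairs = list(zip(quantiles, d2))
for q, d in pairs:
    print(round(q, 3), round(d, 3))
```

Plotting the second column against the first gives the Q-Q plot. A useful sanity check is that the $d_i^2$ values always sum to $(n - 1)p$.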

If the data are not normal, we can (1) ignore this, (2) use nonparametric methods, (3) use models based upon an appropriate distribution, or (4) try performing a transformation.

Inference about a Mean Vector:

Recall that for a univariate normal distribution, we could test the hypothesis $H_0: \mu = \mu_0$ by using the test statistic

$$T = \frac{\bar{y} - \mu_0}{s/\sqrt{n}},$$

which has a $t_{n-1}$ distribution under the null hypothesis. We would reject this null hypothesis if $|T|$ is large relative to $t_{(1-\alpha/2,\,n-1)}$ because this indicates that seeing a value as large as ours is rare if the null is true.

This is equivalent to rejecting the null hypothesis if

$$T^2 = \frac{(\bar{y} - \mu_0)^2}{s^2/n} = n(\bar{y} - \mu_0)(s^2)^{-1}(\bar{y} - \mu_0)$$

is large. Note that the $T^2$ statistic has an $F_{(1,\,n-1)}$ distribution under the null hypothesis.

Natural Multivariate Generalization

Suppose that we wish to test $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$. Define Hotelling’s $T^2$ by

$$T^2 = n(\bar{y} - \mu_0)'S^{-1}(\bar{y} - \mu_0).$$

Hotelling’s $T^2$ can be viewed as a generalized distance between $\bar{y}$ and $\mu_0$.
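As an illustration, the sketch below computes Hotelling’s $T^2$ for the twelve pairs from the earlier example with $\mu_0 = (22.5, 22.5)'$. The conversion to an $F$ statistic, $\frac{n - p}{p(n - 1)}T^2 \sim F_{p,\,n-p}$ under $H_0$, is the standard result and is not stated in the portion of the notes shown here, so treat it as an added assumption. The function name is hypothetical.

```python
data = [(18.4, 25.4), (20.8, 22.1), (21.8, 27.8), (19.3, 23.7),
        (18.9, 27.2), (19.0, 26.6), (21.7, 29.2), (22.7, 24.2),
        (20.6, 24.8), (18.0, 21.9), (18.4, 26.7), (22.6, 21.8)]
mu0 = (22.5, 22.5)


def hotelling_t2(rows, mu0):
    """T^2 = n (ybar - mu0)' S^{-1} (ybar - mu0) for bivariate data."""
    n = len(rows)
    ybar = (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)
    dev = [(r[0] - ybar[0], r[1] - ybar[1]) for r in rows]
    s11 = sum(d[0] * d[0] for d in dev) / (n - 1)
    s12 = sum(d[0] * d[1] for d in dev) / (n - 1)
    s22 = sum(d[1] * d[1] for d in dev) / (n - 1)
    det = s11 * s22 - s12 * s12
    diff = (ybar[0] - mu0[0], ybar[1] - mu0[1])
    # Quadratic form with the explicit inverse of the 2 x 2 matrix S.
    quad = (s22 * diff[0] ** 2 - 2.0 * s12 * diff[0] * diff[1]
            + s11 * diff[1] ** 2) / det
    return n * quad


n, p = len(data), 2
t2 = hotelling_t2(data, mu0)
f_stat = (n - p) / (p * (n - 1)) * t2  # compare with F(p, n - p)

print("Hotelling T^2 =", round(t2, 3))
print("F statistic   =", round(f_stat, 3), "on", p, "and", n - p, "df")
```

For $p = 1$ the quadratic form reduces to $(\bar{y} - \mu_0)^2 / s^2$, which recovers the univariate $T^2$.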
