
Random Vector and Multivariate Normal Distribution | ST 732, Study notes of Statistics

North Carolina State University (NCSU), Statistics

Prof. Marie Davidian

Chapter 3 Material Type: Notes; Professor: Davidian; Class: Applied Longitudinal Data Analysis; Subject: Statistics; University: North Carolina State University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009


CHAPTER 3 ST 732, M. DAVIDIAN


3 Random vectors and multivariate normal distribution

As we saw in Chapter 1, a natural way to think about repeated measurement data is as a series of random vectors, one vector corresponding to each unit. Because the way in which these vectors of measurements turn out is governed by probability, we need to discuss extensions of the usual univariate probability distributions for (scalar) random variables to multivariate probability distributions governing random vectors.

3.1 Preliminaries

First, it is wise to review the important concepts of random variable and probability distribution and how we use these to model individual observations.

RANDOM VARIABLE: We may think of a random variable Y as a characteristic whose values may vary. The way it takes on values is described by a probability distribution.

CONVENTION, REPEATED: It is customary to use upper case letters, e.g. Y, to denote a generic random variable and lower case letters, e.g. y, to denote a particular value that the random variable may take on or that may be observed (data).

EXAMPLE: Suppose we are interested in the characteristic “body weight of rats” in the population of all possible rats of a certain age, gender, and type. We might let

Y = body weight of a (randomly chosen) rat

from this population. Y is a random variable.

We may conceptualize that body weights of rats are distributed in this population in the sense that some values are more common (i.e. more rats have them) than others. If we randomly select a rat from the population, then the chance it has a certain body weight will be governed by this distribution of weights in the population. Formally, values that Y may take on are distributed in the population according to an associated probability distribution that describes how likely the values are in the population.

In a moment, we will consider more carefully why rat weights we might see vary. First, we recall the following.

(POPULATION) MEAN AND VARIANCE: Recall that the mean and variance of a probability distribution summarize notions of “center” and “spread” or “variability” of all possible values. Consider a random variable Y with an associated probability distribution.

The population mean may be thought of as the average of all possible values that Y could take on across the entire distribution. Because some values occur more frequently (are more likely) than others, this average reflects those frequencies. We write

$$E(Y) \qquad (3.1)$$

to denote this average, the population mean. The expectation operator E indicates that the "averaging" operation is to be carried out over all possible values of its argument. Formally, the average may be thought of as a "weighted" average, where each possible value is represented in accordance with the probability with which it occurs in the population. The symbol μ is often used.

The population mean may be thought of as a way of describing the “center” of the distribution of all possible values. The population mean is also referred to as the expected value or expectation of Y.

Recall that if we have a random sample of observations on a random variable Y, say $Y_1, \ldots, Y_n$, then the sample mean is just their average:

$$\bar{Y} = n^{-1} \sum_{j=1}^{n} Y_j.$$

For example, if Y = rat weight, and we were to obtain a random sample of n = 50 rats and weigh each, then $\bar{Y}$ represents the average we would obtain.

The sample mean is a natural estimator for the population mean of the probability distribution from which the random sample was drawn.
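The weighted-average view of E(Y) and the role of the sample mean as its estimator can be illustrated with a small simulation. This is a sketch under stated assumptions: the rat-weight values and probabilities below are hypothetical, chosen only so that E(Y) comes out to a round number, and `random.Random.choices` is used to draw a random sample of n = 50 from that population.

```python
import random
from fractions import Fraction

# Hypothetical population of rat weights (grams) and their probabilities (made up).
rat_weights = [280, 290, 300, 310, 320]
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(2, 10), Fraction(1, 10)]


def population_mean(values, probabilities):
    """E(Y): each possible value weighted by the probability with which it occurs."""
    return sum(v * p for v, p in zip(values, probabilities))


def sample_mean(sample):
    """Ybar = n^{-1} * (Y_1 + ... + Y_n)."""
    return sum(sample) / len(sample)


mu = population_mean(rat_weights, probs)

rng = random.Random(732)  # fixed seed so the run is reproducible
sample = rng.choices(rat_weights, weights=[float(p) for p in probs], k=50)
ybar = sample_mean(sample)

print(f"E(Y) = {mu}; sample mean of n = {len(sample)} draws = {ybar:.2f}")
```

A different seed gives a different $\bar{Y}$; each is an estimate of the fixed population mean computed by `population_mean`.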

The population variance may be thought of as measuring the spread of all possible values that may be observed, based on the squared deviations of each value from the “center” of the distribution of all possible values. More formally, variance is based on averaging squared deviations across the population, which is represented using the expectation operator, and is given by

$$\mathrm{var}(Y) = E\{(Y - \mu)^2\}, \qquad \mu = E(Y). \qquad (3.2)$$

(3.2) shows the interpretation of variance as an average of squared deviations from the mean across the population, taking into account that some values are more likely (occur with higher probability) than others.

GENERAL FACTS: If b is a fixed scalar and Y is a random variable, then

$E(bY) = bE(Y) = b\mu$; i.e. all values in the average are just multiplied by b. Also, $E(Y + b) = E(Y) + b$; adding a constant to each value in the population just shifts the average by this same amount.
$\mathrm{var}(bY) = E\{(bY - b\mu)^2\} = b^2\,\mathrm{var}(Y)$; i.e. all squared deviations in the average are multiplied by $b^2$. Also, $\mathrm{var}(Y + b) = \mathrm{var}(Y)$; adding a constant to each value in the population does not affect how they vary about the mean (which is also shifted by this amount).
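These facts can be checked exactly on a small discrete distribution. In the sketch below the values and probabilities of Y, and the scalar b = 5, are hypothetical; `Fraction` is used so that the equalities hold exactly rather than up to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical distribution of Y: possible values and their probabilities.
values = [1, 2, 3, 6]
probs = [Fraction(1, 4)] * 4


def expect(g):
    """E{g(Y)}: probability-weighted average of g over all possible values of Y."""
    return sum(g(y) * p for y, p in zip(values, probs))


def variance(g):
    """var{g(Y)} = E[{g(Y) - E g(Y)}^2]."""
    m = expect(g)
    return expect(lambda y: (g(y) - m) ** 2)


def identity(y):
    return y


b = 5  # a fixed scalar
mu = expect(identity)
sigma2 = variance(identity)

assert expect(lambda y: b * y) == b * mu            # E(bY) = b mu
assert expect(lambda y: y + b) == mu + b            # E(Y + b) = E(Y) + b
assert variance(lambda y: b * y) == b**2 * sigma2   # var(bY) = b^2 var(Y)
assert variance(lambda y: y + b) == sigma2          # var(Y + b) = var(Y)
print("E(Y) =", mu, "  var(Y) =", sigma2)
```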

SOURCES OF VARIATION: We now consider why the values of a characteristic that we might observe vary. Consider again the rat weight example.

Biological variation. It is well-known that biological entities are different; although living things of the same type tend to be similar in their characteristics, they are not exactly the same (except perhaps in the case of genetically-identical clones). Thus, even if we focus on rats of the same strain, age, and gender, we expect variation in the possible weights of such rats that we might observe due to inherent, natural biological variation. Let Y represent the weight of a randomly chosen rat, with probability distribution having mean μ. If all rats were biologically identical, then the population variance of Y would be equal to 0, and we would expect all rats to have exactly weight μ. Of course, because rat weights vary as a consequence of biological factors, the variance is > 0, and thus the weight of a randomly chosen rat is not equal to μ but rather deviates from μ by some positive or negative amount. From this view, we might think of Y as being represented by

$$Y = \mu + b, \qquad (3.3)$$

where b is a random variable with population mean $E(b) = 0$ and variance $\mathrm{var}(b) = \sigma_b^2$, say. Here, Y is "decomposed" into its mean value (a systematic component) and a random deviation b that represents by how much a rat weight might deviate from the mean rat weight due to inherent biological factors. (3.3) is a simple statistical model that emphasizes that we believe the rat weights we might see vary because of biological phenomena. Note that (3.3) implies that $E(Y) = \mu$ and $\mathrm{var}(Y) = \sigma_b^2$.

Measurement error. We have discussed rat weight as though, once we have a rat in hand, we may know its weight exactly. However, a scale usually must be used. Ideally, a scale should register the true weight of an item each time it is weighed, but, because such devices are imperfect, measurements on the same item may vary time after time. The amount by which the measurement differs from the truth may be thought of as an error; i.e. a deviation up or down from the true value that could be observed with a "perfect" device. A "fair" or unbiased device does not systematically register high or low most of the time; rather, the errors may go in either direction with no pattern.

Thus, if we only have an unbiased scale on which to weigh rats, a rat weight we might observe reflects not only the true weight of the rat, which varies across rats, but also the error in taking the measurement. We might think of a random variable e, say, that represents the error that might contaminate a measurement of rat weight, taking on possible values in a hypothetical "population" of all such errors the scale might commit. We still believe rat weights vary due to biological variation, but what we see is also subject to measurement error. It thus makes sense to revise our thinking of what Y represents, and think of Y = "measured weight of a randomly chosen rat." The population of all possible values Y could take on is all possible values of rat weight we might measure; i.e., all values consisting of a true weight of a rat from the population of all rats contaminated by a measurement error from the population of all possible such errors. With this thinking, it is natural to represent Y as

$$Y = \mu + b + e = \mu + \epsilon, \qquad (3.4)$$

where b is as in (3.3), and e is the deviation due to measurement error, with $E(e) = 0$ and $\mathrm{var}(e) = \sigma_e^2$, representing an unbiased but imprecise scale. In (3.4), $\epsilon = b + e$ represents the aggregate deviation due to the effects of both biological variation and measurement error. Here, $E(\epsilon) = 0$ and, provided b and e are independent (the error the scale commits has nothing to do with the rat's true weight), $\mathrm{var}(\epsilon) = \sigma^2 = \sigma_b^2 + \sigma_e^2$, so that $E(Y) = \mu$ and $\mathrm{var}(Y) = \sigma^2$ according to the model (3.4). Here, $\sigma^2$ reflects the "spread" of measured rat weights and depends on both the spread in true rat weights and the spread in errors that could be committed in measuring them.
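The decomposition (3.4) can be made concrete by simulation. This is a sketch under assumptions not made in the notes: b and e are taken to be normal and drawn independently, and the values of μ, σ_b, and σ_e are hypothetical. The sample mean and variance of many simulated measured weights should land close to μ and σ_b² + σ_e².

```python
import random
import statistics

# Hypothetical values: mean weight (g), SD of biological deviation b, SD of measurement error e.
MU, SIGMA_B, SIGMA_E = 300.0, 20.0, 5.0


def measured_weight(rng):
    """One draw of Y = mu + b + e, with b and e independent (normality is an assumption)."""
    b = rng.gauss(0.0, SIGMA_B)  # biological deviation of this rat from mu
    e = rng.gauss(0.0, SIGMA_E)  # error committed by an unbiased scale
    return MU + b + e


rng = random.Random(2009)
ys = [measured_weight(rng) for _ in range(100_000)]

print("sample mean     :", round(statistics.fmean(ys), 2), "  (mu =", MU, ")")
print("sample variance :", round(statistics.variance(ys), 1),
      "  (sigma_b^2 + sigma_e^2 =", SIGMA_B**2 + SIGMA_E**2, ")")
```

Setting `SIGMA_E = 0.0` corresponds to a perfect scale, i.e. model (3.3); the variance then drops to about σ_b² alone.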

There are still further sources of variation that we could consider; we defer discussion to later in the course. For now, the important message is that, in considering statistical models, it is critical to be aware of different sources of variation that cause observations to vary. This is especially important with longitudinal data, as we will see.

The model thus says that, at each $x_j$, there is a population of possible $Y_j$ values we might see, with mean $\beta_0 + \beta_1 x_j$ and variance $\sigma^2$. We can represent this pictorially by considering Figure 2.

Figure 2: Simple linear regression (plot of y against x; image not reproduced).

"ERROR": An unfortunate convention in the literature is that the $\epsilon_j$ are referred to as errors, which causes some people to believe that they represent solely deviation due to measurement error. We prefer the term deviation to emphasize that $Y_j$ values may deviate from $\beta_0 + \beta_1 x_j$ due to the combined effects of several sources, not only measurement error.

INDEPENDENCE: An important assumption for simple linear regression and, indeed, more general problems, is that the random variables $Y_j$, or equivalently, the $\epsilon_j$, are independent.

(Statistical) independence is a formal statistical concept with an important practical interpretation. In particular, in our simple linear regression model, it says that the way in which $Y_j$ at $x_j$ takes on its values is completely unrelated to the way in which $Y_{j'}$ observed at another position $x_{j'}$ takes on its values. This is certainly a reasonable assumption in many situations.

In our example, where the $x_j$ are doses of a drug, each given to a different rat, there is no reason to believe that responses from different rats should be related in any way. Thus, the way in which $Y_j$ values turn out at different $x_j$ would be totally unrelated.

The consequence of independence is that we may think of data on an observation-by-observation basis; because the behavior of each observation is unrelated to that of others, we may talk about each one in its own right, without reference to the others.

Although this way of thinking may be relevant for regression problems where the data were collected according to a scheme like that in the example above, as we will see, it may not be relevant for longitudinal data.

3.2 Random vectors

As we have already mentioned, when several observations are taken on the same unit, it will be convenient, and in fact, necessary, to talk about them together. We thus must extend our way of thinking about random variables and probability distributions.

RANDOM VECTOR: A random vector is a vector whose elements are random variables. Let

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$$

be an $(n \times 1)$ random vector.

Each element of Y, $Y_j$, $j = 1, \ldots, n$, is a random variable with its own mean, variance, and probability distribution; e.g.

$$E(Y_j) = \mu_j, \qquad \mathrm{var}(Y_j) = E\{(Y_j - \mu_j)^2\} = \sigma_j^2.$$

We might furthermore have that $Y_j$ is normally distributed; i.e.

$$Y_j \sim N(\mu_j, \sigma_j^2).$$

Thus, if we talk about a particular element of Y in its own right, we may speak in terms of its particular probability distribution, mean, and variance.
Probability distributions for single random variables are often referred to as univariate, because they refer only to how one (scalar) random variable takes on its values.

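Inspection of the definition of covariance (3.5), $\sigma_{jk} = \mathrm{cov}(Y_j, Y_k) = E\{(Y_j - \mu_j)(Y_k - \mu_k)\}$, shows the following.

Covariance is defined as the average, across all possible values that $Y_j$ and $Y_k$ may take on jointly, of the product of the deviations of $Y_j$ and $Y_k$ from their respective means.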
Thus, if "large" values ("larger" than their means) of $Y_j$ and $Y_k$ tend to happen together (and thus "small" values of $Y_j$ and $Y_k$ tend to happen together), then the two deviations $(Y_j - \mu_j)$ and $(Y_k - \mu_k)$ will tend to be positive together and negative together, so that the product

$$(Y_j - \mu_j)(Y_k - \mu_k) \qquad (3.6)$$

will tend to be positive for most of the pairs of values in the population. Thus, the average in (3.5) will likely be positive.

Conversely, if "large" values of $Y_j$ tend to happen coincidentally with "small" values of $Y_k$ and vice versa, then the deviation $(Y_j - \mu_j)$ will tend to be positive when $(Y_k - \mu_k)$ tends to be negative, and vice versa. Thus the product (3.6) will tend to be negative for most of the pairs of values in the population, and the average in (3.5) will likely be negative.
Moreover, if in truth $Y_j$ and $Y_k$ are unrelated, so that "large" $Y_j$ are as likely to happen with "small" $Y_k$ as with "large" $Y_k$, and vice versa, then we would expect the deviations $(Y_j - \mu_j)$ and $(Y_k - \mu_k)$ to be positive and negative in no real systematic way. Thus, (3.6) may be negative or positive with no special tendency, and the average in (3.5) would likely be zero.

Thus, the quantity of covariance defined in (3.5) makes intuitive sense as a measure of how “associated” values of Yj are with values of Yk.

In the last case above, $Y_j$ and $Y_k$ are unrelated, and we argued that $\mathrm{cov}(Y_j, Y_k) = 0$. In fact, formally, if $Y_j$ and $Y_k$ are statistically independent, then it follows that $\mathrm{cov}(Y_j, Y_k) = 0$ (the converse does not hold in general).
Note that $\mathrm{cov}(Y_j, Y_k) = \mathrm{cov}(Y_k, Y_j)$.
Fact: the covariance of a random variable $Y_j$ with itself is its variance,

$$\mathrm{cov}(Y_j, Y_j) = E\{(Y_j - \mu_j)(Y_j - \mu_j)\} = \mathrm{var}(Y_j) = \sigma_j^2.$$

Fact: If we have two random variables, $Y_j$ and $Y_k$, then

$$\mathrm{var}(Y_j + Y_k) = \mathrm{var}(Y_j) + \mathrm{var}(Y_k) + 2\,\mathrm{cov}(Y_j, Y_k).$$

That is, the variance of the population consisting of all possible values of the sum $Y_j + Y_k$ is the sum of the variances for each population, adjusted by how "associated" the two values are. Note that if $Y_j$ and $Y_k$ are independent, $\mathrm{var}(Y_j + Y_k) = \mathrm{var}(Y_j) + \mathrm{var}(Y_k)$.
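The covariance definition and the variance-of-a-sum fact can be verified exactly on a small joint distribution. The four pairs and their probabilities below are hypothetical, chosen so that large values of the two variables tend to occur together; the covariance should therefore come out positive, and the directly computed variance of the sum should match the formula.

```python
from fractions import Fraction

# Hypothetical joint distribution of (Yj, Yk): pair of values -> probability.
# Large values of Yj tend to occur with large values of Yk, so cov should be positive.
joint = {
    (1, 1): Fraction(3, 10),
    (1, 2): Fraction(1, 10),
    (2, 1): Fraction(1, 10),
    (2, 2): Fraction(5, 10),
}


def E(g):
    """Expectation of g(Yj, Yk) under the joint distribution."""
    return sum(g(yj, yk) * p for (yj, yk), p in joint.items())


mu_j = E(lambda yj, yk: yj)
mu_k = E(lambda yj, yk: yk)
var_j = E(lambda yj, yk: (yj - mu_j) ** 2)
var_k = E(lambda yj, yk: (yk - mu_k) ** 2)
cov_jk = E(lambda yj, yk: (yj - mu_j) * (yk - mu_k))  # average product of deviations

# Variance of the sum, computed directly from the distribution of Yj + Yk.
var_sum = E(lambda yj, yk: (yj + yk - (mu_j + mu_k)) ** 2)

assert var_sum == var_j + var_k + 2 * cov_jk
print("cov(Yj, Yk) =", cov_jk, "  var(Yj + Yk) =", var_sum)
```

Moving probability from (1, 1) and (2, 2) onto (1, 2) and (2, 1) would make the covariance negative, matching the "conversely" case above.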

We now see how all of this information is summarized.

EXPECTATION OF A RANDOM VECTOR: For an entire n-dimensional random vector Y, we summarize the means for each element in a vector

$$\mu = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}.$$

We define the expected value or mean of Y as

$$E(Y) = \mu;$$

the expectation operation is applied to each element in the vector Y, yielding the vector μ of means.

RANDOM MATRIX: A random matrix is simply a matrix whose elements are random variables; we will see a specific example of importance to us in a moment. Formally, if $\mathbf{Y}$ is an $(r \times c)$ matrix with elements $Y_{jk}$, each a random variable, then each element has an expectation, $E(Y_{jk}) = \mu_{jk}$, say. Then the expected value or mean of $\mathbf{Y}$ is defined as the corresponding matrix of means; i.e.

$$E(\mathbf{Y}) = \begin{pmatrix} E(Y_{11}) & E(Y_{12}) & \cdots & E(Y_{1c}) \\ \vdots & \vdots & \ddots & \vdots \\ E(Y_{r1}) & E(Y_{r2}) & \cdots & E(Y_{rc}) \end{pmatrix}.$$

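COVARIANCE MATRIX: We now see how this concept is used to summarize information on covariance among the elements of a random vector. Note that

$$(Y - \mu)(Y - \mu)' = \begin{pmatrix} (Y_1 - \mu_1)^2 & (Y_1 - \mu_1)(Y_2 - \mu_2) & \cdots & (Y_1 - \mu_1)(Y_n - \mu_n) \\ (Y_2 - \mu_2)(Y_1 - \mu_1) & (Y_2 - \mu_2)^2 & \cdots & (Y_2 - \mu_2)(Y_n - \mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ (Y_n - \mu_n)(Y_1 - \mu_1) & (Y_n - \mu_n)(Y_2 - \mu_2) & \cdots & (Y_n - \mu_n)^2 \end{pmatrix},$$

which is a random matrix. Its expectation, $\Sigma = E\{(Y - \mu)(Y - \mu)'\}$, is the covariance matrix of Y: its diagonal elements are the variances $\sigma_j^2$ and its off-diagonal elements are the covariances $\sigma_{jk}$.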
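Taking the expectation of this random matrix element by element gives the covariance matrix $\Sigma = E\{(Y - \mu)(Y - \mu)'\}$. The sketch below does exactly that for a hypothetical (3 × 1) random vector with four equally likely outcomes, using exact fractions; the outcomes are made up for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution of a (3 x 1) random vector Y: outcome -> probability.
outcomes = {
    (1, 2, 0): Fraction(1, 4),
    (2, 3, 1): Fraction(1, 4),
    (3, 3, 0): Fraction(1, 4),
    (2, 4, 3): Fraction(1, 4),
}
n = 3

# Mean vector: the expectation applied to each element of Y.
mu = [sum(y[j] * p for y, p in outcomes.items()) for j in range(n)]


def outer_deviation(y):
    """The (n x n) matrix (y - mu)(y - mu)' for one outcome y."""
    d = [y[j] - mu[j] for j in range(n)]
    return [[d[j] * d[k] for k in range(n)] for j in range(n)]


# Sigma = E{(Y - mu)(Y - mu)'}: average the random matrix element by element.
Sigma = [[Fraction(0)] * n for _ in range(n)]
for y, p in outcomes.items():
    m = outer_deviation(y)
    for j in range(n):
        for k in range(n):
            Sigma[j][k] += m[j][k] * p

print("mu =", [str(v) for v in mu])
for row in Sigma:
    print([str(v) for v in row])
```

The result is symmetric because $(Y_j - \mu_j)(Y_k - \mu_k) = (Y_k - \mu_k)(Y_j - \mu_j)$, matching $\mathrm{cov}(Y_j, Y_k) = \mathrm{cov}(Y_k, Y_j)$.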

CORRELATION: It is informative to separate the information on "spread" contained in the variances $\sigma_j^2$ from that describing "association." Thus, we define a particular measure of association that takes into account the fact that different elements of Y may vary differently on their own.

The population correlation coefficient between $Y_j$ and $Y_k$ is defined as

$$\rho_{jk} = \frac{\sigma_{jk}}{\sqrt{\sigma_j^2}\,\sqrt{\sigma_k^2}}.$$

Of course, $\sigma_j = \sqrt{\sigma_j^2}$ is the population standard deviation of $Y_j$, on the same scale of measurement as $Y_j$, and similarly for $Y_k$.

$\rho_{jk}$ scales the information on association in the covariance in accordance with the magnitude of variation in each random variable, creating a "unitless" measure. Thus, it allows one to think of the associations among variables measured on different scales.
$\rho_{jk} = \rho_{kj}$.
Note that if $\sigma_{jk} = \sigma_j \sigma_k$, then $\rho_{jk} = 1$. Intuitively, if this is true, it says that the way $Y_j$ and $Y_k$ vary separately is identical to how they vary together, so that if we know one, we know the other. Thus, a correlation of 1 indicates that the two random variables are "perfectly positively associated." Similarly, if $\sigma_{jk} = -\sigma_j \sigma_k$, then $\rho_{jk} = -1$ and by the same reasoning they are "perfectly negatively associated."
Clearly, $\rho_{jj} = 1$, so a random variable is perfectly positively correlated with itself.
It may be shown that correlations must satisfy $-1 \le \rho_{jk} \le 1$.
If $\sigma_{jk} = 0$ then $\rho_{jk} = 0$, so if $Y_j$ and $Y_k$ are independent, then they have 0 correlation.
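The points above can be illustrated numerically. In this sketch all variances and covariances are hypothetical: case 1 is an ordinary positive association, case 2 takes $Y_k = 3 - 2Y_j$ so that $\sigma_{jk} = -\sigma_j \sigma_k$, and the last case rescales both variables by a positive unit-conversion factor to show that ρ is unitless.

```python
import math


def correlation(sigma_jk, var_j, var_k):
    """rho_jk = sigma_jk / (sigma_j * sigma_k), where sigma_j = sqrt(var_j)."""
    return sigma_jk / (math.sqrt(var_j) * math.sqrt(var_k))


# Hypothetical case 1: cov(Yj, Yk) = 3, var(Yj) = 4, var(Yk) = 9.
rho_1 = correlation(3.0, 4.0, 9.0)

# Hypothetical case 2: Yk = 3 - 2 Yj with var(Yj) = 4, so cov(Yj, Yk) = -2 var(Yj) = -8
# and var(Yk) = 4 var(Yj) = 16; then sigma_jk = -sigma_j sigma_k.
var_j = 4.0
rho_2 = correlation(-2 * var_j, var_j, 4 * var_j)

# Changing units (factor a > 0 for Yj, c > 0 for Yk) multiplies the covariance by a*c
# and the variances by a^2 and c^2, but leaves rho unchanged.
a, c = 0.001, 0.001
rho_rescaled = correlation(a * c * 3.0, a**2 * 4.0, c**2 * 9.0)

print(rho_1, rho_2, rho_rescaled)
```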

CORRELATION MATRIX: It is customary to summarize the information on correlations in a matrix. The correlation matrix Γ is defined as

$$\Gamma = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & 1 \end{pmatrix}.$$

For now, we use the symbol Γ to denote the correlation matrix of a random vector.

ALTERNATIVE REPRESENTATION OF COVARIANCE MATRIX: Note that knowledge of the variances $\sigma_1^2, \ldots, \sigma_n^2$ and the correlation matrix Γ is equivalent to knowledge of Σ, and vice versa. It is often easier to think of associations among random variables on the unitless correlation scale than in terms of covariance; thus, it is often convenient to write the covariance matrix another way that presents the correlations explicitly.

Define the "standard deviation" matrix

$$T^{1/2} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{pmatrix}.$$

The "1/2" reminds us that this is a diagonal matrix with the square roots of the variances on the diagonal. Then it may be verified that (try it)

$$T^{1/2}\,\Gamma\,T^{1/2} = \Sigma. \qquad (3.7)$$

The representation (3.7) will prove convenient when we wish to discuss associations implied by models for longitudinal data in terms of correlations. Moreover, it is useful to appreciate (3.7), as it allows calculations involving Σ that we will see later to be implemented easily on a computer.
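As a sketch of how (3.7) can be implemented, the code below builds Σ from hypothetical standard deviations and a hypothetical correlation matrix Γ, then goes back from Σ to Γ by dividing each $\sigma_{jk}$ by $\sigma_j \sigma_k$. Plain nested lists are used so the example needs only the standard library; `matmul` and `diag` are small helpers written here, not library functions.

```python
import math


def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def diag(values):
    """Diagonal matrix with the given values on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical standard deviations sigma_1, sigma_2, sigma_3 and correlation matrix Gamma.
sds = [2.0, 3.0, 5.0]
Gamma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.4],
         [0.2, 0.4, 1.0]]

T_half = diag(sds)                              # T^{1/2}
Sigma = matmul(matmul(T_half, Gamma), T_half)   # (3.7): T^{1/2} Gamma T^{1/2}

# Reverse direction: recover Gamma from Sigma via rho_jk = sigma_jk / (sigma_j sigma_k).
sd_back = [math.sqrt(Sigma[j][j]) for j in range(len(Sigma))]
Gamma_back = [[Sigma[j][k] / (sd_back[j] * sd_back[k]) for k in range(len(Sigma))]
              for j in range(len(Sigma))]

for row in Sigma:
    print([round(v, 10) for v in row])
```

Each element of the product is $\sigma_j \rho_{jk} \sigma_k = \sigma_{jk}$, which is why the diagonal of Σ holds the variances.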

GENERAL FACTS: As we will see later, we will often be interested in linear combinations of the elements of a random vector Y; that is, functions of the form

$$c_1 Y_1 + \cdots + c_n Y_n,$$

which may be written succinctly as $c'Y$, where c is the column vector

$$c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.$$

Note that $c'Y$ is a scalar quantity.

It is possible, using the facts on multiplication of random variables by scalars (see above) and the definitions of μ and Σ, to show that

$$E(c'Y) = c'\mu, \qquad \mathrm{var}(c'Y) = c'\Sigma c.$$

(Try to verify these.)
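One way to check these identities is to compute each side separately on a small example. The sketch below uses a hypothetical (3 × 1) random vector with four equally likely outcomes and a hypothetical coefficient vector c; the left-hand sides are obtained by forming the scalar $c'Y$ for every outcome, the right-hand sides from μ and Σ.

```python
from fractions import Fraction

# Hypothetical joint distribution of a (3 x 1) random vector Y: outcome -> probability.
outcomes = {
    (1, 2, 0): Fraction(1, 4),
    (2, 3, 1): Fraction(1, 4),
    (3, 3, 0): Fraction(1, 4),
    (2, 4, 3): Fraction(1, 4),
}
n = 3
c = [1, -2, 3]  # hypothetical coefficient vector


def E(g):
    """Expectation of g(Y) over the joint distribution."""
    return sum(g(y) * p for y, p in outcomes.items())


mu = [E(lambda y, j=j: y[j]) for j in range(n)]
Sigma = [[E(lambda y, j=j, k=k: (y[j] - mu[j]) * (y[k] - mu[k])) for k in range(n)]
         for j in range(n)]


def c_prime_y(y):
    """The scalar c'Y = c_1 Y_1 + ... + c_n Y_n for one outcome."""
    return sum(cj * yj for cj, yj in zip(c, y))


# Directly: average c'Y, and its squared deviations, over the distribution.
mean_direct = E(c_prime_y)
var_direct = E(lambda y: (c_prime_y(y) - mean_direct) ** 2)

# Via the formulas E(c'Y) = c'mu and var(c'Y) = c' Sigma c.
mean_formula = sum(cj * mj for cj, mj in zip(c, mu))
var_formula = sum(c[j] * Sigma[j][k] * c[k] for j in range(n) for k in range(n))

print(mean_direct, mean_formula, var_direct, var_formula)
```

With n = 2 and c = (1, 1)', the variance formula reduces to the earlier fact $\mathrm{var}(Y_j + Y_k) = \mathrm{var}(Y_j) + \mathrm{var}(Y_k) + 2\,\mathrm{cov}(Y_j, Y_k)$.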

If we have a random vector Y with elements that are continuous random variables, then it is natural to consider the normal distribution as a probability model for each element $Y_j$. However, as we have discussed, we are likely to be concerned about associations among the elements of Y. Thus, it does not suffice to describe each of the elements $Y_j$ separately; rather, we seek a probability model that describes their joint behavior. As we have noted, such probability distributions are called multivariate for obvious reasons.

The multivariate normal distribution is the extension of the normal distribution of a single random variable to a random vector composed of elements that are each normally distributed. Through its form, it naturally takes into account correlation among the elements of Y ; moreover, it gives a basis for a way of thinking about an extension of “least squares” that is relevant when observations are not independent but rather are correlated.

NORMAL PROBABILITY DENSITY: Recall that, for a random variable Y, the normal distribution has probability density function

$$f(y) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left\{ -\frac{(y - \mu)^2}{2\sigma^2} \right\}. \qquad (3.8)$$

This function has the shape shown in Figure 3. The shape will vary in terms of “center” and “spread” according to the values of the population mean μ and variance σ^2 (e.g. recall Figure 1).

Figure 3: Normal density function with mean μ (image not reproduced).

Several features are evident from the form of (3.8):

The form of the function is determined by μ and σ^2. Thus, if we know the population mean and variance of a random variable Y , and we know it is normally distributed, we know everything about the probabilities associated with values of Y , because we then know the function (3.8) completely.
The form of (3.8) depends critically on the term

$$\frac{(y - \mu)^2}{\sigma^2} = (y - \mu)(\sigma^2)^{-1}(y - \mu). \qquad (3.9)$$

Note that this term depends on the squared deviation $(y - \mu)^2$.

The deviation is standardized by the standard deviation σ, which has the same units as y, so that it is put on a unitless basis.
This standardized deviation has the interpretation of a distance measure: it measures how far y is from μ, and then puts the result on a unitless basis relative to the "spread" about μ expected.
Thus, the normal distribution and methods such as least squares, which depend on minimizing a sum of squared deviations, have an intimate connection. We will use this connection to motivate the interpretation of the form of the multivariate normal distribution informally now. Later in the course, we will be more formal about this connection.
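The sketch below writes the density (3.8) explicitly in terms of the standardized squared distance (3.9) and compares it with `statistics.NormalDist.pdf` from the Python standard library. The mean and variance are hypothetical. Because the density depends on y only through the distance (3.9), values equally many standard deviations above and below μ receive the same density.

```python
import math
from statistics import NormalDist


def normal_pdf(y, mu, sigma2):
    """Normal density (3.8), written through the standardized squared distance (3.9)."""
    dist2 = (y - mu) * (1.0 / sigma2) * (y - mu)  # (y - mu)(sigma^2)^{-1}(y - mu)
    return math.exp(-dist2 / 2.0) / ((2.0 * math.pi) ** 0.5 * math.sqrt(sigma2))


mu, sigma2 = 10.0, 4.0  # hypothetical mean and variance
for y in (6.0, 10.0, 13.0):
    print(y, normal_pdf(y, mu, sigma2), NormalDist(mu, math.sqrt(sigma2)).pdf(y))
```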

SIMPLE LINEAR REGRESSION: For now, to appreciate this form and its extension, consider the method of least squares for fitting a simple linear regression. (The same considerations apply to multiple linear regression, which will be discussed later in this chapter.) As before, at each fixed value $x_1, \ldots, x_n$, there is a corresponding random variable $Y_j$, $j = 1, \ldots, n$, which is assumed to arise from

$$Y_j = \beta_0 + \beta_1 x_j + \epsilon_j, \qquad \beta = (\beta_0, \beta_1)'.$$

The further assumption is that the $Y_j$ are each normally distributed with means $\mu_j = \beta_0 + \beta_1 x_j$ and variance $\sigma^2$.

Thus, each $Y_j \sim N(\mu_j, \sigma^2)$, so that they have different means but the same variance.
Furthermore, the $Y_j$ are assumed to be independent.
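As a point of reference for the least squares discussion, here is a minimal sketch of ordinary least squares for this model. It uses the standard closed-form estimates that minimize $\sum_j (Y_j - \beta_0 - \beta_1 x_j)^2$ (the formula is not shown in this excerpt), and it simulates data under the assumptions just stated: independent normal $Y_j$ with common variance. The values of $\beta_0$, $\beta_1$, σ, and the design points are hypothetical.

```python
import random


def least_squares_line(x, y):
    """Return (b0, b1) minimizing sum_j (y_j - b0 - b1 x_j)^2 (closed-form OLS)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, y))
    sxx = sum((xj - xbar) ** 2 for xj in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical model: Yj = beta0 + beta1 xj + eps_j, eps_j independent N(0, sigma^2).
beta0, beta1, sigma = 3.0, 0.5, 0.4
rng = random.Random(1)
x = [float(v) for v in range(11)] * 5  # design points 0..10, each used 5 times
y = [beta0 + beta1 * xj + rng.gauss(0.0, sigma) for xj in x]

b0_hat, b1_hat = least_squares_line(x, y)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

Every squared deviation here gets the same weight, which is appropriate because the $Y_j$ share a common variance $\sigma^2$ and are independent.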

MULTIVARIATE NORMAL PROBABILITY DENSITY: The joint probability distribution that is the extension of (3.8) to an $(n \times 1)$ random vector Y, each of whose components is normally distributed (but possibly associated), is given by

$$f(y) = \frac{1}{(2\pi)^{n/2}}\,|\Sigma|^{-1/2} \exp\left\{ -(y - \mu)'\Sigma^{-1}(y - \mu)/2 \right\}. \qquad (3.11)$$

(3.11) describes the probabilities with which the random vector Y takes on values jointly in its n elements.
The form of (3.11) is determined by μ and Σ. Thus, as in the univariate case, if we know the mean vector and covariance matrix of a random vector Y, and we know that Y has a multivariate normal distribution, then we know everything about the joint probabilities associated with values y of Y.
By analogy to (3.9), the form of f(y) depends critically on the term

$$(y - \mu)'\Sigma^{-1}(y - \mu). \qquad (3.12)$$

Note that this is a quadratic form, so it is a scalar function of the elements of $(y - \mu)$ and $\Sigma^{-1}$. Specifically, if we refer to the elements of $\Sigma^{-1}$ as $\sigma^{jk}$, i.e.

$$\Sigma^{-1} = \begin{pmatrix} \sigma^{11} & \cdots & \sigma^{1n} \\ \vdots & \ddots & \vdots \\ \sigma^{n1} & \cdots & \sigma^{nn} \end{pmatrix},$$

then we may write

$$(y - \mu)'\Sigma^{-1}(y - \mu) = \sum_{j=1}^{n} \sum_{k=1}^{n} \sigma^{jk} (y_j - \mu_j)(y_k - \mu_k). \qquad (3.13)$$

Of course, the elements $\sigma^{jk}$ will be complicated functions of the elements $\sigma_j^2$, $\sigma_{jk}$ of Σ, i.e. the variances of the $Y_j$ and the covariances among them.

This term thus depends not only on the squared deviations $(y_j - \mu_j)^2$ for each element in y (which arise in the double sum when j = k), but also on the crossproducts $(y_j - \mu_j)(y_k - \mu_k)$. Each of these squares and crossproducts is "standardized" by a value $\sigma^{jk}$ that involves the variances and covariances.
Thus, although it is quite complicated, one gets the suspicion that (3.13) has an interpretation, albeit more complex, as a distance measure, just as in the univariate case.
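The equality of the matrix form (3.12) and the double sum (3.13) can be checked directly. In this sketch Σ, μ, and y are hypothetical, and $\Sigma^{-1}$ is computed with a small Gauss-Jordan routine on exact fractions (`inverse` is written here for illustration, not taken from a library), so both forms agree exactly.

```python
from fractions import Fraction


def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination, using exact fractions."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]  # right half is now A^{-1}


def quad_form_matrix(d, S_inv):
    """(y - mu)' Sigma^{-1} (y - mu) computed as d'(S_inv d), with d = y - mu."""
    S_inv_d = [sum(s * dk for s, dk in zip(row, d)) for row in S_inv]
    return sum(dj * v for dj, v in zip(d, S_inv_d))


def quad_form_double_sum(d, S_inv):
    """(3.13): sum over j and k of sigma^{jk} (y_j - mu_j)(y_k - mu_k)."""
    n = len(d)
    return sum(S_inv[j][k] * d[j] * d[k] for j in range(n) for k in range(n))


# Hypothetical covariance matrix, mean vector, and observed value y.
Sigma = [[4, 3, 2], [3, 9, 6], [2, 6, 25]]
mu = [1, 2, 3]
y = [2, 0, 5]

d = [Fraction(yj - mj) for yj, mj in zip(y, mu)]
S_inv = inverse(Sigma)
q1 = quad_form_matrix(d, S_inv)
q2 = quad_form_double_sum(d, S_inv)
print("matrix form:", q1, "  double sum:", q2)
```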

BIVARIATE NORMAL DISTRIBUTION: To gain insight into this suspicion, and to get a better understanding of the multivariate distribution, it is instructive to consider the special case n = 2, the simplest example of a multivariate normal distribution (hence the name bivariate).

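Here,

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}.$$

Using the inversion formula for a (2 × 2) matrix given in Chapter 2,

$$\Sigma^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \begin{pmatrix} \sigma_2^2 & -\sigma_{12} \\ -\sigma_{12} & \sigma_1^2 \end{pmatrix}.$$

We also have that the correlation between $Y_1$ and $Y_2$ is given by

$$\rho_{12} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}.$$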

Using these results, it is an algebraic exercise to show that (try it!)

$$(y - \mu)'\Sigma^{-1}(y - \mu) = \frac{1}{1 - \rho_{12}^2} \left\{ \frac{(y_1 - \mu_1)^2}{\sigma_1^2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2} - 2\rho_{12} \frac{(y_1 - \mu_1)}{\sigma_1} \frac{(y_2 - \mu_2)}{\sigma_2} \right\}. \qquad (3.14)$$
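A numerical check of (3.14), under hypothetical parameter values: the quadratic form computed from the explicit (2 × 2) inverse of Σ should equal the expression in terms of $\rho_{12}$. The sketch also evaluates the bivariate case of (3.11) and confirms that, when $\sigma_{12} = 0$, the joint density equals the product of the two univariate densities (3.8), here obtained from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def quad_form(y, mu, s1sq, s2sq, s12):
    """(y - mu)' Sigma^{-1} (y - mu) using the explicit (2 x 2) inverse of Sigma."""
    d1, d2 = y[0] - mu[0], y[1] - mu[1]
    det = s1sq * s2sq - s12**2
    return (s2sq * d1 * d1 - 2 * s12 * d1 * d2 + s1sq * d2 * d2) / det


def quad_form_rho(y, mu, s1sq, s2sq, s12):
    """The same quantity written as in (3.14), in terms of rho_12."""
    s1, s2 = math.sqrt(s1sq), math.sqrt(s2sq)
    rho = s12 / (s1 * s2)
    z1, z2 = (y[0] - mu[0]) / s1, (y[1] - mu[1]) / s2
    return (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)


def bivariate_normal_pdf(y, mu, s1sq, s2sq, s12):
    """Density (3.11) with n = 2: (2 pi)^{-1} |Sigma|^{-1/2} exp{-Q/2}."""
    det = s1sq * s2sq - s12**2
    return math.exp(-quad_form(y, mu, s1sq, s2sq, s12) / 2) / (2 * math.pi * math.sqrt(det))


# Hypothetical parameters and point y.
mu = (10.0, 20.0)
s1sq, s2sq, s12 = 4.0, 9.0, 3.0  # so rho_12 = 3 / (2 * 3) = 0.5
y = (11.5, 17.0)

print(quad_form(y, mu, s1sq, s2sq, s12), quad_form_rho(y, mu, s1sq, s2sq, s12))

# With sigma_12 = 0 the joint density factors into the two univariate densities (3.8).
f_joint = bivariate_normal_pdf(y, mu, s1sq, s2sq, 0.0)
f_product = NormalDist(mu[0], 2.0).pdf(y[0]) * NormalDist(mu[1], 3.0).pdf(y[1])
print(f_joint, f_product)
```

Setting $\sigma_{12} = 0$ removes the crossproduct term from (3.14), leaving only the weighted sum of squared deviations discussed next.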

Compare this expression to the general one (3.13).

Inspection of (3.14) shows that the quadratic form involves two components:

The sum of standardized squared deviations $\frac{(y_1 - \mu_1)^2}{\sigma_1^2} + \frac{(y_2 - \mu_2)^2}{\sigma_2^2}$. This sum alone is in the spirit of the sum of squared deviations in least squares, with the difference that each deviation is now weighted in accordance with its variance. This makes sense: because the variances of $Y_1$ and $Y_2$ differ, information on the population of $Y_1$ values is of "different quality" than that on the population of $Y_2$ values. If variance is "large," the quality of information is poorer; thus, the larger the variance, the smaller the "weight," so that information of "higher quality" receives more weight in the overall measure. Indeed, then, this is like a "distance measure," where each contribution receives an appropriate weight.
