Random Coefficient Model for Distance Based on Gender and Age - Prof. Marie Davidian, Study notes of Statistics

North Carolina State University (NCSU)Statistics

Prof. Marie Davidian

Sas code for conducting a random coefficient model analysis on a dataset named dent1 to examine the relationship between distance and gender, as well as the effect of age on this relationship for each gender. The model includes both random intercepts and slopes for the subject variable 'child'. The output includes fixed effects, random effects, and reconfigured datasets for further analysis.

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009


CHAPTER 10 ST 732, M. DAVIDIAN


10 Linear mixed effects models for multivariate normal data

10.1 Introduction

Random coefficient models, where we develop an overall statistical model by thinking first about individual trajectories in a “subject-specific” fashion, are a special case of a more general model framework based on the same perspective. This model framework, known popularly as the linear mixed effects model, is still based on thinking about individual behavior first, of course. However, the possibilities for how this is represented, and how the variation in the population is represented, are broadened. The result is a very flexible and rich set of models for characterizing repeated measurement data.

The broader possibilities that are encompassed are best illustrated by examples. In the next section, we consider several examples that highlight some of these possibilities. We then note that all of the examples, as well as the random coefficient model as described in the last chapter, may be written in a unified way. Moreover, the same inferential techniques of maximum likelihood and restricted maximum likelihood are also applicable.

As mentioned in our discussion of random coefficient models, one advantage is that the model naturally represents individual trajectories in a formal way, so that questions of interest about individual behavior may be considered. In this chapter, we will show in the context of the general linear mixed effects model framework how “estimation” of individual trajectories may be carried out.

10.2 Examples

RANDOM COEFFICIENT MODEL: To set the stage, recall the random coefficient model where each unit is assumed to have its own inherent straight line trajectory, with its own intercept and slope $\beta_{0i}$ and $\beta_{1i}$, i.e.

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_i = \begin{pmatrix} \beta_{0i} \\ \beta_{1i} \end{pmatrix}.$$

If furthermore units are from, say, $q = 2$ groups, then the population model would be

$$\beta_i = A_i \beta + b_i, \qquad b_i \sim N(0, D),$$

$$\beta = \begin{pmatrix} \beta_{01} \\ \beta_{11} \\ \beta_{02} \\ \beta_{12} \end{pmatrix}, \qquad b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix},$$

and $A_i$ is the appropriate matrix of 0’s and 1’s that “picks off” the intercept and slope for the group to which $i$ belongs. If there is only $q = 1$ group, then $A_i = I_2$ for all $i$ and $\beta = (\beta_0, \beta_1)'$.

Implicit in the statement of this model is that both intercepts and slopes exhibit nonnegligible variation among units in the population(s) of interest. This belief is represented by the (2 × 1) random effect bi – the intercept and slope for different units vary about the mean intercept and slope according to bi.
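As a rough illustration of how the matrix of 0’s and 1’s “picks off” the right elements, the sketch below builds it for a unit in group 2 and forms one unit-specific parameter vector. The helper `make_A` and all numerical values are hypothetical, chosen only to make the arithmetic visible.

```python
import numpy as np

def make_A(group, q=2):
    """Return the (2 x 2q) matrix of 0's and 1's that picks off the
    intercept and slope for `group` (numbered 1..q) from
    beta = (beta_01, beta_11, ..., beta_0q, beta_1q)'."""
    A = np.zeros((2, 2 * q))
    start = 2 * (group - 1)
    A[:, start:start + 2] = np.eye(2)
    return A

# Hypothetical parameter values, for illustration only.
beta = np.array([20.0, 0.5, 22.0, 0.8])   # (beta_01, beta_11, beta_02, beta_12)
D = np.array([[4.0, 0.1],
              [0.1, 0.05]])                # covariance matrix of b_i

rng = np.random.default_rng(0)
b_i = rng.multivariate_normal(np.zeros(2), D)   # random effect for unit i

A_i = make_A(group=2)
mean_beta_i = A_i @ beta        # mean intercept and slope for group 2
beta_i = mean_beta_i + b_i      # this unit's own intercept and slope
```

With a single group, `make_A(1, q=1)` is just the (2 × 2) identity, matching the statement in the text.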

MAGNITUDES OF AMONG-UNIT VARIATION: For simplicity, consider first a situation with a single group, so that all $\beta_{0i}$ and $\beta_{1i}$ in the random coefficient model are assumed to vary about a common mean intercept and slope. Consider Figure 1, which depicts longitudinal data for 10 hypothetical units.

Figure 1: Longitudinal data where variation in slope may be negligible. (Axes: response, 20 to 140, versus days, 0 to 30.)

If we believed that the second possibility were likely, we might still want to consider model (10.1). If we considered the usual random coefficient model with

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1 + b_{1i},$$

then in the matrix $D$, the element $D_{11}$ represents the variance of $b_{0i}$ (among intercepts) and $D_{22}$ that of $b_{1i}$ (among slopes). If $D_{11}$ is nonnegligible relative to the mean intercept, then this suggests that intercepts vary perceptibly. If on the other hand $D_{22}$ is virtually negligible relative to the size of the mean slope, then this suggests that variation in slopes is almost undetectable.

It is a fact of life that, when this is the case, the numerical algorithms used to implement fitting of the model (e.g. by ML or REML) may experience serious difficulties. The algorithm simply cannot pin down $D_{22}$, and this also makes it hard to pin down the covariance $D_{12}$.

Thus, in situations where this is true, it may be a reasonable approximation to the truth to say that, for all practical purposes, the variation among the slopes $\beta_{1i}$ is negligible. Although we don’t necessarily believe that the slopes don’t vary at all, saying their variance is negligible is an approximation that is probably close enough to the truth to accept for practical purposes. This assumption will make implementation of the model feasible.

In either case, we are faced with a situation that does not quite fit into the random coefficient framework. The individual-specific parameters βi no longer have all elements varying! How may we represent this? This is most easily seen by “brute force.” We have

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij},$$

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1. \tag{10.2}$$

Plugging the representations for $\beta_{0i}$ and $\beta_{1i}$ into the first stage model, we obtain

$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0i} + e_{ij}. \tag{10.3}$$

If we think of the implication of (10.3) for the entire vector $Y_i$, it is straightforward to see that we may write this succinctly as

$$Y_i = X_i\beta + \mathbf{1} b_{0i} + e_i,$$

where as usual $\mathbf{1}$ is a $(n_i \times 1)$ vector of 1’s and $X_i$ is the design matrix for individual $i$,

$$X_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$

Note that if we let $Z_i = \mathbf{1}$ and $b_i = b_{0i}$ $(1 \times 1)$, we may write this in the form

$$Y_i = X_i\beta + Z_i b_i + e_i \tag{10.4}$$

as before – this looks identical to the general representation we used in the last chapter, except that the definitions of Xi and Zi we used in the single group case are now different. Other than this, the model has exactly the same form, once we’ve defined Xi and Zi appropriately.
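A small numerical check can make the equivalence of (10.3) and (10.4) concrete. The sketch below uses hypothetical observation times, parameter values, and within-unit deviations for a single unit; none of these numbers come from the notes.

```python
import numpy as np

# Hypothetical observation times and values for one unit i.
t_i = np.array([0.0, 5.0, 10.0, 20.0])
n_i = t_i.size

X_i = np.column_stack([np.ones(n_i), t_i])   # design matrix with rows (1, t_ij)
Z_i = np.ones((n_i, 1))                      # Z_i = 1, a column of ones

beta = np.array([50.0, 2.0])                 # (beta_0, beta_1)
b_i = np.array([3.0])                        # random intercept deviation b_0i
e_i = np.array([0.4, -0.2, 0.1, -0.3])       # within-unit deviations

# Matrix form (10.4)
Y_matrix = X_i @ beta + Z_i @ b_i + e_i

# Observation-by-observation form (10.3)
Y_elementwise = beta[0] + beta[1] * t_i + b_i[0] + e_i
```

Both computations give the same response vector, which is all that the matrix notation is claiming.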

Alternatively, we can do the same calculation with more fancy footwork. We will illustrate this in a way that allows immediate extension to the case of more than one group; to this end, it is convenient to use a different symbol to represent the design matrix for individual $i$ (we called it $X_i$ above). Thus, write

$$C_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$

Furthermore, note that we may write (10.2) as follows (verify)

$$\beta_i = A_i\beta + B_i b_i, \qquad b_i = b_{0i} \ (1 \times 1), \tag{10.5}$$

where $A_i$ is an identity matrix and

$$B_i = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad (2 \times 1).$$

With these representations, if we think of the model that says each child has his/her own straight line regression model with child-specific regression parameter $\beta_i$, i.e.

$$Y_i = C_i\beta_i + e_i,$$

plugging (10.5) into this expression gives

$$Y_i = C_i A_i\beta + C_i B_i b_i + e_i. \tag{10.6}$$
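The two products appearing in (10.6) can be checked directly. This is a minimal sketch with hypothetical observation times; it only confirms that the first product reproduces the fixed effects design matrix and that the second collapses to a column of ones in the random intercept case.

```python
import numpy as np

t_i = np.array([0.0, 5.0, 10.0, 20.0])            # hypothetical times
C_i = np.column_stack([np.ones_like(t_i), t_i])   # rows (1, t_ij)

A_i = np.eye(2)                                   # single group: identity
B_i = np.array([[1.0],
                [0.0]])                           # only the intercept is random

X_i = C_i @ A_i    # fixed effects design matrix
Z_i = C_i @ B_i    # random effects design matrix
```

Here `X_i` equals `C_i` and `Z_i` is the $(n_i \times 1)$ vector of ones, so (10.6) has the same form as (10.4).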

To gain a further understanding of this, consider another possibility.

OTHER COVARIATES: In some instances, the question of interest may in fact involve the possible association between the values of measured covariates and rate of change of a response over time. We now see that it is possible to write models appropriate for this situation in the form (10.4) for suitable choices of Xi and Zi.

An example arises in understanding the progression of disease in HIV-infected patients assigned to follow a certain therapeutic regimen. HIV attacks the immune system, so HIV-infected subjects often have compromised immune system characteristics. A standard measure of immune status is CD4 count, where lower counts indicate poorer status. Now a standard measure of how well a patient is doing is viral load, roughly the “amount” of virus present in the body, and it is routine to follow viral load over time to monitor a patient’s well-being. HIV scientists may be interested in whether the nature of viral load progression is different depending on a subject’s immune system at the time of initiation of therapy. To develop a formal model to address this issue, suppose initially there is only one group.

Let $Y_{ij}$ be the viral load measurement taken on subject $i$ at time $t_{ij}$ (usually measured in units of “log copy number”) following start of therapy at time 0, and suppose that for any given subject, the trajectory of viral load measurements over time appears to be a straight line, with subject-specific intercept and slope; i.e.

$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \beta_i = (\beta_{0i}, \beta_{1i})'.$$

In addition, suppose that at time 0 (“baseline”) for all subjects, a CD4 count measurement is available; denote this measurement as $a_i$ for the $i$th subject.

In terms of the individual model, then, the question of interest is whether the magnitude and direction of individual rates of change, i.e. slopes $\beta_{1i}$, are associated with the value of $a_i$. We may state such an association formally as

$$\beta_{1i} = \beta_2 + \beta_3 a_i + b_{1i}.$$

For illustration, suppose that we do not believe that the intercepts, which represent viral load at time 0, are associated with CD4 count (this is actually unlikely, but we assume it here for purposes of developing a simple model). We may state this as

$$\beta_{0i} = \beta_1 + b_{0i}.$$

We may write this succinctly as

$$\beta_i = A_i\beta + b_i, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \qquad A_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & a_i \end{pmatrix}.$$

Note that this model allows the possibility that both intercepts and slopes vary in the population of subjects. However, it states that the fact that slopes vary across individuals may in part be associated with their baseline CD4 counts.
The question of interest in the context of this model is about the value of $\beta_3$; if $\beta_3 = 0$, then this says that there is no association between baseline CD4 and subsequent rate of change of viral load while on this therapy.
The model for βi itself has the flavor of a “regression model.” Here, ai is a covariate in this model.
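To see the “regression” flavor of the second-stage model numerically, the sketch below evaluates the mean intercept and slope for two subjects with different baseline CD4 counts. The helper `make_A` and every number are hypothetical; they are not estimates from any real study.

```python
import numpy as np

def make_A(a_i):
    """Second-stage design matrix: the intercept mean does not depend on
    the covariate a_i, while the slope mean is beta_2 + beta_3 * a_i."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, a_i]])

# Hypothetical values: (beta_1, beta_2, beta_3).
beta = np.array([4.5, -0.10, 0.0002])

mean_low = make_A(200.0) @ beta    # subject with baseline CD4 of 200
mean_high = make_A(500.0) @ beta   # subject with baseline CD4 of 500

# The mean slopes differ by beta_3 times the difference in baseline CD4.
slope_gap = mean_high[1] - mean_low[1]
```

If the third element of `beta` were 0, `slope_gap` would be 0 for any pair of subjects, which is the “no association” case described above.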

It is straightforward to see that this model may be put into the form of (10.4). Plugging in the form of βi into the individual model, we see that

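$$Y_{ij} = \beta_1 + \beta_2 t_{ij} + \beta_3 a_i t_{ij} + b_{0i} + b_{1i} t_{ij} + e_{ij}, \qquad j = 1, \ldots, n_i.$$

It may be verified that this may be written succinctly as

$$Y_i = X_i\beta + Z_i b_i + e_i,$$

where

$$X_i = \begin{pmatrix} 1 & t_{i1} & a_i t_{i1} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & a_i t_{in_i} \end{pmatrix}, \qquad Z_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} = C_i, \text{ say.}$$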

Some parameters not to vary in the population, as above. As a hypothetical example, suppose we wanted a model that expresses the belief that variation among slopes is entirely attributable to CD4 count and that none of the variation in slopes is random, while variation in intercepts is random. (This sounds biologically questionable, but we consider it for illustration.) With 2 groups, this could be expressed as

$$\beta_{0i} = \begin{cases} \beta_1 + b_{0i} & \text{for treatment 1,} \\ \beta_4 + b_{0i} & \text{for treatment 2,} \end{cases} \qquad \beta_{1i} = \begin{cases} \beta_2 + \beta_3 a_i & \text{for treatment 1,} \\ \beta_5 + \beta_6 a_i & \text{for treatment 2.} \end{cases}$$

We could again write this as $\beta_i = A_i\beta + B_i b_i$ with $A_i$ and $\beta$ as above but with $b_i = b_{0i}$ and $B_i = (1, 0)'$.

By plugging these representations into the first stage model as in (10.7), we arrive at a model of the form

$$Y_i = X_i\beta + Z_i b_i + e_i, \tag{10.8}$$

where the matrices Xi and Zi are determined by the particular definitions of Ai, Bi, and Ci.

RESULT: It should be clear that it is possible to represent even fancier specifications in this way. E.g., we could also incorporate association of the intercepts with ai, and we may have more than one covariate in the second-stage population model. We consider an example at the end of this chapter. Once we write down the model in the form βi = Aiβ + Bibi for appropriately defined matrices Ai and Bi reflecting the features of interest, we may write a model of the form (10.8), where the definitions of Xi and Zi are dictated by the form of the first- and second-stage models.

THE SIMPLEST MODEL: It is in fact the case that the general model

$$Y_i = X_i\beta + Z_i b_i + e_i$$

includes as special cases many simple models for repeated measurements.

A particularly simple model is as follows. Suppose there is only one group, and, for each unit, we have repeated measurements Yij. However, suppose that these measurements are not necessarily over time; e.g. the m units are mother rats, and for the ith mother, Yij represent birthweights of her ni pups. In the absence of further information, a very simple model for this situation is

$$Y_{ij} = \mu + b_i + e_{ij}, \qquad j = 1, \ldots, n_i. \tag{10.9}$$

The model says that the population of all possible pup weights is centered about μ, and allows for the possibility of 2 sources of variation, among mother rats, through bi (some mothers have larger pups than others) and within mother rats, through eij (pups born to a given mother are not all identical, and weights may be measured with error).

If we define $X_i = \mathbf{1}$, $Z_i = \mathbf{1}$, $\beta = \mu$, and take the random effect vector $b_i$ to be the scalar $b_i$, then it is straightforward to see that we may write (10.9) in the form of (10.8).

It is straightforward to extend this simple model to allow different treatment groups with mean $\mu_\ell = \mu + \tau_\ell$ for the $\ell$th group by redefining $\beta$ and $X_i$ (try it!).

In fact, the univariate ANOVA model of Chapter 5 can also be written in this form. Recall that in Chapter 5 (see page 119) we wrote this model in the form

$$Y_i = X_i\beta + \mathbf{1} b_i + e_i.$$

Thus, we see this is again a special case of the general model as above (Zi = 1 , bi = bi) with the particular forms of Xi and β on page 119.

SUMMARY: It should be clear from these examples that it is possible to consider a wide variety of subject-specific models of the form

$$Y_i = X_i\beta + Z_i b_i + e_i$$

by suitably defining Xi, β, Zi, and bi. This model in its general form is known as the linear mixed effects model.

$b_i \sim N_k(0, D)$. Here, $D$ is a $(k \times k)$ covariance matrix that characterizes variation due to among-unit sources, assumed the same for all units. The dimension of $D$ corresponds to the number of among-unit random effects in the model. It is possible to allow $D$ to have a particular form or to be unstructured. It is also possible to have different $D$ matrices for different groups, as we discussed in the last chapter. In our discussion here, we will present things under the assumption of a common $D$ for all units, regardless of group or anything else. This may often be a reasonable assumption unless there is strong evidence that different conditions have a nonnegligible effect on variation as well as mean. Much of what we discuss in the sequel can be extended to more complex models, e.g., with different $D$ matrices and fancier $R_i$ matrices.

With these assumptions, we have

$$E(Y_i) = X_i\beta, \qquad \mathrm{var}(Y_i) = Z_i D Z_i' + R_i = \Sigma_i,$$

$$Y_i \sim N_{n_i}(X_i\beta, \Sigma_i). \tag{10.11}$$

That is, the model with the above assumptions on $e_i$ and $b_i$ implies that the $Y_i$ are multivariate normal random vectors of dimension $n_i$ with a particular form of covariance matrix. The form of $\Sigma_i$ implied by the model has two distinct components, the first having to do with variation solely from among-unit sources and the second having to do with variation solely from within-unit sources.
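The two-component structure of the covariance matrix is easy to compute for particular choices of the design and covariance matrices. The sketch below does so for a random intercept model and for a random intercept-and-slope model. The helper `marginal_cov`, the times, and the parameter values are hypothetical, and the within-unit covariance is assumed to be $\sigma^2 I$ purely for simplicity.

```python
import numpy as np

def marginal_cov(Z_i, D, R_i):
    """Covariance matrix of Y_i implied by the model: Z_i D Z_i' + R_i."""
    return Z_i @ D @ Z_i.T + R_i

t_i = np.array([0.0, 1.0, 2.0])      # hypothetical observation times
n_i = t_i.size
sigma2 = 1.0
R_i = sigma2 * np.eye(n_i)           # assumed within-unit covariance

# Random intercept only: Z_i is a column of ones and D is (1 x 1).
Z_int = np.ones((n_i, 1))
D_int = np.array([[4.0]])
Sigma_int = marginal_cov(Z_int, D_int, R_i)
# Compound symmetry: 5 on the diagonal, 4 off the diagonal.

# Random intercept and slope: Z_i has rows (1, t_ij) and D is (2 x 2).
Z_rc = np.column_stack([np.ones(n_i), t_i])
D_rc = np.array([[4.0, 0.1],
                 [0.1, 0.05]])
Sigma_rc = marginal_cov(Z_rc, D_rc, R_i)
```

In the second case the variances on the diagonal change with time, which is one way the choice of random effects shapes the implied covariance structure.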

“SUBJECT-SPECIFIC” MODEL: Although the forms of $X_i$, $\beta$, $Z_i$, and $b_i$ are allowed more possibilities here than in the random coefficient model, the spirit of the model is the same. If we think about the general form of the model, it is clear that the model is a subject-specific one. In particular, consider the form of the model

$$Y_i = X_i\beta + Z_i b_i + e_i.$$

If we “zero in” on unit $i$, and consider this unit alone and in its own right, regardless of other units, the model has the form of a “regression model” for the data $Y_i$. The “mean” part of this regression model is

$$X_i\beta + Z_i b_i = \begin{pmatrix} X_i & Z_i \end{pmatrix} \begin{pmatrix} \beta \\ b_i \end{pmatrix}.$$

The vector $e_i$ characterizes random variation associated with within-unit sources.

This way of writing the “mean” part of the model highlights the fact that individual unit behavior is being characterized by some combination of $\beta$, which describes the mean for the population, and $b_i$, which describes how this particular unit deviates from the population mean.

Thus, the model may be thought of as subject-specific, as it incorporates the behavior of the individual unit.
We will focus on individual behavior shortly; in particular, we will be more formal about the notion of the unit’s “own mean.”

10.4 Inference on regression and covariance parameters

As in the previous chapter, once we note that the model implies (10.11), the methods of maximum likelihood and restricted maximum likelihood may be used to estimate the parameters that characterize the “mean” or systematic part of the model, $\beta$, and those that characterize the “variation” or random part of the model, the distinct parameters that make up $R_i$ and $D$. Thus, the methods and considerations discussed in the previous two chapters apply exactly as described:

The generalized least squares estimator for β and its large sample approximate sampling distribution will have the same form, with Xi and Σi as defined in the model.
Computation of estimated standard errors, Wald and likelihood ratio tests is as before.
The “subject-specific” versus “population-averaged” interpretations of the model both apply.
When the data are balanced in the sense that the times of observation are all the same and the matrices $Z_i$ are the same for all units, then when $R_i = \sigma^2 I_n$, the GLS and OLS estimators yield the same numerical value. As before, however, the estimated approximate covariance matrices of the estimators will be different; that based on the OLS analysis will be incorrect, because it will not take proper account of the nature of variation for the data vectors $Y_i$. (Recall that the OLS estimator just assumes that all the $Y_{ij}$ are independent, so that $\Sigma_i = I$ for all $i$.) The estimated covariance matrix $\hat{V}_\beta$ for $\hat\beta$, which does take variation into account, requires estimates of the components of $R_i$ and $D$.
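The last point in the list can be checked numerically. The sketch below simulates balanced data from a single-group random coefficient model in which every unit has the same design matrix for both fixed and random effects, then compares OLS with GLS computed from the true covariance matrix. All numerical values are hypothetical, and the covariance parameters are treated as known rather than estimated, which is an idealization made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 30                                   # number of units
t = np.array([0.0, 1.0, 2.0, 3.0])       # common observation times
n = t.size
X = np.column_stack([np.ones(n), t])     # X_i = Z_i = X for every unit

beta = np.array([20.0, 0.5])             # hypothetical fixed effects
D = np.array([[4.0, 0.1],
              [0.1, 0.05]])
sigma2 = 1.0
Sigma = X @ D @ X.T + sigma2 * np.eye(n)  # common covariance matrix

# Simulate Y_i = X (beta + b_i) + e_i; each row of Y is one unit.
b = rng.multivariate_normal(np.zeros(2), D, size=m)
e = rng.normal(0.0, np.sqrt(sigma2), size=(m, n))
Y = (beta + b) @ X.T + e
Y_sum = Y.sum(axis=0)

# OLS: ignores the covariance structure.
XtX = X.T @ X
beta_ols = np.linalg.solve(m * XtX, X.T @ Y_sum)

# GLS: weights by the inverse of Sigma.
Sinv_X = np.linalg.solve(Sigma, X)                        # Sigma^{-1} X
beta_gls = np.linalg.solve(m * X.T @ Sinv_X, Sinv_X.T @ Y_sum)

# Covariance of the estimator: correct versus independence-based.
cov_gls = np.linalg.inv(m * X.T @ Sinv_X)
cov_naive = sigma2 * np.linalg.inv(m * XtX)
```

The two estimates agree to numerical precision, but `cov_naive` omits the among-unit contribution: in this particular setup `cov_gls` equals `cov_naive + D / m`.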

Because we have already discussed these issues in detail in earlier chapters, we do not need to do so again here. See section 9.3 and chapter 8 for more.

By analogy, one’s first thought for prediction of bi would be to use the mean of the population of bi. However,

An assumption of the model is that bi ∼ Nk( 0 , D), so that E(bi) = 0 for all i.
Thus, following this logic, we would use 0 as the prediction for bi for any unit. This would lead to the same “estimate” for individual-specific quantities like βi in a random coefficient model for all units.
But the whole point is that individuals are different; thus, this tactic does not seem sensible, as it gives the same result regardless of individual!

Thus, simply using the mean of the population of random effects bi will not provide a useful result. Something that preserves the “individuality” of the bi is needed instead.

Another thing to note is that this approach does not at all take advantage of the fact that we have some additional information available: the data! Under the model, we have $Y_i = X_i\beta + Z_i b_i + e_i$; that is, the data $Y_i$ and the underlying random effects $b_i$ are related. This suggests that there must be information about $b_i$ in $Y_i$ that we could exploit. In particular, is there some sensible function of the data $Y_i$ that could be used as a predictor for $b_i$? Of course, this function would also be random, as it is a function of the random data $Y_i$.

CONDITIONAL EXPECTATION: To make the discussion a little easier, we will assume for the moment that bi is a scalar; i.e. k = 1. The same reasoning goes through for k > 1. Call this scalar random effect bi.

For our predictor, we’d like something that is “close to” $b_i$. If we let $c(Y_i)$ be the function of the data we will use as the predictor, then one possibility would be to say we’d like to choose $c(Y_i)$ so that the distance between $c(Y_i)$ and $b_i$, which we can measure as

$$\{b_i - c(Y_i)\}^2,$$

is “small.” This makes sense: we’d like to use as a predictor something that resembles $b_i$ in some sense.

As both $Y_i$ and $b_i$ are random, and hence vary in the population, we’d like the distance to be “small” considered over all possible values they might take on. Thus, it seems reasonable to consider the expectation of this distance, averaging it over all possible values; i.e.

$$E\{b_i - c(Y_i)\}^2. \tag{10.12}$$

How “small” is “small?” A natural way to think is that we’d like the function $c(Y_i)$ we use to be the function that makes (10.12) as small as possible; that is, the function $c(Y_i)$ we’d like to choose is the one that minimizes $E\{b_i - c(Y_i)\}^2$ across all possible functions we might choose.

The particular function $c(Y_i)$ that minimizes this expected distance is called the conditional expectation of $b_i$ given $Y_i$. The usual notation is to write the conditional expectation as

$$E(b_i \mid Y_i). \tag{10.13}$$

The conditional expectation is itself a random quantity; it is a function of the random vector $Y_i$. Thus, do not be confused into thinking it is a fixed quantity because of the notation; the “E” is being used in a different way.
This definition may be extended to the case where bi is a vector.

CONDITIONAL EXPECTATION AND MULTIVARIATE NORMALITY: It turns out that when $Y_i$ and $b_i$ are both normally distributed, it is possible to find an explicit expression for the conditional expectation. We first discuss this in detail in a special case: the simplest form of the linear mixed model given in equation (10.9), where the random effect is a scalar $b_i$:

$$Y_{ij} = \mu + b_i + e_{ij}$$

with $Y_i = (Y_{i1}, \ldots, Y_{in_i})'$, $e_i = (e_{i1}, \ldots, e_{in_i})'$, $b_i \sim N(0, D)$, and $e_i \sim N_{n_i}(0, \sigma^2 I)$. It of course follows that $Y_{ij} \sim N(\mu, D + \sigma^2)$ (verify).

It may be shown that, under this model,

$$E(b_i \mid Y_i) = \frac{n_i D}{n_i D + \sigma^2}(\bar{Y}_i - \mu), \tag{10.14}$$

where $\bar{Y}_i$ is the mean of the $n_i$ $Y_{ij}$ values in $Y_i$.
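The following sketch checks (10.14) in two ways. First, it compares the closed form with the standard multivariate normal result that the conditional mean of the random effect is its covariance with the data, times the inverse covariance matrix of the data, times the centered data; that general result is a textbook fact brought in here for comparison and is not derived in this section. Second, it uses simulation to compare the expected squared distance (10.12) for three candidate predictors. All parameter values are hypothetical.

```python
import numpy as np

mu, D, sigma2, n_i = 10.0, 4.0, 1.0, 3      # hypothetical known parameters
shrink = n_i * D / (n_i * D + sigma2)       # multiplier in (10.14)

def cond_exp_simple(Y_i):
    """Closed form (10.14)."""
    return shrink * (Y_i.mean() - mu)

def cond_exp_general(Y_i):
    """cov(b_i, Y_i) Sigma_i^{-1} (Y_i - mu 1) for jointly normal (b_i, Y_i)."""
    ones = np.ones(n_i)
    Sigma_i = D * np.outer(ones, ones) + sigma2 * np.eye(n_i)
    return D * ones @ np.linalg.solve(Sigma_i, Y_i - mu * ones)

# Simulate many units from the model Y_ij = mu + b_i + e_ij.
rng = np.random.default_rng(2)
n_sim = 200_000
b = rng.normal(0.0, np.sqrt(D), size=n_sim)
e = rng.normal(0.0, np.sqrt(sigma2), size=(n_sim, n_i))
Y = mu + b[:, None] + e
Ybar = Y.mean(axis=1)

# Monte Carlo estimates of E{b_i - c(Y_i)}^2 for three choices of c.
mse_cond = np.mean((b - shrink * (Ybar - mu)) ** 2)   # conditional expectation
mse_raw = np.mean((b - (Ybar - mu)) ** 2)             # unshrunken deviation
mse_zero = np.mean(b ** 2)                            # always predict 0
```

The conditional expectation gives the smallest simulated value, and always predicting 0 gives by far the largest, in line with the earlier argument that the population mean of the random effects is not a useful predictor.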

If $\omega$ is known, then $\Sigma_i$ is known, and in this case the maximum likelihood estimator for $\mu$ is the weighted least squares estimator [see equation (8.17)], which in our case ($X_i = \mathbf{1}_{n_i}$) is

$$\hat\mu = \left( \sum_{i=1}^m \mathbf{1}_{n_i}' \Sigma_i^{-1} \mathbf{1}_{n_i} \right)^{-1} \sum_{i=1}^m \mathbf{1}_{n_i}' \Sigma_i^{-1} Y_i,$$

which may be shown to lead to the result that

$$\hat\mu = \frac{\sum_{i=1}^m n_i (n_i D + \sigma^2)^{-1} \bar{Y}_i}{\sum_{i=1}^m n_i (n_i D + \sigma^2)^{-1}}.$$

(Try it; you will need to use the matrix fact that

$$\Sigma_i^{-1} = \frac{1}{\sigma^2} \left( I_{n_i} - \frac{D}{\sigma^2 + n_i D} J_{n_i} \right)$$

in your calculation.) Note that $\hat\mu$ is a linear function of the data $Y_{ij}$ (through $\bar{Y}_i$).
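The “try it” exercise can also be checked numerically. The sketch below verifies the closed-form inverse against a direct matrix inverse and confirms that the matrix form of the weighted least squares estimator reduces to a weighted average of the unit means, with weights proportional to $n_i/(n_i D + \sigma^2)$. The data are made-up pup weights for three litters of unequal size, and the variance components are hypothetical values treated as known.

```python
import numpy as np

D, sigma2 = 4.0, 1.0                     # hypothetical known variance components

# Made-up data: three units with unequal numbers of observations.
Y = [np.array([5.1, 6.0, 5.5]),
     np.array([7.2, 6.8]),
     np.array([4.9, 5.3, 5.0, 5.6])]

def sigma_inv_closed_form(n_i):
    """Closed-form inverse of Sigma_i = D J + sigma2 I."""
    J = np.ones((n_i, n_i))
    return (np.eye(n_i) - D / (sigma2 + n_i * D) * J) / sigma2

# Matrix form of the weighted least squares estimator.
num = 0.0
den = 0.0
for Y_i in Y:
    n_i = Y_i.size
    ones = np.ones(n_i)
    Sigma_i = D * np.ones((n_i, n_i)) + sigma2 * np.eye(n_i)
    Sigma_inv = np.linalg.inv(Sigma_i)
    num += ones @ Sigma_inv @ Y_i
    den += ones @ Sigma_inv @ ones
mu_hat_matrix = num / den

# Simplified form: weighted average of the unit means.
n = np.array([Y_i.size for Y_i in Y], dtype=float)
ybar = np.array([Y_i.mean() for Y_i in Y])
w = n / (n * D + sigma2)
mu_hat_weighted = np.sum(w * ybar) / np.sum(w)
```

When all units have the same number of observations the weights are equal and the estimator is just the average of the unit means.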

Thus, under these “ideal” conditions, to calculate the predictor for practical use, we would substitute $\hat\mu$ for $\mu$ in the conditional expectation to arrive at

$$\frac{n_i D}{n_i D + \sigma^2}(\bar{Y}_i - \hat\mu). \tag{10.16}$$

Note that (10.16) is still a linear function of the data through $\bar{Y}_i$.

It may be shown that, if we calculate the variance of (10.16), it is smaller than the variance of any other linear function of $Y_i$ we might use to predict $b_i$. That is, the “estimated” predictor (10.16) is the least variable among all predictors we might have chosen that are linear functions of the data. Thus, it is “best” in the sense that it exhibits the least variability, so is most reliable as a predictor.

The predictor (10.16) under these “ideal” conditions is also unbiased in the same sense described above: if we find its expectation, it is still equal to 0 even with $\hat\mu$ substituted for $\mu$ (try it!).
As a result, the predictor (10.16) is referred to as the Best Linear Unbiased Predictor for bi. The popular acronym is BLUP.

Now, of course, in real life, the elements of $\omega$ are not known; rather, they are estimated. Thus, instead of the “ideal” WLS estimator (10.15), we must use the generalized least squares estimator for $\mu$, which has the same form as the WLS estimator but depends on $\hat\Sigma_i$, which is $\Sigma_i$ with the ML or REML estimates $\hat{D}$ and $\hat\sigma^2$ plugged in. Moreover, these estimates must be plugged into the rest of the form of the predictor. Thus, in practice, one uses as the predictor

$$\hat{b}_i = \frac{n_i \hat{D}}{n_i \hat{D} + \hat\sigma^2}(\bar{Y}_i - \hat\mu), \tag{10.17}$$

where $\hat\mu$ is the GLS estimator

$$\hat\mu = \frac{\sum_{i=1}^m n_i (n_i \hat{D} + \hat\sigma^2)^{-1} \bar{Y}_i}{\sum_{i=1}^m n_i (n_i \hat{D} + \hat\sigma^2)^{-1}}.$$

The symbol $\hat{b}_i$ is used to denote this predictor.

Because we have plugged in these estimates, the properties of unbiasedness and smallest variance no longer hold exactly. However, it is hoped that they hold at least approximately. Thus, the predictor (10.17) used in practice is usually also referred to as BLUP, although this is not precisely true anymore. Another common term is empirical Bayes estimator for $b_i$, which comes from another interpretation of the BLUP we will not discuss here.
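A minimal sketch of the plug-in calculation in (10.17) follows. The function name `empirical_blup` is hypothetical, the data are made up, and the variance component values passed in are stand-ins for ML or REML estimates, which this sketch does not compute.

```python
import numpy as np

def empirical_blup(Y, D_hat, sigma2_hat):
    """Plug-in predictor (10.17) for the simple model Y_ij = mu + b_i + e_ij.

    Y is a list of 1-d arrays, one per unit. D_hat and sigma2_hat are
    assumed to be supplied by an ML or REML fit done elsewhere.
    Returns the GLS estimate of mu and the vector of predicted b_i.
    """
    n = np.array([Y_i.size for Y_i in Y], dtype=float)
    ybar = np.array([Y_i.mean() for Y_i in Y])

    # GLS estimate of mu: weighted average of the unit means.
    w = n / (n * D_hat + sigma2_hat)
    mu_hat = np.sum(w * ybar) / np.sum(w)

    # Shrink each unit's deviation from mu_hat toward 0.
    shrink = n * D_hat / (n * D_hat + sigma2_hat)
    b_hat = shrink * (ybar - mu_hat)
    return mu_hat, b_hat

# Made-up data for three units of unequal size.
Y = [np.array([5.1, 6.0, 5.5]),
     np.array([7.2, 6.8]),
     np.array([4.9, 5.3, 5.0, 5.6])]

mu_hat, b_hat = empirical_blup(Y, D_hat=0.8, sigma2_hat=0.1)
```

The multiplier is between 0 and 1, so each predicted random effect is a shrunken version of the unit’s observed deviation from the estimated overall mean; units with fewer observations, or settings where the among-unit variance estimate is small relative to the within-unit variance estimate, are shrunk more.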

“ESTIMATION” OF INDIVIDUAL “MEAN”: Recall our earlier observation for the general model that, if we “zero in” on a particular individual, we may think of them as having their own “regression model” with individual-specific “mean” $X_i\beta + Z_i b_i$. In our simple model here, this “mean” is $\mathbf{1}_{n_i}\mu + \mathbf{1}_{n_i} b_i$, which implies that the “mean” for the $j$th observation is

$$\mu_i = \mu + b_i$$

for all $j = 1, \ldots, n_i$. An important goal of predicting $b_i$ is to allow us to characterize the individual-specific “mean” for each unit.

We may in fact formalize this. We have been saying that $\mu_i = \mu + b_i$ is the “mean” for individual $i$. Technically, $\mu_i$ is the conditional expectation of $Y_i$, the data for unit $i$, given $b_i$. That is, $\mu_i$ is the function of $b_i$ that is “closest” to $Y_i$. For the $j$th observation, this is written

$$\mu_i = E(Y_{ij} \mid b_i).$$

Heuristically, we may thus think of $\mu_i$ as the “mean” of $Y_{ij}$ were we lucky enough to know $b_i$.
