




















































Sas code for conducting a random coefficient model analysis on a dataset named dent1 to examine the relationship between distance and gender, as well as the effect of age on this relationship for each gender. The model includes both random intercepts and slopes for the subject variable 'child'. The output includes fixed effects, random effects, and reconfigured datasets for further analysis.





















































Random coefficient models, where we develop an overall statistical model by thinking first about individual trajectories in a “subject-specific” fashion, are a special case of a more general model framework based on the same perspective. This model framework, known popularly as the linear mixed effects model, is still based on thinking about individual behavior first, of course. However, the possibilities for how this is represented, and how the variation in the population is represented, are broadened. The result is a very flexible and rich set of models for characterizing repeated measurement data.
The broader possibilities that are encompassed are best illustrated by examples. In the next section, we consider several examples that highlight some of these possibilities. We then note that all of the examples, as well as the random coefficient model as described in the last chapter, may be written in a unified way. Moreover, the same inferential techniques of maximum likelihood and restricted maximum likelihood are also applicable.
As mentioned in our discussion of random coefficient models, one advantage is that the model naturally represents individual trajectories in a formal way, so that questions of interest about individual behavior may be considered. In this chapter, we will show in the context of the general linear mixed effects model framework how “estimation” of individual trajectories may be carried out.
RANDOM COEFFICIENT MODEL: To set the stage, recall the random coefficient model where each unit is assumed to have its own inherent straight line trajectory, with its own intercept and slope $\beta_{0i}$ and $\beta_{1i}$, i.e.
$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \qquad \boldsymbol{\beta}_i = \begin{pmatrix} \beta_{0i} \\ \beta_{1i} \end{pmatrix}.$$
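If furthermore units are from, say, $q = 2$ groups, then the population model would be
$$\boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{b}_i, \qquad \boldsymbol{b}_i \sim N(\boldsymbol{0}, \boldsymbol{D}),$$
$$\boldsymbol{\beta} = \begin{pmatrix} \beta_{01} \\ \beta_{11} \\ \beta_{02} \\ \beta_{12} \end{pmatrix}, \qquad \boldsymbol{b}_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix},$$
and $\boldsymbol{A}_i$ is the appropriate matrix of 0’s and 1’s that “picks off” the intercept and slope for the group to which $i$ belongs. If there is only $q = 1$ group, then $\boldsymbol{A}_i = \boldsymbol{I}_2$ for all $i$ and $\boldsymbol{\beta} = (\beta_0, \beta_1)'$.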
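The “picking off” role of the matrix of 0’s and 1’s can be made concrete with a small sketch. All numerical values below, and the helper names `design_A`, `matvec`, and `unit_coefficients`, are hypothetical choices for illustration only; the ordering of the population parameters follows (intercept, slope) for group 1, then (intercept, slope) for group 2.

```python
# Sketch: beta_i = A_i beta + b_i for q = 2 groups (all numbers are made up).

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def design_A(group):
    """Return the 2 x 4 matrix A_i of 0's and 1's for a unit in group 1 or 2."""
    if group == 1:
        return [[1, 0, 0, 0], [0, 1, 0, 0]]
    if group == 2:
        return [[0, 0, 1, 0], [0, 0, 0, 1]]
    raise ValueError("group must be 1 or 2")

# beta = (beta_01, beta_11, beta_02, beta_12)': intercept and slope per group
beta = [10.0, 2.0, 12.0, 3.0]
# Hypothetical random effects (b_0i, b_1i) for one unit
b_i = [0.5, -0.25]

def unit_coefficients(group, b):
    """Unit-specific (intercept, slope): group mean plus the unit's deviation."""
    mean_part = matvec(design_A(group), beta)
    return [m + r for m, r in zip(mean_part, b)]

beta_i_group1 = unit_coefficients(1, b_i)  # group 1 mean (10, 2) plus b_i
beta_i_group2 = unit_coefficients(2, b_i)  # group 2 mean (12, 3) plus b_i
```

The same deviation is added to a different group mean depending only on which columns the 0/1 matrix selects.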
MAGNITUDES OF AMONG-UNIT VARIATION: For simplicity, consider first a situation with a single group, so that all $\beta_{0i}$ and $\beta_{1i}$ in the random coefficient model are assumed to vary about a common mean intercept and slope. Consider Figure 1, which depicts longitudinal data for 10 hypothetical units.
[Figure 1: Longitudinal data where variation in slope may be negligible. Axes: days (horizontal), response (vertical).]
If we believed that the second possibility were likely, we might still want to consider model (10.1). If we considered the usual random coefficient model with
$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1 + b_{1i},$$
then in the matrix $\boldsymbol{D}$, the element $D_{11}$ represents the variance of $b_{0i}$ (among intercepts) and $D_{22}$ that of $b_{1i}$ (among slopes). If $D_{11}$ is nonnegligible relative to the mean intercept, then this suggests that intercepts vary perceptibly. If on the other hand $D_{22}$ is virtually negligible relative to the size of the mean slope, then this suggests that variation in slopes is almost undetectable.
In either case, we are faced with a situation that does not quite fit into the random coefficient framework. The individual-specific parameters $\boldsymbol{\beta}_i$ no longer have all elements varying! How may we represent this? This is most easily seen by “brute force.” We have
$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij},$$
$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1. \tag{10.2}$$
Plugging the representations for $\beta_{0i}$ and $\beta_{1i}$ into the first stage model, we obtain
$$Y_{ij} = \beta_0 + \beta_1 t_{ij} + b_{0i} + e_{ij}. \tag{10.3}$$
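If we think of the implication of (10.3) for the entire vector $\boldsymbol{Y}_i$, it is straightforward to see that we may write this succinctly as
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{1} b_{0i} + \boldsymbol{e}_i,$$
where as usual $\boldsymbol{1}$ is a $(n_i \times 1)$ vector of 1’s and $\boldsymbol{X}_i$ is the design matrix for individual $i$,
$$\boldsymbol{X}_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$
Note that if we let $\boldsymbol{Z}_i = \boldsymbol{1}$ and $\boldsymbol{b}_i = b_{0i}$ $(1 \times 1)$, we may write this in the form
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i \tag{10.4}$$
as before; this looks identical to the general representation we used in the last chapter, except that the definitions of $\boldsymbol{X}_i$ and $\boldsymbol{Z}_i$ we used in the single group case are now different. Other than this, the model has exactly the same form, once we’ve defined $\boldsymbol{X}_i$ and $\boldsymbol{Z}_i$ appropriately.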
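As a quick check on the “brute force” step, the following sketch builds the design matrices for one hypothetical unit and confirms that the systematic part of the matrix form (10.4), with a column of 1’s as the random-effects design, reproduces (10.3) observation by observation. The time points and parameter values are invented for illustration, and the error term is left out because it is the same in both forms.

```python
# Sketch: the random-intercept model written elementwise, as in (10.3),
# and in matrix form, as in (10.4). All numbers are hypothetical.

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

t_i = [0.0, 1.0, 2.0, 3.0]   # observation times for unit i
beta = [5.0, 2.0]            # (beta_0, beta_1): population intercept and slope
b0_i = 1.5                   # this unit's random intercept deviation

X_i = [[1.0, t] for t in t_i]    # n_i x 2 fixed-effects design matrix
Z_i = [[1.0] for _ in t_i]       # n_i x 1 column of 1's
b_i = [b0_i]                     # 1 x 1 random effect

# Systematic part from the matrix form: X_i beta + Z_i b_i
mean_matrix_form = [
    xb + zb for xb, zb in zip(matvec(X_i, beta), matvec(Z_i, b_i))
]

# Systematic part written elementwise: beta_0 + beta_1 t_ij + b_0i
mean_elementwise = [beta[0] + beta[1] * t + b0_i for t in t_i]
```

Both lists agree, which is all that the succinct matrix notation is claiming.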
Alternatively, we can do the same calculation with more fancy footwork. We will illustrate this in a way that allows immediate extension to the case of more than one group; to this end, it is convenient to use a different symbol to represent the design matrix for individual $i$ (we called it $\boldsymbol{X}_i$ above). Thus, write
$$\boldsymbol{C}_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}.$$
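Furthermore, note that we may write (10.2) as follows (verify)
$$\boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{B}_i \boldsymbol{b}_i, \qquad \boldsymbol{b}_i = b_{0i} \ (1 \times 1), \tag{10.5}$$
where $\boldsymbol{A}_i$ is an identity matrix and
$$\boldsymbol{B}_i = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad (2 \times 1).$$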
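With these representations, if we think of the model that says each child has his/her own straight line regression model with child-specific regression parameter $\boldsymbol{\beta}_i$, i.e.
$$\boldsymbol{Y}_i = \boldsymbol{C}_i \boldsymbol{\beta}_i + \boldsymbol{e}_i,$$
plugging (10.5) into this expression gives
$$\boldsymbol{Y}_i = \boldsymbol{C}_i \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{C}_i \boldsymbol{B}_i \boldsymbol{b}_i + \boldsymbol{e}_i. \tag{10.6}$$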
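Comparing (10.6) with (10.4) suggests reading off the fixed-effects design as the product of the unit's design matrix with the identity, and the random-effects design as its product with the 2 x 1 selector (1, 0)'. The sketch below carries out those two products for a hypothetical unit with three time points; the helper `matmul` and the numbers are illustrative assumptions.

```python
# Sketch: X_i = C_i A_i and Z_i = C_i B_i for the random-intercept model.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

t_i = [0.0, 1.0, 2.0]              # hypothetical observation times
C_i = [[1.0, t] for t in t_i]      # n_i x 2 design for the unit's own line
A_i = [[1.0, 0.0], [0.0, 1.0]]     # identity: single group
B_i = [[1.0], [0.0]]               # only the intercept has a random effect

X_i = matmul(C_i, A_i)   # fixed-effects design; equals C_i here
Z_i = matmul(C_i, B_i)   # random-effects design; a column of 1's
```

Because the selector zeroes out the slope column, the random-effects design collapses to the vector of 1’s obtained earlier by direct substitution.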
To gain a further understanding of this, consider another possibility.
OTHER COVARIATES: In some instances, the question of interest may in fact involve the possible association between the values of measured covariates and rate of change of a response over time. We now see that it is possible to write models appropriate for this situation in the form (10.4) for suitable choices of Xi and Zi.
An example arises in understanding the progression of disease in HIV-infected patients assigned to follow a certain therapeutic regimen. HIV attacks the immune system, so HIV-infected subjects often have compromised immune system characteristics. A standard measure of immune status is CD4 count, where lower counts indicate poorer status. Now a standard measure of how well a patient is doing is viral load, roughly the “amount” of virus present in the body, and it is routine to follow viral load over time to monitor a patient’s well-being. HIV scientists may be interested in whether the nature of viral load progression is different depending on a subject’s immune system at the time of initiation of therapy. To develop a formal model to address this issue, suppose initially there is only one group.
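$$\beta_{0i} = \beta_1 + b_{0i}, \qquad \beta_{1i} = \beta_2 + \beta_3 a_i + b_{1i}.$$
Here $a_i$ is the covariate value for unit $i$; in this example, presumably a measure of immune status at initiation of therapy, such as baseline CD4 count. We may write this succinctly as
$$\boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{b}_i, \qquad \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \qquad \boldsymbol{b}_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \qquad \boldsymbol{A}_i = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & a_i \end{pmatrix}.$$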
It is straightforward to see that this model may be put into the form of (10.4). Plugging in the form of $\boldsymbol{\beta}_i$ into the individual model, we see that
$$Y_{ij} = \beta_1 + \beta_2 t_{ij} + \beta_3 a_i t_{ij} + b_{0i} + b_{1i} t_{ij} + e_{ij}, \qquad j = 1, \ldots, n_i.$$
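It may be verified that this may be written succinctly as
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i,$$
where
$$\boldsymbol{X}_i = \begin{pmatrix} 1 & t_{i1} & a_i t_{i1} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & a_i t_{in_i} \end{pmatrix}, \qquad \boldsymbol{Z}_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} = \boldsymbol{C}_i, \text{ say}.$$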
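The third column of the fixed-effects design, the covariate-by-time product, arises mechanically from multiplying the unit's time design by the 2 x 3 second-stage matrix. A minimal sketch, assuming a hypothetical covariate value and three time points:

```python
# Sketch: with a covariate a_i entering the slope, C_i A_i has columns
# (1, t_ij, a_i * t_ij). The value of a_i and the times are made up.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

a_i = 4.0                          # hypothetical baseline covariate for unit i
t_i = [0.0, 1.0, 2.0]              # hypothetical observation times
C_i = [[1.0, t] for t in t_i]      # n_i x 2
A_i = [[1.0, 0.0, 0.0],
       [0.0, 1.0, a_i]]            # 2 x 3 second-stage design

X_i = matmul(C_i, A_i)             # n_i x 3 fixed-effects design
Z_i = C_i                          # both intercept and slope are random

expected_X_i = [[1.0, t, a_i * t] for t in t_i]
```

The interaction column is thus a consequence of the second-stage model rather than something specified separately.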
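$$\beta_{0i} = \begin{cases} \beta_1 + b_{0i} & \text{for treatment 1}, \\ \beta_4 + b_{0i} & \text{for treatment 2}, \end{cases} \qquad \beta_{1i} = \begin{cases} \beta_2 + \beta_3 a_i & \text{for treatment 1}, \\ \beta_5 + \beta_6 a_i & \text{for treatment 2}. \end{cases}$$
We could again write this as $\boldsymbol{\beta}_i = \boldsymbol{A}_i \boldsymbol{\beta} + \boldsymbol{B}_i \boldsymbol{b}_i$ with $\boldsymbol{A}_i$ and $\boldsymbol{\beta}$ as above but with $\boldsymbol{b}_i = b_{0i}$ and $\boldsymbol{B}_i = (1, 0)'$.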
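By plugging these representations into the first stage model as in (10.7), we arrive at a model of the form
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i, \tag{10.8}$$
where the matrices $\boldsymbol{X}_i$ and $\boldsymbol{Z}_i$ are determined by the particular definitions of $\boldsymbol{A}_i$, $\boldsymbol{B}_i$, and $\boldsymbol{C}_i$.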
RESULT: It should be clear that it is possible to represent even fancier specifications in this way. E.g., we could also incorporate association of the intercepts with ai, and we may have more than one covariate in the second-stage population model. We consider an example at the end of this chapter. Once we write down the model in the form βi = Aiβ + Bibi for appropriately defined matrices Ai and Bi reflecting the features of interest, we may write a model of the form (10.8), where the definitions of Xi and Zi are dictated by the form of the first- and second-stage models.
THE SIMPLEST MODEL: It is in fact the case that the general model
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i$$
includes as special cases many simple models for repeated measurements.
A particularly simple model is as follows. Suppose there is only one group, and, for each unit, we have repeated measurements Yij. However, suppose that these measurements are not necessarily over time; e.g. the m units are mother rats, and for the ith mother, Yij represent birthweights of her ni pups. In the absence of further information, a very simple model for this situation is
$$Y_{ij} = \mu + b_i + e_{ij}, \qquad j = 1, \ldots, n_i. \tag{10.9}$$
The model says that the population of all possible pup weights is centered about μ, and allows for the possibility of 2 sources of variation, among mother rats, through bi (some mothers have larger pups than others) and within mother rats, through eij (pups born to a given mother are not all identical, and weights may be measured with error).
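If we define $\boldsymbol{X}_i = \boldsymbol{1}$, $\boldsymbol{\beta} = \mu$, $\boldsymbol{Z}_i = \boldsymbol{1}$, and $\boldsymbol{b}_i = b_i$, then it is straightforward to see that we may write (10.9) in the form of (10.8).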
It is straightforward to extend this simple model to allow different treatment groups with mean $\mu_\ell = \mu + \tau_\ell$ for the $\ell$th group by redefining $\boldsymbol{\beta}$ and $\boldsymbol{X}_i$ (try it!).
In fact, the univariate ANOVA model of Chapter 5 can also be written in this form. Recall that in Chapter 5 (see page 119) we wrote this model in the form
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{1} b_i + \boldsymbol{e}_i.$$
Thus, we see this is again a special case of the general model as above ($\boldsymbol{Z}_i = \boldsymbol{1}$, $\boldsymbol{b}_i = b_i$) with the particular forms of $\boldsymbol{X}_i$ and $\boldsymbol{\beta}$ on page 119.
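SUMMARY: It should be clear from these examples that it is possible to consider a wide variety of subject-specific models of the form
$$\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i$$
by suitably defining $\boldsymbol{X}_i$, $\boldsymbol{\beta}$, $\boldsymbol{Z}_i$, and $\boldsymbol{b}_i$. This model in its general form is known as the linear mixed effects model.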
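$$E(\boldsymbol{Y}_i) = \boldsymbol{X}_i \boldsymbol{\beta}, \qquad \mathrm{var}(\boldsymbol{Y}_i) = \boldsymbol{Z}_i \boldsymbol{D} \boldsymbol{Z}_i' + \boldsymbol{R}_i = \boldsymbol{\Sigma}_i,$$
$$\boldsymbol{Y}_i \sim N_{n_i}(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{\Sigma}_i). \tag{10.11}$$
That is, the model with the above assumptions on $\boldsymbol{e}_i$ and $\boldsymbol{b}_i$ implies that the $\boldsymbol{Y}_i$ are multivariate normal random vectors of dimension $n_i$ with a particular form of covariance matrix. The form of $\boldsymbol{\Sigma}_i$ implied by the model has two distinct components, the first having to do with variation solely from among-unit sources and the second having to do with variation solely from within-unit sources.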
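To see the two components of the covariance matrix at work, the sketch below evaluates the among-unit part plus the within-unit part for two choices of the random-effects design. It assumes, purely for illustration, that the within-unit covariance is a common variance times the identity, and all variance values are invented.

```python
# Sketch: Sigma_i = Z_i D Z_i' + R_i for two hypothetical specifications.

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def marginal_covariance(Z, D, R):
    """Among-unit part Z D Z' plus within-unit part R, elementwise."""
    ZDZt = matmul(matmul(Z, D), transpose(Z))
    return [[among + within for among, within in zip(r1, r2)]
            for r1, r2 in zip(ZDZt, R)]

n_i = 3
sigma2 = 1.0   # assumed within-unit variance, R_i = sigma2 * I
R_i = [[sigma2 if j == k else 0.0 for k in range(n_i)] for j in range(n_i)]

# Random intercept only: Z_i is a column of 1's and D is 1 x 1.
Sigma_intercept = marginal_covariance([[1.0]] * n_i, [[4.0]], R_i)

# Random intercept and slope: Z_i has columns (1, t_ij), times 0, 1, 2.
Z_slope = [[1.0, t] for t in (0.0, 1.0, 2.0)]
D_slope = [[4.0, 0.0], [0.0, 1.0]]
Sigma_slope = marginal_covariance(Z_slope, D_slope, R_i)
```

With a random intercept only, every pair of observations on a unit shares the same covariance (compound symmetry); adding a random slope makes variances and covariances depend on the observation times.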
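“SUBJECT-SPECIFIC” MODEL: Although the forms of $\boldsymbol{X}_i$, $\boldsymbol{\beta}$, $\boldsymbol{Z}_i$, and $\boldsymbol{b}_i$ are allowed more possibilities here than in the random coefficient model, the spirit of the model is the same. If we think about the general form of the model, it is clear that the model is a subject-specific one. In particular, if we examine the form of the model $\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i$, the part describing unit $i$ may be collected as
$$\boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i = \begin{pmatrix} \boldsymbol{X}_i & \boldsymbol{Z}_i \end{pmatrix} \begin{pmatrix} \boldsymbol{\beta} \\ \boldsymbol{b}_i \end{pmatrix}.$$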
The vector $\boldsymbol{e}_i$ characterizes random variation associated with within-unit sources. This way of writing this part of the model highlights the fact that individual unit behavior is being characterized by some combination of $\boldsymbol{\beta}$, which describes the mean for the population, and $\boldsymbol{b}_i$, which describes how this particular unit deviates from the population mean.
As in the previous chapter, once we note that the model implies (10.11), the methods of maximum likelihood and restricted maximum likelihood may be used to estimate the parameters that characterize the “mean” or systematic part of the model, $\boldsymbol{\beta}$, and those that characterize the “variation” or random part of the model, the distinct parameters that make up $\boldsymbol{R}_i$ and $\boldsymbol{D}$. Thus, the methods and considerations discussed in the previous two chapters apply exactly as described.
Because we have already discussed these issues in detail in earlier chapters, we do not need to do so again here. See section 9.3 and chapter 8 for more.
By analogy, one’s first thought for prediction of bi would be to use the mean of the population of bi. However,
Thus, simply using the mean of the population of random effects bi will not provide a useful result. Something that preserves the “individuality” of the bi is needed instead.
Another thing to note is that this approach does not at all take advantage of the fact that we have some additional information available: the data! Under the model, we have $\boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{e}_i$; that is, the data $\boldsymbol{Y}_i$ and the underlying random effects $\boldsymbol{b}_i$ are related. This suggests that there must be information about $\boldsymbol{b}_i$ in $\boldsymbol{Y}_i$ that we could exploit. In particular, is there some sensible function of the data $\boldsymbol{Y}_i$ that could be used as a predictor for $\boldsymbol{b}_i$? Of course, this function would also be random, as it is a function of the random data $\boldsymbol{Y}_i$.
CONDITIONAL EXPECTATION: To make the discussion a little easier, we will assume for the moment that $\boldsymbol{b}_i$ is a scalar; i.e. $k = 1$. The same reasoning goes through for $k > 1$. Call this scalar random effect $b_i$.
For our predictor, we’d like something that is “close to” $b_i$. If we let $c(\boldsymbol{Y}_i)$ be the function of the data we will use as the predictor, then one possibility would be to say we’d like to choose $c(\boldsymbol{Y}_i)$ so that distance between $c(\boldsymbol{Y}_i)$ and $b_i$, which we can measure as
$$\{b_i - c(\boldsymbol{Y}_i)\}^2,$$
is “small.” This makes sense; we’d like to use as a predictor something that resembles $b_i$ in some sense.
As both $\boldsymbol{Y}_i$ and $b_i$ are random, and hence vary in the population, we’d like the distance to be “small” considered over all possible values they might take on. Thus, it seems reasonable to consider the expectation of this distance, averaging it over all possible values; i.e.
$$E\{b_i - c(\boldsymbol{Y}_i)\}^2. \tag{10.12}$$
How “small” is “small?” A natural way to think is that we’d like the function $c(\boldsymbol{Y}_i)$ we use to be the function that makes (10.12) as small as possible; that is, the function $c(\boldsymbol{Y}_i)$ we’d like to choose is the one that minimizes $E\{b_i - c(\boldsymbol{Y}_i)\}^2$ across all possible functions we might choose.
The particular function $c(\boldsymbol{Y}_i)$ that minimizes this expected distance is called the conditional expectation of $b_i$ given $\boldsymbol{Y}_i$. The usual notation is to write the conditional expectation as
$$E(b_i \mid \boldsymbol{Y}_i). \tag{10.13}$$
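The minimizing property can be checked exactly in a toy setting. The sketch below is not the normal model of this chapter: it assumes a hypothetical discrete setup with a single observation Y = b + e, where b is -1 or 1 with equal probability and e is -1, 0, or 1 with equal probability, chosen only so that every expectation is a finite sum. It compares the expected squared distance (10.12) for three candidate predictors.

```python
# Sketch: the conditional expectation minimizes E{b - c(Y)}^2 in a toy
# discrete example (hypothetical distributions, exact arithmetic).
from collections import defaultdict
from fractions import Fraction

# Joint probabilities of (b, y) when Y = b + e, with b and e independent.
joint = {}
for b in (-1, 1):
    for e in (-1, 0, 1):
        key = (b, b + e)
        joint[key] = joint.get(key, 0) + Fraction(1, 6)

def conditional_expectation(joint):
    """Return {y: E(b | Y = y)} computed from the joint probabilities."""
    num = defaultdict(Fraction)
    den = defaultdict(Fraction)
    for (b, y), p in joint.items():
        num[y] += b * p
        den[y] += p
    return {y: num[y] / den[y] for y in den}

def expected_sq_distance(joint, c):
    """E{b - c(Y)}^2 for a candidate predictor function c."""
    return sum(p * (b - c(y)) ** 2 for (b, y), p in joint.items())

cond = conditional_expectation(joint)
mse_cond = expected_sq_distance(joint, lambda y: cond[y])  # E(b | Y)
mse_zero = expected_sq_distance(joint, lambda y: 0)        # population mean of b
mse_raw = expected_sq_distance(joint, lambda y: y)         # the raw observation
```

In this example the population mean of b does worst, the raw observation does better, and the conditional expectation does best, which is the ordering the argument in the text leads one to expect.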
CONDITIONAL EXPECTATION AND MULTIVARIATE NORMALITY: It turns out that when $\boldsymbol{Y}_i$ and $\boldsymbol{b}_i$ are both normally distributed, it is possible to find an explicit expression for the conditional expectation. We first discuss this in detail in a special case: the simplest form of the linear mixed model given in equation (10.9), where $\boldsymbol{b}_i$ is a scalar $b_i$:
$$Y_{ij} = \mu + b_i + e_{ij}$$
with $\boldsymbol{Y}_i = (Y_{i1}, \ldots, Y_{in_i})'$, $\boldsymbol{e}_i = (e_{i1}, \ldots, e_{in_i})'$, $b_i \sim N(0, D)$, and $\boldsymbol{e}_i \sim N_{n_i}(\boldsymbol{0}, \sigma^2 \boldsymbol{I})$. It of course follows that $Y_{ij} \sim N(\mu, D + \sigma^2)$ (verify).
It may be shown that, under this model,
$$E(b_i \mid \boldsymbol{Y}_i) = \frac{n_i D}{n_i D + \sigma^2} (\bar{Y}_i - \mu), \tag{10.14}$$
where $\bar{Y}_i$ is the mean of the $n_i$ $Y_{ij}$ values in $\boldsymbol{Y}_i$.
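Expression (10.14) can be checked numerically against the standard multivariate normal result that the conditional mean of the random effect is D 1' Σ⁻¹ (Y − μ1), where Σ is the covariance matrix of the unit's data. That general formula is not derived here and is taken as an assumption of this sketch, as is the closed-form inverse used inside the second function; the data and parameter values are hypothetical.

```python
# Sketch: two routes to E(b_i | Y_i) in the model Y_ij = mu + b_i + e_ij.

def shrinkage_predictor(y, mu, D, sigma2):
    """Right-hand side of (10.14): n D / (n D + sigma2) * (ybar - mu)."""
    n = len(y)
    ybar = sum(y) / n
    weight = n * D / (n * D + sigma2)
    return weight * (ybar - mu)

def predictor_via_sigma_inverse(y, mu, D, sigma2):
    """D 1' Sigma^{-1} (y - mu 1), with
    Sigma^{-1} = (1 / sigma2) * (I - D / (sigma2 + n D) * J)."""
    n = len(y)
    resid = [v - mu for v in y]
    total = sum(resid)          # every element of J @ resid equals this sum
    c = D / (sigma2 + n * D)
    sigma_inv_resid = [(r - c * total) / sigma2 for r in resid]
    return D * sum(sigma_inv_resid)

y_i = [12.0, 14.0, 16.0, 18.0]      # hypothetical data for one unit
mu, D, sigma2 = 10.0, 4.0, 4.0      # hypothetical parameter values

b_hat_shrink = shrinkage_predictor(y_i, mu, D, sigma2)
b_hat_matrix = predictor_via_sigma_inverse(y_i, mu, D, sigma2)

# With the same unit mean, more observations mean less shrinkage toward 0.
b_hat_two_obs = shrinkage_predictor([14.0, 16.0], mu, D, sigma2)
```

Here the unit's mean exceeds the population mean by 5, and the predictor pulls that deviation back by the factor 16/20; with only two observations and the same mean, the factor is smaller.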
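$$\hat{\mu} = \left( \sum_{i=1}^{m} \boldsymbol{1}_{n_i}' \boldsymbol{\Sigma}_i^{-1} \boldsymbol{1}_{n_i} \right)^{-1} \sum_{i=1}^{m} \boldsymbol{1}_{n_i}' \boldsymbol{\Sigma}_i^{-1} \boldsymbol{Y}_i,$$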
which may be shown to lead to the result that
$$\hat{\mu} = \frac{\sum_{i=1}^{m} n_i (n_i D + \sigma^2)^{-1} \bar{Y}_i}{\sum_{i=1}^{m} n_i (n_i D + \sigma^2)^{-1}}.$$
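(Try it; you will need to use the matrix fact that
$$\boldsymbol{\Sigma}_i^{-1} = \frac{1}{\sigma^2} \left( \boldsymbol{I}_{n_i} - \frac{D}{\sigma^2 + n_i D} \boldsymbol{J}_{n_i} \right)$$
in your calculation.) Note that $\hat{\mu}$ is a linear function of the data $Y_{ij}$ (through $\bar{Y}_i$).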
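The suggested calculation can be checked on a tiny example with exact arithmetic. The sketch below assumes the covariance matrix of a unit's data is σ² on the diagonal plus D in every entry, verifies the closed-form inverse by direct multiplication, and compares the matrix expression for the estimator with a weighted average of unit means using weights n_i / (n_i D + σ²), which is what 1' Σ⁻¹ 1 reduces to. The data and variance values are hypothetical, and the unit sizes are deliberately unequal.

```python
# Sketch: generalized least squares estimator of mu, two equivalent forms.
from fractions import Fraction

def sigma(n, D, s2):
    """Sigma_i = s2 * I + D * J for the model Y_ij = mu + b_i + e_ij."""
    return [[D + (s2 if j == k else 0) for k in range(n)] for j in range(n)]

def sigma_inverse(n, D, s2):
    """Closed form (1 / s2) * (I - D / (s2 + n D) * J)."""
    c = D / (s2 + n * D)
    return [[((1 if j == k else 0) - c) / s2 for k in range(n)]
            for j in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def gls_mean(units, D, s2):
    """(sum_i 1' Sinv_i 1)^{-1} * sum_i 1' Sinv_i y_i."""
    num = den = Fraction(0)
    for y in units:
        Sinv = sigma_inverse(len(y), D, s2)
        den += sum(sum(row) for row in Sinv)
        num += sum(sum(s * v for s, v in zip(row, y)) for row in Sinv)
    return num / den

def weighted_mean(units, D, s2):
    """Weighted average of unit means with weights n_i / (n_i D + s2)."""
    weights = [len(y) / (len(y) * D + s2) for y in units]
    means = [Fraction(sum(y), len(y)) for y in units]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

D, s2 = Fraction(2), Fraction(1)      # hypothetical variance components
units = [[3, 5], [10]]                # hypothetical data, n_1 = 2 and n_2 = 1

mu_gls = gls_mean(units, D, s2)
mu_weighted = weighted_mean(units, D, s2)
identity_check = matmul(sigma(2, D, s2), sigma_inverse(2, D, s2))
```

When all units have the same number of observations the weights cancel and the estimator is the simple average of the unit means; with unequal sizes, as here, the weights matter.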
$$\frac{\sum_{i=1}^{m} n_i (n_i \hat{D} + \hat{\sigma}^2)^{-1} \bar{Y}_i}{\sum_{i=1}^{m} n_i (n_i \hat{D} + \hat{\sigma}^2)^{-1}}$$
The symbol $\hat{b}_i$ is used to denote this predictor.
“ESTIMATION” OF INDIVIDUAL “MEAN”: Recall our earlier observation for the general model that, if we “zero in” on a particular individual, we may think of them as having their own “regression model” with individual-specific “mean” $\boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i$. In our simple model here, this “mean” is $\boldsymbol{1}_{n_i} \mu + \boldsymbol{1}_{n_i} b_i$, which implies that the “mean” for the $j$th observation is
$$\mu_i = \mu + b_i$$
for all $j = 1, \ldots, n_i$. An important goal of predicting $b_i$ is to allow us to characterize the individual-specific “mean” for each unit.
$$\mu_i = E(Y_{ij} \mid b_i).$$
Heuristically, we may thus think of μi as the “mean” of Yij were we lucky enough to know bi.