Statistical Modeling for Different Genders: SAS Code and Output - Prof. Marie Davidian, Study notes of Statistics

SAS code and output for a statistical model that allows for the fact that children are of different genders. The model includes separate variances for each gender and different covariance matrices. The output includes the log likelihood, AIC, AICC, and BIC values, as well as estimates and tests for fixed effects.

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

CHAPTER 9 ST 732, M. DAVIDIAN

9 Random coefficient models for multivariate normal data

9.1 Introduction

In the last chapter, we noted that an alternative perspective on explicit modeling of longitudinal response is to think directly of the fact that each unit appears to have its own trajectory or inherent trend with its own peculiar features. For example, in the dental study, if we focus on a particular child, the trajectory looks to be approximately like a straight line (with some variation about it, of course). The data are reproduced below for convenience in Figure 1. A similar statement could be made about the dialyzer data in the last chapter.

Figure 1: Dental data revisited. [Plot titled “Dental Study Data”: distance (mm) against age (years), with ages from 8 to 14 and distances from about 20 to 30; observations are plotted with the symbols 0 and 1.]

The general regression modeling approach takes the standard perspective in much of statistical modeling of focusing directly on the mean responses and how they change over time. In this chapter, we consider an alternative approach to building a model based on thinking first about individual trajectories.

  • For trajectories that may be represented by linear functions of a design matrix and parameters, this approach will lead us to the same type of mean models as the general regression approach.
  • However, the modeling approach acknowledges explicitly the two separate sources of variation we have discussed. As a result, it “automatically” leads to covariance models that also acknowledge these sources.
  • The resulting statistical model, called a random coefficient model for reasons that will be clear shortly, will be seen to imply a model like the general linear regression models of the last chapter with a particular covariance structure for each data vector. Thus, the inferential methods of that chapter, namely maximum and restricted maximum likelihood, will apply immediately.
  • In addition, this modeling strategy will allow us to address questions of scientific interest about trajectories for individual units, either ones in the study or future units. For example, in a study of AIDS patients, it may be of interest to physicians attending the patients to have an estimate of a patient’s individual apparent trajectory, so that they may make clinical decisions about his or her future care. There is no apparent way of doing this in the general modeling approach we have just considered.

9.2 Random coefficient model

SUBJECT-SPECIFIC TRAJECTORY: Recall the conceptual model discussed in Chapter 4. For definiteness, again consider the dental study data. We take the view that each child has his/her own underlying straight line inherent trend. Focusing on the $i$th child, this says that s/he has his/her own intercept and slope, $\beta_{0i}$ and $\beta_{1i}$, say, respectively, that determine this trend. This intercept and slope are unique to child $i$.

WITHIN-INDIVIDUAL VARIATION: Continuing with this conceptual perspective, the actual responses observed for a given child do not fall exactly on a straight line (the inherent trajectory) due to

  • The fact that the response cannot be measured perfectly, but is instead subject to measurement error due to the measuring device.
  • Individual “fluctuations;” although the overall trend for a given child is a straight line, the actual responses, if we could observe them continuously over time, tend to fluctuate about the trend.

AMONG-INDIVIDUAL VARIATION: The inherent trajectories are “high” or “low” with different steepness across children, suggesting that the child-specific intercepts $\beta_{0i}$ and slopes $\beta_{1i}$ vary across children.

  • It is natural to think of this population as being “centered” about a “typical” value of intercept and slope, with variation about this center value – some children have shallower or steeper slopes, for example.
  • More formally, we may think of the mean value of intercept and slope of the population of all such $\beta_i$ vectors. Individual intercept/slope vectors vary about this mean. Thus, we may think of a joint probability distribution of all possible values that a random vector of regression parameters $\beta_i$ could take on. More on this momentarily.

This way of thinking suggests a model for this population as follows. Let $\beta_0$ and $\beta_1$ represent the mean values of intercept and slope, and define

$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}. \qquad (9.3)$$

Thus $\beta$ is the mean vector of the population of all $\beta_i$. Then write

$$\beta_i = \beta + b_i, \qquad b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \qquad (9.4)$$

which is a shorthand way of saying

$$\beta_{0i} = \beta_0 + b_{0i}, \qquad \beta_{1i} = \beta_1 + b_{1i}.$$

  • Here, $b_i$ is a vector of random effects describing how the intercept and slope for the $i$th child deviate from the mean value.
  • Thus, (9.4) has the flavor of a regression-type model for the child-specific regression parameters, with a systematic component, the mean, and a random component summarizing how things vary about it.
  • More formally, the vectors $b_i$ are assumed to have mean $0$ and some covariance matrix that describes the nature of this variation: how intercepts and slopes vary among children and how they covary (e.g. do large intercepts and slopes tend to occur together?). In fact, as we discuss shortly, the $b_i$ are assumed to have a multivariate probability distribution with this mean and covariance matrix.
  • Thus, whereas the individual child model summarizes how things happen within a child, this model characterizes variation among children, representing the population through intercepts and slopes. Putting the models (9.1) and (9.4) together thus gives a complete description of what we believe about each child and the population of children, acknowledging the two sources of variation explicitly.
  • Note that we may substitute the expressions for $\beta_{0i}$ and $\beta_{1i}$ in (9.1) to obtain $$Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i}) t_{ij} + e_{ij}.$$ This shows clearly what we are assuming: each child has an intercept and slope that vary about the “typical,” or mean, intercept and slope $\beta_0$ and $\beta_1$.
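To make the combined model concrete, the following minimal sketch simulates a few children from $Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i}) t_{ij} + e_{ij}$. It is an illustration only: the function name `simulate_child`, the ages 8, 10, 12, 14, and all numerical values of $\beta$, the covariance matrix of $b_i$, and the within-child standard deviation are assumptions chosen for the example, not estimates from the dental study. Normal distributions are used for both $b_i$ and $e_{ij}$.

```python
import math
import random


def simulate_child(times, beta, D, sigma, rng):
    """Draw one child's intercept, slope, and responses from the straight-line model.

    beta  = (beta0, beta1): population mean intercept and slope
    D     = ((D11, D12), (D12, D22)): covariance matrix of (b0i, b1i)
    sigma = standard deviation of the within-child deviations e_ij
    """
    d11, d12, d22 = D[0][0], D[0][1], D[1][1]
    # Cholesky factor of the 2x2 matrix D, written out by hand.
    l11 = math.sqrt(d11) if d11 > 0 else 0.0
    l21 = d12 / l11 if l11 > 0 else 0.0
    l22 = math.sqrt(max(d22 - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    b0 = l11 * z1                # random effect for the intercept
    b1 = l21 * z1 + l22 * z2     # random effect for the slope
    beta0_i = beta[0] + b0       # child-specific intercept
    beta1_i = beta[1] + b1       # child-specific slope
    y = [beta0_i + beta1_i * t + rng.gauss(0, sigma) for t in times]
    return (beta0_i, beta1_i), y


ages = [8, 10, 12, 14]               # assumed observation times
rng = random.Random(0)
beta = (17.0, 0.6)                   # hypothetical mean intercept and slope
D = ((4.0, -0.2), (-0.2, 0.05))      # hypothetical covariance matrix of b_i
children = [simulate_child(ages, beta, D, 1.0, rng) for _ in range(5)]
for (b0i, b1i), y in children:
    print(round(b0i, 2), round(b1i, 2), [round(v, 1) for v in y])
```

Setting the covariance matrix and `sigma` to zero makes every child fall exactly on the mean line, which is a quick way to see that all departures from that line come from the two sources of variation.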

ACKNOWLEDGING GENDER: We can refine our model to allow for the fact that children are of different genders as follows. We may think of children as coming from two populations, males and females, each population with its own mean values of intercept and slope and possibly different pattern of variation in these intercepts and slopes. Each child would still have his/her own individual regression model as in (9.1), so this would not change. What would change to incorporate this refinement is the population model. For example, if child i is a boy, then we might believe

$$\beta_{0i} = \beta_{0,B} + b_{0i}, \qquad \beta_{1i} = \beta_{1,B} + b_{1i},$$

while if $i$ is a girl,

$$\beta_{0i} = \beta_{0,G} + b_{0i}, \qquad \beta_{1i} = \beta_{1,G} + b_{1i}.$$

  • Here, the fixed parameters $\beta_{0,B}$, $\beta_{1,B}$ represent the mean intercept and slope for boys; similarly, $\beta_{0,G}$, $\beta_{1,G}$ represent the same for girls.
  • $b_i = (b_{0i}, b_{1i})'$ represents the random effect for child $i$, with mean $0$. We may believe that the populations of $\beta_i$ for boys and girls have different means but have similar variation. In this case, we might say that the $b_i$ all have the same covariance matrix regardless of whether $i$ is a boy or girl. On the other hand, if we believe that the populations have different variation, we might think of the $b_i$ as being of two types, with a different covariance matrix depending on the gender. We will be more formal shortly.
  • Let

$$\beta = \begin{pmatrix} \beta_{0,G} \\ \beta_{1,G} \\ \beta_{0,B} \\ \beta_{1,B} \end{pmatrix}$$

In this case, it is reasonable to assume that $\mathrm{var}(e_{1i})$ is a diagonal matrix. If we furthermore believe that the magnitude of fluctuations is similar across time and units, we may represent this by the assumption that $\mathrm{var}(e_{1ij}) = \sigma_1^2$, say, for all $i$ and $j$, so that

$$\mathrm{var}(e_{1i}) = \sigma_1^2 I_{n_i}.$$

The assumption that this is similar across units may be viewed as reflecting the belief that the $e_{1ij}$ are independent of $\beta_i$ and hence $b_i$, which dictate how “large” the unit-specific trend is, so that the magnitude of fluctuations is unrelated to any unit-specific response characteristics.

  • As we have discussed previously, it may be reasonable to assume that errors in measurement are uncorrelated over time; thus, taking $\mathrm{var}(e_{2i})$ to be a diagonal matrix would be appropriate. Suppose we also believe that errors committed by the measuring device are of similar magnitude regardless of the true size of the thing being measured, and are similar for all units (because the same device is used). This suggests that $\mathrm{var}(e_{2ij}) = \sigma_2^2$, say, for all $j$, so that

$$\mathrm{var}(e_{2i}) = \sigma_2^2 I_{n_i}.$$

Now the true size of the thing being measured at time $t_{ij}$ is

$$\beta_{0i} + \beta_{1i} t_{ij} + e_{1ij};$$

i.e. the actual response uncontaminated by measurement error. Under this belief, it is reasonable to assume that the $e_{2ij}$ are independent of $\beta_i$ and thus $b_i$.

  • Putting this together, we would take

$$R_i = \mathrm{var}(e_i) = \mathrm{var}(e_{1i}) + \mathrm{var}(e_{2i}) = \sigma_1^2 I_{n_i} + \sigma_2^2 I_{n_i} = \sigma^2 I_{n_i},$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$ is the aggregate variance reflecting variation due to both within-unit sources.

  • The assumption that $e_{1i}$ and $e_{2i}$ are independent is standard, as is the assumption that $e_{1i}$ and $e_{2i}$ (and hence $e_i$) are independent of $b_i$. We say more about these assumptions shortly.
  • We may think of other situations. For example, suppose that the response is something like height, which in all likelihood we can measure with very little if any error. Under this condition, we may effectively eliminate $e_{2i}$ from the model and assume that $e_i = e_{1i}$; i.e. all within-unit variation is due to things like “fluctuations.” In the model above, $\sigma^2 = \sigma_1^2$ would then represent the variance due to this sole source.
  • Similarly, we may have a rather “noisy” measuring device such that, relative to errors in measurement, deviations due to within-unit fluctuations are virtually negligible. Under this condition, as long as we believe the times are far enough apart to render within-unit correlation negligible as well, we may as well take $e_i = e_{2i}$, in which case $\sigma^2 = \sigma_2^2$ in the above model represents solely measurement error variance.
  • Now suppose that the times of observation are sufficiently close that correlation due to within-unit sources cannot be viewed as negligible. In this event, it would be unreasonable to take $\mathrm{var}(e_{1i})$ to be diagonal. It would instead be more realistic to adopt a model for $\mathrm{var}(e_{1i})$ that represents correlation that decays as observations become farther apart. For example, with equally-spaced observations and variance assumed constant as above, the AR(1) structure may be a suitable model; i.e.

$$\mathrm{var}(e_{1i}) = \sigma_1^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}.$$

In general, maintaining the common variance assumption, we might entertain models $\mathrm{var}(e_{1i}) = \sigma_1^2 \Gamma_i$, where $\Gamma_i$ is a suitable $(n_i \times n_i)$ correlation matrix.

  • In this case, with the same assumptions on measurement error and independence as above, we would instead have

$$R_i = \mathrm{var}(e_i) = \sigma_1^2 \Gamma_i + \sigma_2^2 I_{n_i}. \qquad (9.6)$$

If measurement error were deemed negligible, this would be reduced to the assumption that

$$R_i = \sigma^2 \Gamma_i,$$

where $\sigma^2 = \sigma_1^2$ represents variance due solely to within-unit fluctuations.

  • We could also modify the above models to incorporate the possibility that, for example, one or both variances change over time. In this situation, one could postulate a heterogeneous covariance model, as described in Chapter 4. That is, if we believed fluctuation variances are still similar across subjects but change in magnitude over time, we would replace the assumption $\mathrm{var}(e_{1i}) = \sigma_1^2 \Gamma_i$ above by the heterogeneous version of this structure.
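The within-unit covariance models above can be written out numerically. The sketch below builds $R_i = \sigma_1^2 \Gamma_i + \sigma_2^2 I_{n_i}$ with $\Gamma_i$ taken to be the AR(1) correlation matrix for equally spaced times; setting $\rho = 0$ recovers the diagonal model $\sigma^2 I_{n_i}$ with $\sigma^2 = \sigma_1^2 + \sigma_2^2$. The function names and the variance values are hypothetical, chosen only for illustration.

```python
def ar1_correlation(n, rho):
    """n x n AR(1) correlation matrix with (j, k) entry rho**|j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def within_unit_cov(n, sigma1_sq, sigma2_sq, rho=0.0):
    """R_i = sigma1_sq * Gamma_i + sigma2_sq * I, with Gamma_i of AR(1) form."""
    gamma = ar1_correlation(n, rho)
    return [[sigma1_sq * gamma[j][k] + (sigma2_sq if j == k else 0.0)
             for k in range(n)] for j in range(n)]


# Hypothetical variances: fluctuations 1.5, measurement error 0.5.
R_diag = within_unit_cov(4, 1.5, 0.5)           # rho = 0: diagonal model
R_ar1 = within_unit_cov(4, 1.5, 0.5, rho=0.6)   # correlated fluctuations
for row in R_ar1:
    print([round(v, 3) for v in row])
```

Note that measurement error adds only to the diagonal, so the correlations implied by $R_i$ are smaller in magnitude than those in $\Gamma_i$ whenever $\sigma_2^2 > 0$.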

This sort of assumption is often made unknowingly; the analyst will choose a model for $R_i$ that embodies certain assumptions and emphasizes one source or another by default without having thought about considerations like those above. In fact, the most common assumption, $R_i = \sigma^2 I_{n_i}$, where $\sigma^2$ is the same for all units and groups, is usually made in this way (and is the default in SAS PROC MIXED). We discuss the consequences of a “wrong” model specification for $R_i$ shortly.

  • In general, $R_i$ is an $(n_i \times n_i)$ matrix depending on a few variance and correlation parameters, e.g. $\sigma^2$ and $\rho$ in the example above, chosen to at least approximate the anticipated features of within-unit sources of variation and correlation.
  • Focus on the response for individual $i$ at any time point $t_{ij}$. If we believe a normal distribution is a reasonable way to represent the population of responses we might see on this individual at $t_{ij}$, then it would make sense to assume that each $e_{ij}$ is normally distributed. This of course implies that we assume $e_i \sim N_{n_i}(0, R_i)$.

AMONG-UNIT VARIATION: In the “population” model (9.5), the random effects $b_i$ have mean $0$ and represent variation resulting from the fact that individual units differ; i.e. exhibit biological or other variation. The model says that this variation among individuals manifests itself by causing the individual unit trajectories to be different (have different intercepts and slopes). Thus, $\mathrm{var}(b_i)$ characterizes this variation.

  • Intercepts and slopes may tend to be large or small together, so that children with steeper slopes tend to “start out” larger at age 0. Alternatively, large intercepts may tend to happen with small slopes and vice versa; perhaps children who “start out” smaller experience a steeper growth pattern to “catch up.” In either case, this suggests that it would not necessarily be prudent to think of $\mathrm{var}(b_i)$ as a diagonal matrix. Rather, we expect there to be some correlation between intercepts and slopes, the nature of this correlation depending on what is being studied.
  • As noted above, we may believe that the populations of intercept/slopes for boys and girls have possibly different means, but that the variation in each population about the mean is similar. Formally, we can represent this by assuming that $\mathrm{var}(b_i) = D$ for some covariance matrix $D$ regardless of whether $i$ is a boy or girl.
  • Here, $D$ is $(2 \times 2)$, and an unstructured model is really the only one that makes sense. In particular, writing

$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{12} & D_{22} \end{pmatrix},$$

we have

$$\mathrm{var}(\beta_{0i}) = \mathrm{var}(b_{0i}) = D_{11}, \qquad \mathrm{var}(\beta_{1i}) = \mathrm{var}(b_{1i}) = D_{22}, \qquad \mathrm{cov}(\beta_{0i}, \beta_{1i}) = \mathrm{cov}(b_{0i}, b_{1i}) = D_{12}.$$

It should be clear that we would not expect $D_{12} = 0$ in general; e.g., steep slopes may be associated with “high” intercepts. It should also be clear that $D_{11} = D_{22}$ would be unrealistic. The intercept is on the same scale of measurement as the response, while the slope is on the scale “response scale per unit time.” Thus, these parameters are representing variances that would be expected to be different because they correspond to phenomena that are on different scales.

  • If we believed that these populations exhibit possibly different variation, we can represent this by assuming that $\mathrm{var}(b_i) = D_B$ if $i$ is a boy and $\mathrm{var}(b_i) = D_G$ if $i$ is a girl, where $D_B$ and $D_G$ are two (unstructured) covariance matrices.
  • In either case, the assumption on $\mathrm{var}(b_i)$ reflects solely the nature of variation at the level of the population(s) of units; that is, that caused solely by variation among units due to biology or other features. This is formally represented through the $b_i$.
  • It is often reasonable to assume that populations of intercepts and slopes are approximately normally distributed; e.g. this says that slopes vary symmetrically about the mean, some steeper, some shallower. Thus, a standard assumption is that the $b_i$ have a multivariate normal distribution; e.g. in the case where the covariance matrix is assumed the same and equal to $D$ regardless of gender, the assumption would be

$$b_i \sim N_k(0, D),$$

where $k$ is the dimension of $b_i$ ($k = 2$ here).
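A short worked calculation, offered as an illustration rather than as part of the original development, shows why a constraint such as $D_{11} = D_{22}$ depends on the arbitrary choice of time unit. Suppose time is re-expressed as $t^* = c\,t$ for a constant $c > 0$ (for example, $c = 12$ converts years to months); the symbols $t^*$, $c$, $b^*_{1i}$, and $D^*$ are introduced here only for this example.

```latex
\beta_{0i} + \beta_{1i}\, t_{ij} \;=\; \beta_{0i} + \frac{\beta_{1i}}{c}\, t^*_{ij}
\quad\Longrightarrow\quad
b^*_{1i} = \frac{b_{1i}}{c},
\qquad
D^* = \begin{pmatrix} D_{11} & D_{12}/c \\ D_{12}/c & D_{22}/c^2 \end{pmatrix}.
```

The intercept variance $D_{11}$ is unchanged while the slope variance is divided by $c^2$, so equality of the two diagonal elements cannot hold in more than one time unit. The correlation $D_{12}/\sqrt{D_{11} D_{22}}$, by contrast, is the same in either unit.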

SUMMARY: We now summarize the model suggested by these considerations. The model may be thought of as a two-stage hierarchy: For $i = 1, \ldots, m$,

Stage 1 – individual

$$Y_i = Z_i \beta_i + e_i \quad (n_i \times 1), \qquad e_i \sim N_{n_i}(0, R_i) \qquad (9.7)$$

This is like a “regression model” for the $i$th unit, with “design matrix” $Z_i$ and $(k \times 1)$ “regression parameter” $\beta_i$.

Stage 2 – population

$$\beta_i = A_i \beta + b_i \quad (k \times 1), \qquad b_i \sim N_k(0, D). \qquad (9.8)$$

Here, we have taken $\mathrm{var}(b_i) = D$ to be the same for all $i$, and we will continue to do so for definiteness in our subsequent development. However, this could be relaxed as described above, and the features of the model we point out shortly would still be valid. The matrix $A_i$ summarizes information like group membership, allowing the mean of $\beta_i$ to be different for different groups.

Variation in the model is explicitly acknowledged to come from two sources:

  • Due to features within units, represented through the covariance matrix $R_i$.
  • Due to biological variation among units, represented through the covariance matrix $D$.
  • This is in marked contrast to the models of the previous chapter. These models required the analyst to think of a single covariance matrix for a data vector, representing the aggregate effect of both sources. The models that are typically used tend to focus on the time-ordered aspect.

IMPLICATION: We now see the contrast with the models of the last chapter more directly. Suppose that we combine the two parts of the model into a single representation by substituting the expression for $\beta_i$ in (9.8) into (9.7); i.e.

$$Y_i = Z_i(A_i \beta + b_i) + e_i = (Z_i A_i)\beta + Z_i b_i + e_i.$$

  • Suppose first that there is only one group, so that $A_i = I_k$. Then we see that the model implied is $$Y_i = Z_i \beta + Z_i b_i + e_i.$$ Note that we can write this in a more familiar form by letting $X_i = Z_i$ and $\epsilon_i = Z_i b_i + e_i$. With these identifications, we have $$Y_i = X_i \beta + \epsilon_i, \qquad i = 1, \ldots, m.$$

This has exactly the form of the regression models of the previous chapter!

  • The difference is that, here, the way we arrived at this model requires that the error vector $\epsilon_i$ have the particular form above. Note that this implies that, using the independence of $b_i$ and $e_i$ (and taking $\mathrm{var}(b_i) = D$ for definiteness),

$$\mathrm{var}(\epsilon_i) = Z_i D Z_i' + R_i = \Sigma_i. \qquad (9.9)$$

Thus, thinking in two stages implies that the covariance matrix of a data vector is the sum of two pieces representing the separate effects of among- and within-unit variation.

  • If there is more than one group, the same interpretation holds. Suppose $\beta$ is $(p \times 1)$; $p = 4$ in the dental example. With $\beta_i$ $(k \times 1)$, $A_i$ is then a $(k \times p)$ matrix; $k = 2$ in the dental example. Then we see that the model implied is

$$Y_i = X_i \beta + Z_i b_i + e_i = X_i \beta + \epsilon_i,$$

where $X_i = Z_i A_i$. As above, $\mathrm{var}(\epsilon_i)$ is as in (9.9). In the dental example, note that for boys

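$$X_i = Z_i A_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 & t_{i1} \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & t_{in_i} \end{pmatrix},$$

and similarly for girls,

$$X_i = Z_i A_i = \begin{pmatrix} 1 & t_{i1} & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & 0 & 0 \end{pmatrix}.$$

Compare these with (8.9); they are the same.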
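The two matrix results above, $X_i = Z_i A_i$ and $\Sigma_i = Z_i D Z_i' + R_i$, can be checked numerically with a few lines of plain Python. This is a minimal sketch under stated assumptions: the ages 8, 10, 12, 14, the values in `D`, and `sigma_sq` are hypothetical, $R_i = \sigma^2 I_{n_i}$ is used for simplicity, and the ordering of $\beta$ is $(\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})'$.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


ages = [8, 10, 12, 14]                      # assumed observation times
Z = [[1, t] for t in ages]                  # (n_i x k) with k = 2
A_girl = [[1, 0, 0, 0], [0, 1, 0, 0]]       # selects (beta_0G, beta_1G)
A_boy = [[0, 0, 1, 0], [0, 0, 0, 1]]        # selects (beta_0B, beta_1B)
X_girl = matmul(Z, A_girl)
X_boy = matmul(Z, A_boy)

D = [[4.0, -0.2], [-0.2, 0.05]]             # hypothetical var(b_i)
sigma_sq = 2.0                              # hypothetical within-unit variance
n = len(ages)
ZDZt = matmul(matmul(Z, D), transpose(Z))   # among-unit piece Z D Z'
Sigma = [[ZDZt[j][k] + (sigma_sq if j == k else 0.0) for k in range(n)]
         for j in range(n)]                 # add the within-unit piece sigma^2 I

print(X_boy)
for row in Sigma:
    print([round(v, 3) for v in row])
```

Even though $R_i$ is diagonal here, `Sigma` has nonzero off-diagonal entries: the shared random intercept and slope induce correlation among a child's observations, and the variances on the diagonal change with age.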

RESULT: By thinking about individual trajectories, we see that we ultimately arrive at a regression model that is of the same form as those in the last chapter.

  • The similarity is that the mean of a data vector is of the same linear form; i.e.

$$E(Y_i) = X_i \beta,$$

where the form of the matrices $X_i$ is dictated by the thinking above ($X_i = Z_i A_i$).

TERMINOLOGY: These models are known as random coefficient models because they rely on thinking of individual-specific regression parameters, or coefficients of time, as being random, each representing a draw from a population.

  • The above reasoning is extended easily to the case where units come from more than two groups; for example, for the dialyzer data, where the relationship between transmembrane pressure (“time”) and ultrafiltration rate (response) was observed on dialyzers from 3 centers. We would thus think of each dialyzer having its own straight line relationship, with its own intercept and slope ($k = 2$). The vector $\beta$ would represent the mean intercept and slope for each center stacked together, so would have $p = 6$ elements.
  • The reasoning is extended easily to the case where the “regression model” for an individual unit is something other than a straight line; e.g. suppose a quadratic function is a better model (recall the hip replacement data): $$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2 + e_{ij}.$$ In this case, $\beta_i$ has $k = 3$ elements.
  • All of these models are a particular case of the more general class of linear mixed effects models we will describe in the next chapter.
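Both extensions amount to changing the shapes of $Z_i$ and $A_i$. The sketch below is one possible way to build them; the helper names `design_matrix` and `group_matrix` are hypothetical, groups are numbered from 0, and $\beta$ is assumed to stack the group-specific mean vectors in group order.

```python
def design_matrix(times, degree=1):
    """Z_i with columns 1, t, ..., t**degree, so k = degree + 1."""
    return [[t ** d for d in range(degree + 1)] for t in times]


def group_matrix(k, n_groups, g):
    """A_i of size k x p, p = k * n_groups, selecting the block of beta for group g."""
    p = k * n_groups
    return [[1 if col == g * k + row else 0 for col in range(p)]
            for row in range(k)]


# Straight line, three groups (as for the three centers): k = 2, p = 6.
A_center2 = group_matrix(2, 3, 1)
print(A_center2)

# Quadratic trajectory: k = 3 columns in Z_i.
Z_quad = design_matrix([1, 2, 3], degree=2)
print(Z_quad)
```

With two groups, `group_matrix(2, 2, 0)` and `group_matrix(2, 2, 1)` give the two $(2 \times 4)$ matrices that select the first and second pair of elements of $\beta$.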

9.3 Inference on regression and covariance parameters

Because this way of thinking leads ultimately to the model given in (9.10), the methods of maximum likelihood and restricted maximum likelihood may be used to estimate the parameters that characterize “mean” and “variation,” namely $\beta$, the distinct elements of $D$, and the parameters that make up $R_i$. That is, the methods described in sections 8.5 and 8.6 may be used exactly as described. The same considerations apply:

  • The generalized least squares estimator for $\beta$ and its large sample approximate sampling distribution will have the same form, with $X_i$ and $\Sigma_i$ defined as in (9.10).
  • Questions of interest may be written in the identical fashion, and estimation of approximate standard errors, Wald tests, likelihood ratio tests for nested models, and so on may be carried out in the same way. We will discuss the formulation and interpretation of questions of interest under this model momentarily.
  • Information criteria may be used to compare non-nested models.

See these sections for descriptions, which go through unchanged for the model (9.10).

QUESTIONS OF INTEREST: Because of the way we motivated the random coefficient model, questions of interest may be thought of in different ways. For definiteness, again consider the situation of the dental study data. A vague statement of the main question of interest is: “Is the rate of change of distance as children age different for boys and girls?”

Both here and in the previous chapter, we end up with a model that says that the mean of all possible $Y_{ij}$ values we might see at a particular age $t_{ij}$ for girls is

$$E(Y_{ij}) = \beta_{0,G} + \beta_{1,G} t_{ij},$$

and similarly for boys. How we arrive at the model involved different thinking, however.

  • In the previous chapters, we always thought in terms of how the means at each time were related, averaged across all units at each time point. In this way of thinking, we write down the model above immediately, and $\beta_{1,G}$ and $\beta_{1,B}$ have the interpretation as the parameters that describe the relationship of the mean responses over time; that is, the slope of the (assumed straight line) relationship among means at different times $t_{ij}$.
  • From the motivation for the random coefficient model, we think in terms of individual trajectories and their “typical” features. In this way of thinking, $\beta_{1,G}$ and $\beta_{1,B}$ have the interpretation as the means of the populations of child-specific slopes for all possible girls and boys, respectively.

Since the model we end up with is the same, either interpretation is valid. The result is that we may think of the vague question of interest more formally in two ways, and both are correct. If we consider testing $$H_0: \beta_{1,G} - \beta_{1,B} = 0 \quad \text{vs.} \quad H_1: \beta_{1,G} - \beta_{1,B} \neq 0,$$

we may interpret this as saying either of the following:

  1. Does the rate of change in mean response over time differ between girls and boys?
  2. Is the “typical” value of the slope of the individual straight lines for girls different from the “typical” value of the slope of the individual straight lines for boys?
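As a rough illustration of how this hypothesis might be examined with a Wald test, the sketch below writes the contrast as $L\beta$ with $L = (0, 1, 0, -1)$ for $\beta = (\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})'$ and compares $(L\hat{\beta})^2 / (L \hat{V} L')$ with a chi-square distribution on 1 degree of freedom. The estimate `beta_hat` and its approximate covariance matrix `V_hat` are made-up numbers, not output from any fit to the dental data, and `wald_test` is a hypothetical helper name.

```python
import math


def wald_test(L, beta_hat, V_hat):
    """Wald statistic and chi-square(1) p-value for H0: L beta = 0 (L a single row)."""
    est = sum(l * b for l, b in zip(L, beta_hat))
    var = sum(L[j] * V_hat[j][k] * L[k]
              for j in range(len(L)) for k in range(len(L)))
    stat = est ** 2 / var
    # For 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return est, stat, p_value


L = [0, 1, 0, -1]                          # beta_1G - beta_1B
beta_hat = [17.4, 0.48, 16.3, 0.78]        # hypothetical estimates
V_hat = [[1.5, -0.12, 0.0, 0.0],           # hypothetical covariance of beta_hat
         [-0.12, 0.012, 0.0, 0.0],
         [0.0, 0.0, 1.0, -0.08],
         [0.0, 0.0, -0.08, 0.008]]
est, stat, p_value = wald_test(L, beta_hat, V_hat)
print(round(est, 3), round(stat, 3), round(p_value, 4))
```

Whichever of the two interpretations is adopted, the computation is the same; only the wording of the conclusion differs.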
  • Such an approach represents an alternative to fitting the full model by ML or REML as discussed above, and is often called a two-stage estimation method. This is because fitting happens in two stages.
  • (1) Estimate each $\beta_i$ separately from the data on unit $i$ only; e.g. if we believe $R_i = \sigma^2 I_{n_i}$ for each $i$, then we might estimate $\beta_i$ by usual least squares applied to the data from unit $i$. Call these estimates $\hat{\beta}_i$.
  • (2) This distills the data $Y_i$ on each individual down to new “data” $\hat{\beta}_i$. This suggests using the new “data” as the basis for inference. For example, a natural approach would be to average the $\hat{\beta}_i$ across all $i$ to estimate $\beta$; e.g. if there is only one group, estimate $\beta$ as

$$m^{-1} \sum_{i=1}^{m} \hat{\beta}_i.$$

If there are several groups, do this on a group by group basis, e.g. average the estimates from boys and girls separately.

  • To compare groups, compare these sample averages of estimates across groups by using standard statistical methods, e.g. apply an analysis of variance to the slope estimates to compare the mean slope.
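The two steps can be sketched for a single group with straight-line trajectories as follows. This is the simple, unweighted version just described, shown only to make the procedure concrete; the function names and the three made-up units are assumptions for illustration.

```python
def ols_line(times, y):
    """Ordinary least squares intercept and slope for one unit."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def two_stage_mean(units):
    """Unweighted average of the per-unit OLS estimates (intercept, slope)."""
    fits = [ols_line(t, y) for t, y in units]      # step (1): one fit per unit
    m = len(fits)
    return (sum(f[0] for f in fits) / m,           # step (2): average the fits
            sum(f[1] for f in fits) / m)


# Made-up data: (times, responses) for three units, one with fewer observations.
units = [
    ([8, 10, 12, 14], [21.0, 20.0, 21.5, 23.0]),
    ([8, 10, 12, 14], [26.0, 25.0, 29.0, 31.0]),
    ([8, 10, 12], [20.5, 24.0, 24.5]),
]
print(two_stage_mean(units))
```

In this sketch the third unit, with only three observations, receives exactly the same weight in the average as the other two.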

This sounds appealing, but it isn’t quite right.

  • The new “data,” the individual estimates $\hat{\beta}_i$, are not exactly the “data” we’d like. The ideal for learning about $\beta$ would be to average the true $\beta_i$ across units. Of course, we don’t know these and the best we can do is estimate them by $\hat{\beta}_i$. But this introduces additional uncertainty that the above procedure does not take into account.
  • For example, if the $n_i$ are very different across units, with some units having lots of measurements and others only a few, then for some $i$, $\hat{\beta}_i$ will be a better estimate of the true $\beta_i$ than for others. Treating them all on equal footing as “data” is thus obviously not appropriate.
  • Thus, simply averaging the $\hat{\beta}_i$ as if they were the true $\beta_i$ can be misleading.
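One way to see the point about unequal precision is to work out the covariance matrix of an individual least squares estimate. The calculation below is a sketch under the assumptions that $R_i = \sigma^2 I_{n_i}$, that $Z_i$ has full column rank, and that $e_i$ is independent of $b_i$ with $\mathrm{var}(b_i) = D$.

```latex
\hat{\beta}_i = (Z_i' Z_i)^{-1} Z_i' Y_i = \beta_i + (Z_i' Z_i)^{-1} Z_i' e_i,
\qquad
\mathrm{var}(\hat{\beta}_i) = \mathrm{var}(\beta_i) + \sigma^2 (Z_i' Z_i)^{-1}
= D + \sigma^2 (Z_i' Z_i)^{-1}.
```

The first term is the among-unit variation that every $\hat{\beta}_i$ shares; the second is estimation error, which depends on $Z_i$ and is typically smaller for units with more observations. Unless all units have the same $Z_i$, the $\hat{\beta}_i$ therefore do not have a common covariance matrix.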

It turns out that if one wants to use individual estimates as “data,” one must instead take a weighted average of the $\hat{\beta}_i$ in an appropriate way to take these issues into account. This kind of approach is discussed in Davidian and Giltinan (1995).

Historically, the use of two-stage methods was suggested quite a long time ago, in part because it made intuitive sense. A fundamental paper advocating two-stage methods is Rowell and Walters (1976). Other references to two-stage methods include Gumpertz and Pantula (1989) and Davidian and Giltinan (1995). Because the methods of ML and REML are straightforward to implement with available software, we do not consider two-stage methods further here.

SPECIAL CASE – BALANCED DATA: Recall in the last chapter we noted an interesting curiosity for the dental data, which are balanced. When we assumed that the covariance matrix of a data vector, $\Sigma_i$ (which is actually the same for all $i$ with balanced data), had the compound symmetry structure, we saw that the generalized least squares estimator for $\beta$ reduced to the ordinary least squares estimator $\hat{\beta}_{OLS}$ treating all data as if they were independent. That is, the GLS estimator

$$\hat{\beta} = \left( \sum_{i=1}^{m} X_i' \hat{\Sigma}^{-1} X_i \right)^{-1} \sum_{i=1}^{m} X_i' \hat{\Sigma}^{-1} Y_i \qquad (9.11)$$

with $\Sigma$ having the compound symmetry structure had the same value as the OLS estimator

$$\hat{\beta}_{OLS} = \left( \sum_{i=1}^{m} X_i' X_i \right)^{-1} \sum_{i=1}^{m} X_i' Y_i.$$

It turns out that this is a special instance of a more general result. The general result says:

  • For the random coefficient model, if (i) the data are balanced, with all units seen at the same $n$ times, so that the design matrix $Z_i$ of time points is the same for all units $i$, and (ii) $R_i = \sigma^2 I_n$, then the generalized least squares estimator is numerically equivalent to the OLS estimator!
  • To show this is a nasty but not impossible exercise in matrix algebra. Under conditions (i) and (ii), $\Sigma_i$ reduces to the same matrix for each $i$:

$$\Sigma_i = Z D Z' + \sigma^2 I_n.$$

Substitute this expression for $\hat{\Sigma}$ in (9.11) for each $i$ (even if $D$ and $\sigma^2$ are replaced by estimates, the form is the same). Fancy footwork with matrix inversion formulæ like those in Chapter 2 may then be used to show the equivalence. Those with strong stomachs might want to try it!
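For readers who would rather check the claim numerically than algebraically, the following sketch computes both estimators for a single group ($X_i = Z$ for every unit) with balanced data and $R_i = \sigma^2 I_n$. It is a numerical check on one made-up data set, not a proof; the ages, the values of $D$ and $\sigma^2$, and the randomly generated responses are all assumptions, and the small Gauss-Jordan routine stands in for a linear algebra library.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


ages = [8, 10, 12, 14]                       # same times for every unit (balanced)
n = len(ages)
Z = [[1.0, t] for t in ages]
Zt = transpose(Z)
D = [[4.0, -0.2], [-0.2, 0.05]]              # hypothetical
sigma_sq = 2.0                               # hypothetical
ZDZt = matmul(matmul(Z, D), Zt)
Sigma = [[ZDZt[j][k] + (sigma_sq if j == k else 0.0) for k in range(n)]
         for j in range(n)]

rng = random.Random(1)
Y = [[[rng.uniform(18, 30)] for _ in ages] for _ in range(6)]   # 6 made-up units


def estimator(W):
    """(sum_i Z' W Z)^{-1} sum_i Z' W Y_i for a common weight matrix W."""
    ZtW = matmul(Zt, W)
    lhs = [[len(Y) * v for v in row] for row in matmul(ZtW, Z)]
    rhs = [[0.0], [0.0]]
    for y in Y:
        term = matmul(ZtW, y)
        rhs = [[rhs[r][0] + term[r][0]] for r in range(2)]
    return [row[0] for row in matmul(inverse(lhs), rhs)]


identity = [[float(j == k) for k in range(n)] for j in range(n)]
beta_gls = estimator(inverse(Sigma))         # weights Sigma^{-1}
beta_ols = estimator(identity)               # weights I
print(beta_gls, beta_ols)
```

Changing `D` or `sigma_sq` leaves the two printed vectors equal up to rounding, whereas giving units different sets of ages (so that $Z_i$ differs across $i$) would generally break the equality.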