General Linear Models for Longitudinal Data - Lecture Notes | ST 732, Study notes of Statistics

Ch 8 Material Type: Notes; Professor: Davidian; Class: Applied Longitudinal Data Analysis; Subject: Statistics; University: North Carolina State University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

CHAPTER 8 ST 732, M. DAVIDIAN

8 General linear models for longitudinal data

8.1 Introduction

We have seen that the classical methods of univariate and multivariate repeated measures analysis of variance may be thought of as being based on a statistical model for a data vector from the $i$th individual, $i = 1, \ldots, m$. So far, we have written this model in different ways. Following convention, we wrote the model as

$$Y_i' = a_i' M + \epsilon_i',$$

where $M$ is the $(q \times n)$ matrix

$$M = \begin{pmatrix} \mu_{11} & \cdots & \mu_{1n} \\ \vdots & \ddots & \vdots \\ \mu_{q1} & \cdots & \mu_{qn} \end{pmatrix},$$

and the individual means $\mu_{\ell j}$ are for the $\ell$th group at the $j$th time.

We could equally well write this model as

$$Y_i = \mu_\ell + \epsilon_i$$

for unit $i$ coming from the $\ell$th population, $\ell = 1, \ldots, q$. Regardless of how we write the model, we note that it represents $Y_i$ as having two components:

  • a systematic component, which describes the mean response over time (depending on group membership). The individual elements of $\mu_\ell$, $\mu_{\ell j}$ for the $\ell$th group at the $j$th time, are further represented in terms of an overall mean and deviations as

$$\mu_{\ell j} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j},$$

along with constraints $\sum_{\ell=1}^{q} \tau_\ell = 0$, etc., in order to give a unique representation. As noted in the last chapter, this representation (i) requires that the length of each data vector $Y_i$ be the same, $n$; (ii) does not explicitly incorporate the actual times of measurement or other information.

  • an overall random deviation $\epsilon_i$, which describes how observations within a data vector vary about the mean and covary among each other. Both univariate and multivariate ANOVA models assume that $\mathrm{var}(\epsilon_i) = \Sigma$ is the same $(n \times n)$ matrix for all data vectors. Furthermore, (i) $\Sigma$ is assumed to have the compound symmetry structure in the univariate model. This came from the assumption that each element of $\epsilon_i$ is actually the sum of two random terms, i.e. $\epsilon_{ij} = b_i + e_{ij}$, where the random effect $b_i$ has to do with variation among units and $e_{ij}$ has to do with variation within units. (ii) $\Sigma$ is assumed to have no particular structure in the multivariate model.

We also noted in Chapter 5 that this model could be written in an alternative way. Specifically, we defined $\beta$ as the column vector containing all of $\mu$, $\tau_\ell$, $\gamma_j$, $(\tau\gamma)_{\ell j}$ stacked and $X_i$ to be a matrix of 0's and 1's with $n$ rows that "picks off" the appropriate elements of $\beta$ for each element of $Y_i$. We wrote the model in the alternative form

$$Y_i = X_i\beta + \epsilon_i, \tag{8.1}$$

where again $\epsilon_i$ is the "overall deviation" vector with $\mathrm{var}(\epsilon_i) = \Sigma$. Note that both the univariate and multivariate ANOVA models could be written in this way; what would distinguish them would again be the assumption on $\Sigma$. This model, along with the usual constraints, has the flavor of a "regression" model for the $i$th unit.

Regardless of how we write the model, it says that, for a unit in group $\ell$,

$$Y_{ij} = \mu + \tau_\ell + \gamma_j + (\tau\gamma)_{\ell j} + \epsilon_{ij}, \tag{8.2}$$

so that $E(Y_{ij})$ is taken to have this specific form.

As we will now discuss, a representation like (8.1) offers a convenient framework for thinking about more general models for longitudinal data. In this chapter, we will discuss such a model, writing it in the form (8.1). We will see that we will be able to address several of the issues raised in the last chapter:

  • Alternative definitions of $X_i$ and $\beta$ will allow for unbalanced data and explicit incorporation of time and other covariates

The sample means suggest that the true means $\mu_j$ at each time point may very well fall on a straight line.

This observation suggests that we may be able to refine our view about the means. Rather than thinking of the mean vector as simply a set of $n$ unrelated means $\mu_j$, we might think of these means as satisfying

$$\mu_j = \beta_0 + \beta_1 t_j;$$

that is, the means fall on the line with intercept $\beta_0$ and slope $\beta_1$.

This suggests replacing (8.2) by

$$Y_{ij} = \beta_0 + \beta_1 t_j + \epsilon_{ij}. \tag{8.3}$$

Model (8.3) says that, at the $j$th time $t_j$, the $Y_{ij}$ values we might see have mean $\beta_0 + \beta_1 t_j$ and vary about it according to the overall deviations $\epsilon_{ij}$.

  • In contrast to (8.2), this model represents the mean as explicitly depending on the time of measurement $t_j$. (With just one group, $\ell$, and hence $\tau_\ell$, would be the same for all units in (8.2), so the mean there depends on time only through $\gamma_j$ and $(\tau\gamma)_{\ell j}$.)
  • Instead of requiring $n = 4$ separate parameters $\mu_j$, $j = 1, \ldots, n$, to describe the means at each time, (8.3) requires only two (the intercept and slope). Thus, if we are willing to believe that the true means do indeed fall on a straight line, (8.3) is a more parsimonious representation of the systematic component.
  • Under the new model (8.3), we are automatically including the belief that the trajectory of means should be a straight line. Our best guess (estimate) for this trajectory would be, intuitively, found by estimating the intercept and slope $\beta_0$ and $\beta_1$ (coming up).
  • An additional possible advantage would be as follows. If we wanted to use these data to learn about, for example, mean distance at age 11 years, the straight line provides us with a natural estimate, while it is not clear what to do with the sample means to get such an estimate (connect the dots?). How would we assess the quality of such an estimate (e.g. provide a standard error)?

To summarize, if we really believe that the mean trajectory follows a straight line, model (8.3) seems more appropriate, because it exploits this assumption.

MATRIX REPRESENTATION: The model (8.3) may be written in matrix form. With $Y_i$ as usual the $(n \times 1)$ data vector, defining

$$X = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_n \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix},$$

we can write the model as

$$Y_i = X\beta + \epsilon_i. \tag{8.4}$$

This has the form of model (8.1). Because all units are seen at the same n times, the matrix X is the same for all units.
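As a minimal sketch of (8.4), the following NumPy snippet builds $X$ and evaluates the mean vector $X\beta$. The times 8, 10, 12, 14 are assumed here to be the dental study ages, and the values of `beta` are hypothetical numbers chosen only for illustration, not estimates.

```python
import numpy as np

# Observation times shared by every unit (assumed: ages 8, 10, 12, 14).
times = np.array([8.0, 10.0, 12.0, 14.0])

# Design matrix X of (8.4): a column of ones and a column of times.
X = np.column_stack([np.ones_like(times), times])

# Hypothetical intercept and slope, chosen only for illustration.
beta = np.array([16.0, 0.6])

# Mean vector X beta: element j equals beta_0 + beta_1 * t_j.
mean_vector = X @ beta

# The line also gives a mean at a time that was never observed, e.g. age 11.
mean_at_11 = np.array([1.0, 11.0]) @ beta
```

Because the same two parameters describe every time point, a mean at an intermediate time is obtained from the same inner product as the means at the observed times.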

COVARIANCE MATRIX: The above development offers an alternative way to represent mean response. To complete the model, we need to also make an assumption about the covariance matrix of the random vector $\epsilon_i$. For example, as in the classical models, we could assume that this matrix is the same for all data vectors, i.e.

$$\mathrm{var}(\epsilon_i) = \Sigma,$$

for some matrix Σ. Momentarily, we will address the issue of specification of Σ more carefully; for now, as we consider the situation of only a single population, it is natural to take this matrix to be the same for all units.

MULTIVARIATE NORMALITY: Suppose we further assume that the responses $Y_{ij}$ are normally distributed at each time point, so that the $Y_i$ are multivariate normal. Thus, we may summarize the model as

$$Y_i \sim N_n(X\beta, \Sigma),$$

where X and β are as above.

8.3 General case – several groups, unbalanced data, covariates

The modeling strategy for the mean above may be generalized. We consider several possibilities:

  • units from more than one group
  • different numbers/times of observations for each unit
  • other covariates

It is straightforward to see that this is a slick way of noting that if $i$ is a girl or boy, respectively, we are defining

$$X_i = \begin{pmatrix} 1 & t_1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_n & 0 & 0 \end{pmatrix}, \qquad X_i = \begin{pmatrix} 0 & 0 & 1 & t_1 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 1 & t_n \end{pmatrix}.$$

With these definitions, it is a simple matrix exercise to verify that $X_i\beta$ yields the $(n \times 1)$ vector whose elements are $\beta_{0,G} + \beta_{1,G} t_j$ or $\beta_{0,B} + \beta_{1,B} t_j$, depending on whether $i$ is a girl or boy. We may thus write the model succinctly as

$$Y_i = X_i\beta + \epsilon_i,$$

where $\beta$ and $X_i$ are defined in (8.8) and (8.9), respectively.

  • Note that the matrix $X_i$ is different depending on group membership.
  • Note that $X_i$ is not of full rank (a boy does not have information about the mean for girls, and vice versa).
  • Note that $\beta$ contains all parameters describing the mean trajectory for both groups.
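The points above can be checked numerically. The sketch below assumes the ordering $\beta = (\beta_{0,G}, \beta_{1,G}, \beta_{0,B}, \beta_{1,B})'$, which is what the layout of the two $X_i$ matrices implies; the helper `design_matrix`, the times, and the parameter values are all hypothetical.

```python
import numpy as np

def design_matrix(times, is_girl):
    """Return the (n x 4) matrix X_i for beta = (b0G, b1G, b0B, b1B)."""
    n = len(times)
    line = np.column_stack([np.ones(n), times])  # columns [1, t_j]
    zeros = np.zeros((n, 2))                     # block for the other group
    return np.hstack([line, zeros]) if is_girl else np.hstack([zeros, line])

times = np.array([8.0, 10.0, 12.0, 14.0])  # assumed common times
beta = np.array([17.0, 0.5, 16.0, 0.8])    # hypothetical values

X_girl = design_matrix(times, is_girl=True)
X_boy = design_matrix(times, is_girl=False)

mean_girl = X_girl @ beta  # elements b0G + b1G * t_j
mean_boy = X_boy @ beta    # elements b0B + b1B * t_j

# Each X_i has four columns but rank two: a girl carries no information
# about the boys' intercept and slope, and vice versa.
rank_girl = np.linalg.matrix_rank(X_girl)
```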

MULTIVARIATE NORMALITY: With the additional assumption of normality, each $Y_i$ under this model is $n$-variate normal with mean $X_i\beta$, where $X_i$ depends on group membership. With some additional assumption about the covariance matrix, e.g. $\mathrm{var}(\epsilon_i) = \Sigma$ for all $i$, we have

$$Y_i \sim N_n(X_i\beta, \Sigma).$$

IMBALANCE: It is possible to be even more general. For definiteness, we consider two examples.

ULTRAFILTRATION DATA FOR LOW FLUX DIALYZERS: These data are given in Vonesh and Chinchilli (1997, section 6.6). Low flux dialyzers are used to treat patients with end stage renal disease to remove excess fluid and waste from their blood. In low flux hemodialysis, the ultrafiltration rate (ml/hr) at which fluid is removed is thought to follow a straight line relationship with the transmembrane pressure (mmHg) applied across the dialyzer membrane. A study was conducted to compare the average ultrafiltration rate (the response) of such dialyzers across three dialysis centers where they are used on patients. A total of m = 41 dialyzers (units) were involved. The experiment involved recording the ultrafiltration rate at several transmembrane pressures for each dialyzer.

Figure 2 shows individual dialyzer profiles for the dialyzers in each center. A notable feature of the figure is that the 4 pressures ("time" here) at which each dialyzer was observed are not necessarily the same. Thus, the $i$th dialyzer has its own set of times $t_{ij}$, $j = 1, \ldots, n$, with $n = 4$. Hence, we cannot calculate sample means, because each dialyzer is seen at potentially different pressures. However, if we envision taking means in each panel of the figure across all time points, it seems reasonable that the means would very likely fall approximately on a straight line.

Figure 2: Dialyzer profiles (ultrafiltration rate vs. transmembrane pressure) for 41 dialyzers in 3 centers

[Figure not reproduced. Three panels, one each for Center 1, Center 2, and Center 3; horizontal axis: transmembrane pressure (mmHg); vertical axis: ultrafiltration rate (ml/hr).]

With the modeling strategy we have adopted, this does not really pose any additional difficulty. From the figure, a reasonable model for the $i$th dialyzer is

$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 1} \\ Y_{ij} &= \beta_3 + \beta_4 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 2} \\ Y_{ij} &= \beta_5 + \beta_6 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 3} \end{aligned} \tag{8.10}$$

Here, $\beta_1, \beta_3, \beta_5$ are the intercepts and $\beta_2, \beta_4, \beta_6$ are the slopes for the means (straight lines) for each center.

Figure 3: Hæmatocrit trajectories for hip replacement patients. The left hand panels are individual profiles by gender; the right hand panels show a fitted quadratic model for the mean superimposed.

[Figure not reproduced. Four panels: "Males, individual trajectories"; "Males, mean at age = 65.52 superimposed"; "Females, individual trajectories"; "Females, mean at age 66.07 superimposed". Horizontal axis: weeks; vertical axis: hæmatocrit.]

It may be seen from the figure that a number of both male and female patients are missing the measurement at week 2; in fact, there is one female missing the pre-replacement measurement and week 2. The reason for this is not given by Crowder and Hand; however, because it is so systematic, happening only at this occasion and for about half of the male and half of the female patients, it suggests that the reason has nothing to do with the patients' health or recovery from the replacement. Perhaps the centrifuge used to obtain hæmatocrit values went on the blink that week before all patients' values could be obtained! We will assume that the reason for these missing observations has nothing to do with the thing of primary interest, gender; this seems reasonable in light of the pattern of missingness for week 2.

Thus, we have a situation where the data vectors $Y_i$ are of possibly different lengths for different units. In particular, we now have that $Y_i$ is $(n_i \times 1)$, where $n_i$ is the number of observations on unit $i$. Thus, the total number of observations from all units is

$$N = \sum_{i=1}^{m} n_i.$$

To determine an appropriate parsimonious representation for the mean of a data vector for each group, we could calculate the sample means at each time point for males and females. We must be a bit careful, however; because of the missingness, the sample means at different times will be of different quality.

Nonetheless, it seems clear from the figure that a model that says the means fall on a straight line for either gender would be inappropriate. For almost all patients, the pre-replacement reading is high; then, following replacement, the hæmatocrit goes down and then slowly rebounds over the next 3 weeks. This suggests that the relationship of the means with time might look more like a quadratic function of time. These observations suggest the following model:

$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \beta_3 t_{ij}^2 + \epsilon_{ij}, && \text{males} \\ Y_{ij} &= \beta_4 + \beta_5 t_{ij} + \beta_6 t_{ij}^2 + \epsilon_{ij}, && \text{females.} \end{aligned} \tag{8.11}$$

In (8.11), we have allowed for the possibility that the times for each $i$ are not the same, writing $t_{ij}$. For this data set, the times that are potentially available for each individual are the same; however, as we saw in the dialyzer example above, this need not be the case.

To write the model in matrix form, define

$$\beta = (\beta_1, \ldots, \beta_6)'.$$

Clearly, the matrix $X_i$ for a given unit will depend on the times of observation for that unit and will have $n_i$ rows, each row corresponding to one of the $n_i$ elements of $Y_i$. For example, for a male with $n_i$ observations, we have

$$X_i = \begin{pmatrix} 1 & t_{i1} & t_{i1}^2 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i}^2 & 0 & 0 & 0 \end{pmatrix}.$$

We may thus summarize the model as

$$Y_i = X_i\beta + \epsilon_i, \quad (n_i \times 1),$$

where $X_i$ is the $(n_i \times 6)$ matrix defined appropriately for individual $i$.
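To illustrate how $X_i$ changes with the observation pattern, here is a small sketch for model (8.11). The helper `quadratic_design` is hypothetical, and the week values 0, 1, 2, 3 are an assumption read off the time axis of Figure 3; a patient missing the week 2 measurement simply contributes a matrix with one fewer row.

```python
import numpy as np

def quadratic_design(times, is_male):
    """Return the (n_i x 6) matrix X_i for beta = (b1, ..., b6) in (8.11)."""
    t = np.asarray(times, dtype=float)
    block = np.column_stack([np.ones_like(t), t, t**2])  # [1, t_ij, t_ij^2]
    zeros = np.zeros_like(block)                         # other gender's block
    return np.hstack([block, zeros]) if is_male else np.hstack([zeros, block])

# Assumed measurement occasions in weeks (0 = pre-replacement).
X_complete = quadratic_design([0, 1, 2, 3], is_male=True)  # n_i = 4
X_missing = quadratic_design([0, 1, 3], is_male=False)     # week 2 missing

# Total number of observations N is the sum of the n_i.
N = X_complete.shape[0] + X_missing.shape[0]
```

Nothing in the construction requires the units to share a common set of times, which is the sense in which this framework accommodates unbalanced data.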

HIP REPLACEMENT, CONTINUED: In the hip replacement study, the age of each participant was also recorded, and in fact an objective of the investigators was not only to understand differences in hæmatocrit response across genders but also to elucidate whether the age of the patient has an effect on response. It turns out that the sample mean age for males was 65.52 years and that for females was 66.07 years. From Figure 3, the patterns look pretty similar for both genders; of course, there is no easy way of discerning from the plot whether age affects the response.

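To illustrate inclusion of the age covariate, consider the following modified model, where $a_i$ is the age of the $i$th patient:

$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \beta_3 t_{ij}^2 + \beta_7 a_i + \epsilon_{ij}, && \text{males} \\ Y_{ij} &= \beta_4 + \beta_5 t_{ij} + \beta_6 t_{ij}^2 + \beta_7 a_i + \epsilon_{ij}, && \text{females.} \end{aligned} \tag{8.12}$$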

Model (8.12) says that, regardless of whether a person is male or female, the mean hæmatocrit response at any time increases by $\beta_7$ for every year increase in age (keep in mind that $\beta_7$ could be negative). One can envision fancier models where this also depends on gender. It is straightforward to write this in matrix notation as before; with

$$\beta = (\beta_1, \ldots, \beta_7)',$$

we can define appropriate $X_i$ matrices, i.e. for a male of age $a_i$

$$X_i = \begin{pmatrix} 1 & t_{i1} & t_{i1}^2 & 0 & 0 & 0 & a_i \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i}^2 & 0 & 0 & 0 & a_i \end{pmatrix}.$$

PARAMETERIZATION: It is possible to represent models like those above in different ways. For definiteness, consider the dialyzer example. We wrote the model in (8.10) as

$$\begin{aligned} Y_{ij} &= \beta_1 + \beta_2 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 1} \\ Y_{ij} &= \beta_3 + \beta_4 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 2} \\ Y_{ij} &= \beta_5 + \beta_6 t_{ij} + \epsilon_{ij}, && \text{dialyzer } i \text{ in center 3} \end{aligned}$$

It is sometimes more convenient, although entirely equivalent, to write the model in an alternative parameterization. As we have discussed, a question of interest is often to compare the rate of change of the mean response over time (pressure here) among groups. In this situation, we would like to compare the three slopes $\beta_2$, $\beta_4$, and $\beta_6$.

Define

$$\delta_{i1} = 1 \text{ if unit } i \text{ is from center 1}; \; = 0 \text{ otherwise}, \qquad \delta_{i2} = 1 \text{ if unit } i \text{ is from center 2}; \; = 0 \text{ otherwise}.$$

Then write the model as

$$Y_{ij} = \beta_1 + \beta_2 \delta_{i1} + \beta_3 \delta_{i2} + \beta_4 t_{ij} + \beta_5 \delta_{i1} t_{ij} + \beta_6 \delta_{i2} t_{ij} + \epsilon_{ij}. \tag{8.13}$$

There are still 6 parameters overall, but the ones in (8.13) have an entirely different interpretation from those in the first model.

It is straightforward to observe by simply plugging in the values of $\delta_{i1}$ and $\delta_{i2}$ for each center that the following is true:

Center    Intercept                Slope
1         $\beta_1 + \beta_2$      $\beta_4 + \beta_5$
2         $\beta_1 + \beta_3$      $\beta_4 + \beta_6$
3         $\beta_1$                $\beta_4$

Note that $\beta_2$ and $\beta_3$ have the interpretation of the difference in intercept between Centers 1 and 3 and Centers 2 and 3, respectively, and $\beta_1$ is the intercept for Center 3. Similarly, $\beta_5$ and $\beta_6$ have the interpretation of the difference in slope between Centers 1 and 3 and Centers 2 and 3, respectively, and $\beta_4$ is the slope for Center 3. This parameterization allows us to estimate, as we will talk about shortly, the differences of interest directly. This same type of parameterization is used in ordinary linear regression for similar reasons.

This type of parameterization is the default used by SAS PROC GLM and PROC MIXED, which we will use to implement the analyses we will discuss shortly. The different parameterizations yield equivalent models; the only thing that differs is the interpretation of the parameters.
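The equivalence of the two parameterizations can be verified directly. In the sketch below, `b` holds parameters in the separate-lines form (8.10) and `beta` the corresponding parameters in the difference form (8.13), with center 3 as the reference. The function names, the numerical values, and the pressures are hypothetical and serve only to show that both forms give identical means.

```python
import numpy as np

def separate_lines_row(center, t):
    """Row of X_i under (8.10): each center has its own intercept and slope."""
    row = np.zeros(6)
    row[2 * (center - 1)] = 1.0    # intercept column for this center
    row[2 * (center - 1) + 1] = t  # slope column for this center
    return row

def difference_row(center, t):
    """Row of X_i under (8.13), with center 3 as the reference group."""
    d1 = 1.0 if center == 1 else 0.0
    d2 = 1.0 if center == 2 else 0.0
    return np.array([1.0, d1, d2, t, d1 * t, d2 * t])

def to_difference_params(b):
    """Map (8.10) parameters to the equivalent (8.13) parameters."""
    b1, b2, b3, b4, b5, b6 = b
    return np.array([b5, b1 - b5, b3 - b5, b6, b2 - b6, b4 - b6])

b = np.array([-170.0, 4.4, -175.0, 4.0, -172.0, 4.2])  # hypothetical values
beta = to_difference_params(b)

# Both parameterizations must give the same mean for every center and pressure.
means_equal = all(
    np.isclose(separate_lines_row(c, t) @ b, difference_row(c, t) @ beta)
    for c in (1, 2, 3)
    for t in (150.0, 300.0, 450.0)
)
```

Here `beta[4]` and `beta[5]` are the slope differences between Centers 1 and 3 and Centers 2 and 3, which is why this form is convenient when the comparison of slopes is the question of interest.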

ALTERNATIVE MODELS: We now recall the other models. Actually, there is nothing stopping us from allowing $\mathrm{var}(\epsilon_i)$ to be different for different groups; e.g., in the dental study, allow different covariance matrices for each gender. We discuss this further below.

  • One-dependent. Recall that it seems reasonable that observations taken more closely together in time might tend to be "more alike" than those taken farther apart. If the observation times are spaced so that the time between 2 nonconsecutive observations is fairly long, we might conjecture that correlation is likely to be the largest among observations that are adjacent in time; that is, occur at consecutive times. Relative to the magnitude of this correlation, the correlation between observations separated by two time intervals might for all practical purposes be negligible. An example of a one-dependent model embodying this assumption is

$$\Sigma = \mathrm{var}(\epsilon_i) = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & 0 & \cdots & 0 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho\sigma^2 & \sigma^2 \end{pmatrix}.$$

This model would make sense even if the times are not equally spaced (as they are, for example, in the dental study: 8, 10, 12, 14). It is possible to extend this to a two-dependent or higher dependent model or to heterogeneous variances over time, as discussed in Chapter 4. SAS PROC MIXED uses the designation type = toep(2) (for "Toeplitz" with 2 diagonal bands) to refer to this assumption with the same variance at all times. With groups, we could believe the one-dependent assumption holds for each group, but allow the possibility that the variance $\sigma^2$ and correlation $\rho$ are different in each group. The same holds true for the rest of the models we consider.
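A minimal NumPy sketch of the one-dependent matrix follows. The function name `one_dependent_cov` and the values of $\sigma^2$ and $\rho$ are hypothetical; the final line checks that this particular choice gives a positive definite matrix, which is not guaranteed for every $\rho$ in a banded structure like this one.

```python
import numpy as np

def one_dependent_cov(n, sigma2, rho):
    """(n x n) one-dependent covariance: only adjacent times are correlated."""
    cov = sigma2 * np.eye(n)       # common variance on the diagonal
    idx = np.arange(n - 1)
    cov[idx, idx + 1] = rho * sigma2  # first band above the diagonal
    cov[idx + 1, idx] = rho * sigma2  # first band below the diagonal
    return cov

Sigma = one_dependent_cov(4, sigma2=4.0, rho=0.4)  # hypothetical values

# A valid covariance matrix must have strictly positive eigenvalues.
is_positive_definite = bool(np.all(np.linalg.eigvalsh(Sigma) > 0))
```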

  • Autoregressive of order 1 (equally-spaced in time). This model says that correlation drops off as observations get farther apart from each other in time. The following model really only makes sense if the times of observation are equally-spaced. The so-called AR(1) model with homogeneous variance over time is

$$\Sigma = \mathrm{var}(\epsilon_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}.$$

SAS PROC MIXED uses the designation type = ar(1) to refer to this assumption.
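The AR(1) structure can be generated from the matrix of lags $|j - k|$. This is a sketch in NumPy with a hypothetical function name and hypothetical parameter values; it is not how PROC MIXED computes the matrix internally.

```python
import numpy as np

def ar1_cov(n, sigma2, rho):
    """(n x n) AR(1) covariance: entry (j, k) is sigma2 * rho**|j - k|."""
    positions = np.arange(n)
    lags = np.abs(np.subtract.outer(positions, positions))  # |j - k|
    return sigma2 * rho ** lags

Sigma = ar1_cov(4, sigma2=2.0, rho=0.5)  # hypothetical values
```

The correlation between the first and last of the four observations is $\rho^3$, so it decays geometrically with the number of intervals separating them.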

  • Markov (unequally spaced in time). The AR(1) model may be generalized to times that are unequally-spaced (e.g. 1, 3, 4, 5, 6, 7 as in the guinea pig diet data). The powers of $\rho$ are taken to be the distances in time between the observations. That is, if

$$d_{jk} = |t_{ij} - t_{ik}|, \quad j, k = 1, \ldots, n,$$

then the model is

$$\Sigma = \mathrm{var}(\epsilon_i) = \sigma^2 \begin{pmatrix} 1 & \rho^{d_{12}} & \cdots & \rho^{d_{1n}} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{d_{n1}} & \rho^{d_{n2}} & \cdots & 1 \end{pmatrix}.$$

SAS PROC MIXED allows this type of model to be implemented in more than one way, e.g. with the type = sp(pow)(.) designation.
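The same construction works with actual time distances in place of integer lags. The sketch below uses the times 1, 3, 4, 5, 6, 7 mentioned above; the function name `markov_cov` and the values of $\sigma^2$ and $\rho$ are hypothetical, and $\rho$ is assumed positive so that non-integer powers are well defined.

```python
import numpy as np

def markov_cov(times, sigma2, rho):
    """Covariance with entry (j, k) equal to sigma2 * rho**|t_j - t_k|."""
    t = np.asarray(times, dtype=float)
    distances = np.abs(np.subtract.outer(t, t))  # d_jk = |t_j - t_k|
    return sigma2 * rho ** distances

times = [1, 3, 4, 5, 6, 7]
Sigma = markov_cov(times, sigma2=1.0, rho=0.5)  # hypothetical values
```

With equally spaced times one unit apart, this reduces to the AR(1) matrix, which is why the Markov model is described as its generalization.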

We will consider examples of fitting these structures to several of our examples in section 8.8. The SAS PROC MIXED documentation, as well as the books by Diggle, Heagerty, Liang, and Zeger (2002) and Vonesh and Chinchilli (1997), discuss other assumptions.

DECIDING AMONG COVARIANCE STRUCTURES: In the balanced case, one may use the techniques discussed in Chapter 4 to gain informal insight into the structure of $\mathrm{var}(\epsilon_i)$. Inspection of sample covariance matrices, scatterplot matrices, autocorrelation functions, and lag plots can aid the analyst in identifying possible reasonable models.

These methods can be modified to take into account the fact that one believes that the mean vectors follow smooth trajectories over time, such as a straight line. For instance, instead of using the sample means for "centering" in these approaches, one might estimate $\beta$ somehow; e.g. by least squares treating all the individual responses from all units as if they were independent (even though we know they are probably not). Least squares is clearly not the best way to estimate $\beta$ (recall our discussion in Chapter 3); however, this estimator may be "good enough" to provide reasonable estimates of the means at each time $t_j$ that take advantage of our willingness to believe they follow a smooth trajectory, so might be preferred to using sample means at each $j$ on this account. In particular, if

$$\mu_j = \beta_0 + \beta_1 t_j,$$

say, for a single group, we would estimate $\mu_j$ by $\hat\beta_0 + \hat\beta_1 t_j$ and use this in place of the sample mean.
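The centering step described above can be sketched as follows for a single group with balanced data. The response matrix `Y` contains made-up numbers used only to make the example runnable; ordinary least squares is applied to the stacked responses as if they were independent, exactly as a rough device for centering and not as a recommended estimator.

```python
import numpy as np

# Made-up responses: rows are units, columns are the n = 4 common times.
times = np.array([8.0, 10.0, 12.0, 14.0])
Y = np.array([[21.0, 20.0, 21.5, 23.0],
              [21.0, 21.5, 24.0, 25.5],
              [20.5, 24.0, 24.5, 26.0]])

m, n = Y.shape
X = np.column_stack([np.ones(n), times])  # common design matrix
X_all = np.vstack([X] * m)                # one copy of X per unit
y_all = Y.reshape(-1)                     # unit 1's responses, then unit 2's, ...

# Ordinary least squares, ignoring within-unit correlation.
beta_hat, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)

# Estimated means on the fitted line, used in place of the sample means.
fitted_means = X @ beta_hat

# Centered responses for covariance diagnostics.
residuals = Y - fitted_means
```

The `residuals` array can then be passed to the same informal tools (sample covariance, scatterplot matrices, lag plots) that would otherwise be applied to responses centered at the sample means.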

A complete discussion of graphical and other techniques along these lines may be found in Diggle, Heagerty, Liang, and Zeger (2002).

  • Under the one-dependent model, which says that only observations adjacent in time are correlated, this matrix becomes (convince yourself!)

$$\Sigma_i = \begin{pmatrix} \sigma^2 & \rho\sigma^2 & 0 \\ \rho\sigma^2 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix}.$$

  • Under the AR(1) model, this matrix becomes (convince yourself!)

$$\Sigma_i = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^3 \\ \rho & 1 & \rho^2 \\ \rho^3 & \rho^2 & 1 \end{pmatrix}.$$

These examples illustrate the main point – if all observations were intended to be taken at the same times, but some are not available, the covariance matrix must be carefully constructed according to the particular time pattern for each unit, using the convention of the assumed covariance model.
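One way to construct such a matrix is to build the full matrix for all intended times and then keep only the rows and columns for the occasions actually observed. The sketch below assumes, as the powers of $\rho$ in the matrices above suggest, a unit observed at the first, second, and fourth of four equally spaced times; the parameter values are hypothetical.

```python
import numpy as np

n, sigma2, rho = 4, 1.0, 0.5  # hypothetical values
positions = np.arange(n)
lags = np.abs(np.subtract.outer(positions, positions))  # |j - k|

# Full (4 x 4) matrices under the two assumed covariance models.
full_ar1 = sigma2 * rho ** lags
full_one_dep = sigma2 * np.where(lags == 0, 1.0,
                                 np.where(lags == 1, rho, 0.0))

# Occasions observed for this unit (0-based): first, second, and fourth.
observed = [0, 1, 3]
rows_cols = np.ix_(observed, observed)

Sigma_i_ar1 = full_ar1[rows_cols]          # off-diagonals rho, rho^3, rho^2
Sigma_i_one_dep = full_one_dep[rows_cols]  # third observation uncorrelated
```

Subsetting in this way keeps each entry tied to the actual time separation, so the second and fourth occasions retain correlation $\rho^2$ under AR(1) even though they are adjacent rows of the unit's data vector.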

Now consider the situation of the ultrafiltration data. Here, the actual times of observation are different for each unit. Consider again the above models.

  • Here, the unstructured assumptions are difficult to justify. Because each unit was seen at a different set of times, they cannot share the same covariance parameters, so the matrix $\Sigma_i$ must depend on entirely different quantities for each $i$.
  • The compound symmetry assumption could still be used, as it does not pay attention to the actual values of the times. Of course, it still suffers from the drawbacks for longitudinal data we have already noted.
  • We might still be willing to adopt something like the one-dependent assumption in the same spirit as with compound symmetry, saying that observations that are adjacent in time, regardless of how far apart they might be, are correlated, but those farther are not. However, it is possible that the distance in time for adjacent observations for one unit might be longer than the distance for nonconsecutive observations for another unit, making this seem pretty nonsensical!
  • The AR(1) assumption is clearly inappropriate by the same type of reasoning.
  • The so-called Markov assumption seems more promising in this situation – the correlation among observations within a unit would depend on the time distances between observations within a unit.

Clearly, with different times for different units, modeling covariance is more challenging! In fact, it is even hard to investigate the issue informally, because the information from each unit is different. In the next two chapters of the course, we will talk about another approach to modeling longitudinal data that obviates the need to think quite so hard about all of this!

INDEPENDENCE ASSUMPTION: An alternative to all of the above, in both cases of balanced and unbalanced data, is the assumption that observations within a unit are uncorrelated, which, with the assumption of multivariate normality, implies that they are independent. That is, if we believe that all observations have constant variance $\mathrm{var}(Y_{ij}) = \sigma^2$, take

$$\Sigma_i = \mathrm{var}(\epsilon_i) = \sigma^2 I_{n_i}.$$

  • This assumption seems incredibly unrealistic for longitudinal data. It says that observations on the same unit are no more alike than those compared across units! In a practical sense, it implies variation among units must be negligible; otherwise, we would expect observations on the same individual to be correlated due to this source.
  • It also says that there is no correlation induced by within-unit fluctuations over time. This might be okay if the observations are all taken sufficiently far apart in time from one another; however, it may be unrealistic if they are close in time.
  • Occasionally, this model might be sensible, e.g. suppose the units are genetically-engineered mice, bred specifically to be as alike as possible. Under such conditions, we might expect that the component of variation due to variation among mice might indeed be so small as to be regarded as negligible. If furthermore the observations on a given mouse are all far apart in time, then we would expect no correlation for this reason, either.
  • In most situations, however, this assumption represents an obvious model misspecification, i.e. the model almost certainly does not accurately represent the truth.
  • However, sometimes, this assumption is adopted nonetheless, even though the data analyst is fully aware it is likely to be incorrect. The rationale will be discussed later in the course.

SUMMARY: The important message is that, by thinking about the situation at hand, it is possible to specify models for covariance that represent the main features in terms of a few parameters. Thus, just as we model the systematic component in terms of a regression parameter β, we may model the random component.