Latent Variables Analysis - Lecture Notes | CLASSIC 0153I, Exams of Classical Philology

Material Type: Exam; Class: GRECO-ROMAN ARCHTCT; Subject: Classics; University: University of California - Los Angeles; Term: Unknown 1998;

Chapter 19

Latent Variable Analysis

Growth Mixture Modeling and Related Techniques for Longitudinal Data

Bengt Muthén

19.1. Introduction

This chapter gives an overview of recent advances in latent variable analysis. Emphasis is placed on the strength of modeling obtained by using a flexible combination of continuous and categorical latent variables. To focus the discussion and make it manageable in scope, analysis of longitudinal data using growth models will be considered. Continuous latent variables are common in growth modeling in the form of random effects that capture individual variation in development over time. The use of categorical latent variables in growth modeling is, in contrast, perhaps less familiar, and new techniques have recently emerged. The aim of this chapter is to show the usefulness of growth model extensions using categorical latent variables. The discussion also has implications for latent variable analysis of cross-sectional data.

The chapter begins with two major parts corresponding to continuous outcomes versus categorical outcomes. Within each part, conventional modeling using continuous latent variables will be described first, followed by recent extensions that add categorical latent variables. This covers growth mixture modeling, latent class growth analysis, and discrete-time survival analysis. Two additional sections demonstrate further extensions. Analysis of data with strong floor effects gives rise to modeling with an outcome that is part binary and part continuous, and data obtained by cluster sampling give rise to multilevel modeling. All models fit into a general latent variable framework implemented in the Mplus program (Muthén & Muthén, 1998–2003). For overviews of this modeling framework, see Muthén (2002) and Muthén and Asparouhov (2003a, 2003b). Technical aspects are covered in Asparouhov and Muthén (2003a, 2003b).

AUTHOR'S NOTE: The research was supported under grant K02 AA 00230 from NIAAA. I thank the Mplus team for software support, Karen Nylund and Frauke Kreuter for research assistance, and Tihomir Asparouhov for helpful comments. Please send correspondence to [email protected].

19.2. Continuous Outcomes: Conventional Growth Modeling

In this section, conventional growth modeling will be briefly reviewed as a background for the more general growth modeling to follow. To prepare for this transition, the multilevel and mixed linear modeling representation of conventional growth modeling will be related to representations using structural equation modeling and latent variable modeling.

346 • SECTION V/MODELS FOR LATENT VARIABLES

Figure 19.1 LSAY Math Achievement in Grades 7 to 10 [two panels of individual trajectories, titled "All students" and "Students with only HS Expectations in G"; horizontal axis: Grades 7-10; vertical axis: 40 to 100]

To introduce ideas, consider an example from mathematics achievement research. The Longitudinal Study of Youth (LSAY) is a national sample of mathematics and science achievement of students in U.S. public schools (Miller, Kimmel, Hoffer, & Nelson, 2000). The sample contains 52 schools with an average of about 60 students per school. Achievement scores were obtained by item response theory equating. There were about 60 items per test with partial item overlap across grades. Tailored testing was used so that test results from a previous year influenced the difficulty level of the test of a subsequent year. The LSAY data used here are from Cohort 2, containing a total of 3,102 students followed from Grade 7 to Grade 12 starting in 1987. Individual math trajectories for Grades 7 through 10 are shown in Figure 19.1.

The left-hand side of Figure 19.1 shows typical trajectories from the full sample of students. Approximately linear growth over the grades is seen, with the average linear growth shown as a bold line. Conventional growth modeling is used to estimate the average growth, the amount of variation across individuals in the growth intercepts and slopes, and the influence of covariates on this variation. The right-hand side of Figure 19.1 uses a subset of students defined by one such covariate, considering students who, in seventh grade, expect to get only a high school degree. It is seen that the intercepts and slopes are considerably lower for this group of low-expectation students.

A conventional growth model is formulated as follows for the math achievement development related to educational expectations. For ease of transition between modeling traditions, the multilevel notation of Raudenbush and Bryk (2002) is chosen. For time point t and individual i, consider the variables

$y_{ti}$ = repeated measures on the outcome (e.g., math achievement),

$a_{1ti}$ = time-related variable (time scores) (e.g., Grades 7–10),

$a_{2ti}$ = time-varying covariate (e.g., math course taking),

$x_i$ = time-invariant covariate (e.g., Grade 7 expectations),

and the two-level growth model,

Level 1:

$$y_{ti} = \pi_{0i} + \pi_{1i} a_{1ti} + \pi_{2i} a_{2ti} + e_{ti}, \tag{1}$$

Level 2:

$$\begin{aligned}
\pi_{0i} &= \beta_{00} + \beta_{01} x_i + r_{0i}, \\
\pi_{1i} &= \beta_{10} + \beta_{11} x_i + r_{1i}, \\
\pi_{2i} &= \beta_{20} + \beta_{21} x_i + r_{2i}.
\end{aligned} \tag{2}$$

Here, $\pi_{0i}$, $\pi_{1i}$, and $\pi_{2i}$ are random intercepts and slopes varying across individuals. The residuals $e$, $r_0$, $r_1$, and $r_2$ are assumed normally distributed with zero means and uncorrelated with $a_1$, $a_2$, and $x$. The Level 2 residuals $r_0$, $r_1$, and $r_2$ are possibly correlated but uncorrelated with $e$. The residuals $e_t$ are typically assumed to have equal variances across time and to be uncorrelated across time, but both of these restrictions can be relaxed.^1

^1 The model may alternatively be expressed as a mixed linear model relating $y$ directly to $a_1$, $a_2$, and $x$ by inserting (2) into (1). Analogous to a two-level regression, when either $a_{ti}$ or $\pi_{2i}$ varies across $i$, there is variance heteroscedasticity for $y$ given covariates and therefore not a single covariance matrix for model testing.
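The substitution mentioned in the footnote can be written out explicitly. What follows is a sketch obtained by inserting the Level 2 equations into Level 1, treating the slope of the time-varying covariate as the individual-specific $\pi_{2i}$ of (2). The grouping into a fixed part and a random part, and the symbol $\sigma^2_{e_t}$ for the residual variance at time $t$, are labels introduced here for illustration rather than the chapter's notation.

```latex
\begin{aligned}
y_{ti} &= (\beta_{00} + \beta_{01} x_i + r_{0i})
        + (\beta_{10} + \beta_{11} x_i + r_{1i})\, a_{1ti}
        + (\beta_{20} + \beta_{21} x_i + r_{2i})\, a_{2ti} + e_{ti} \\
       &= \underbrace{\beta_{00} + \beta_{01} x_i + \beta_{10} a_{1ti} + \beta_{11} x_i a_{1ti}
        + \beta_{20} a_{2ti} + \beta_{21} x_i a_{2ti}}_{\text{fixed part}}
        + \underbrace{r_{0i} + r_{1i} a_{1ti} + r_{2i} a_{2ti} + e_{ti}}_{\text{random part}}, \\[4pt]
\operatorname{Var}(y_{ti} \mid a_{1ti}, a_{2ti}, x_i)
       &= \operatorname{Var}(r_{0i}) + a_{1ti}^2 \operatorname{Var}(r_{1i}) + a_{2ti}^2 \operatorname{Var}(r_{2i})
        + 2 a_{1ti} \operatorname{Cov}(r_{0i}, r_{1i}) \\
       &\quad + 2 a_{2ti} \operatorname{Cov}(r_{0i}, r_{2i})
        + 2 a_{1ti} a_{2ti} \operatorname{Cov}(r_{1i}, r_{2i}) + \sigma^2_{e_t}.
\end{aligned}
```

Because the conditional variance depends on $a_{1ti}$ and $a_{2ti}$, it differs across individuals whenever those scores differ across individuals, which is the heteroscedasticity the footnote refers to.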


Figure 19.3 LSAY Math Achievement in Grades 7 to 10 and High School Dropout [two panels of individual trajectories: "All Students" (5% sample) and "Dropouts" (20% sample); horizontal axis: Grades 7-10; vertical axis: Math Achievement, 40 to 100]

and Asparouhov (2003a), and a technical background is given in Asparouhov and Muthén (2003a). In addition, general latent variable modeling allows modeling with a combination of continuous and categorical latent variables to more realistically represent longitudinal data. This aspect is the focus of the current chapter.

19.3. Continuous Outcomes: Growth Mixture Modeling

The model in (1) and (2) has two key features. On one hand, it allows individual differences in development over time because the growth intercept $\pi_{0i}$ and growth slope $\pi_{1i}$ vary across individuals, resulting in individually varying trajectories for $y_{ti}$ over time. This heterogeneity is captured by random effects (i.e., continuous latent variables). On the other hand, it assumes that all individuals are drawn from a single population with common population parameters. Growth mixture modeling relaxes the single population assumption to allow for parameter differences across unobserved subpopulations. This is accomplished using latent trajectory classes (i.e., categorical latent variables). This implies that instead of considering individual variation around a single mean growth curve, the growth mixture model allows different classes of individuals to vary around different mean growth curves. The combined use of continuous and categorical latent variables provides a very flexible analysis framework. Growth mixture modeling was introduced in Muthén and Shedden (1999) with extensions and overviews in Muthén and Muthén (1998–2003) and Muthén (2001a, 2001b, 2002).
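The idea of individuals varying around class-specific mean growth curves can be illustrated with a small simulation. This is a minimal sketch, not the chapter's analysis: the function names, the two classes, and all parameter values are hypothetical, growth is linear without covariates, and the random intercept and slope are drawn independently for simplicity.

```python
import random

def simulate_gmm(n, class_probs, class_params, time_scores, seed=0):
    """Simulate linear growth data from a mixture of latent trajectory classes.

    class_params[k] holds hypothetical values for class k: mean intercept (b00),
    mean slope (b10), SDs of the random intercept and slope (sd_r0, sd_r1),
    and the residual SD (sd_e).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Categorical latent variable: trajectory class membership.
        k = rng.choices(range(len(class_probs)), weights=class_probs)[0]
        p = class_params[k]
        # Continuous latent variables: random intercept and slope around class means.
        pi0 = p["b00"] + rng.gauss(0.0, p["sd_r0"])
        pi1 = p["b10"] + rng.gauss(0.0, p["sd_r1"])
        y = [pi0 + pi1 * a + rng.gauss(0.0, p["sd_e"]) for a in time_scores]
        data.append((k, y))
    return data

def class_mean_trajectories(data, n_classes):
    """Average simulated trajectory within each class (class labels are known here)."""
    means = []
    for k in range(n_classes):
        ys = [y for c, y in data if c == k]
        n_times = len(ys[0])
        means.append([sum(y[t] for y in ys) / len(ys) for t in range(n_times)])
    return means

# Hypothetical parameter values: class 0 starts lower and grows more slowly.
params = [
    {"b00": 42.0, "b10": 2.0, "sd_r0": 5.0, "sd_r1": 1.0, "sd_e": 3.0},
    {"b00": 55.0, "b10": 4.0, "sd_r0": 7.0, "sd_r1": 1.0, "sd_e": 3.0},
]
sample = simulate_gmm(2000, [0.2, 0.8], params, [0, 1, 2, 3], seed=1)
for k, traj in enumerate(class_mean_trajectories(sample, 2)):
    print(k, [round(v, 1) for v in traj])
```

In real data the class labels are unobserved, so the class-specific curves have to be estimated from the outcomes; the simulation only shows what the model assumes about how such data arise.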

Consider again the math achievement example and the math development shown in the right-hand part of Figure 19.3. This is the development for individuals who are later classified as having dropped out by Grade 12. Note that although Figure 19.1 considers an antecedent of development, Grade 7 expectations, Figure 19.3 considers a consequence of development, high school dropout. It is seen that, with a few exceptions, the high school dropouts typically have a lower starting point in Grade 7 and grow more slowly than the average students in the left-hand part of the figure. This suggests that there might be an unobserved subpopulation of students who, in Grades 7 through 10, show poor math development and who have a high risk for dropout. In educational dropout research, such a subpopulation is often referred to as "disengaged," where disengagement has many hypothesized predictors. The subpopulation membership is not known during Grades 7 through 10 but is revealed when students drop out of high school. The subpopulation membership can, however, be inferred from the Grade 7 through 10 math achievement development.

19.3.1. Growth Mixture Model Specification

To introduce growth mixture modeling (GMM), consider a latent categorical variable $c_i$ representing the unobserved subpopulation membership for student $i$, $c_i = 1, 2, \ldots, K$. Here, $c$ will be referred to as a latent class variable or, more specifically, a trajectory class variable. Assume tentatively that in the math achievement example, $K = 2$, representing a disengaged class ($c = 1$) and a normative class ($c = 2$). An example of the different parts of the model is shown in the model diagram in Figure 19.4. The model has covariates $x$ and $x_{mis}$, a latent class variable $c$, repeated continuous outcomes $y$, and a distal dichotomous outcome $u$. For simplicity, time-varying covariates are not included in this example. The covariate $x$ influences $c$ and has direct effects on the growth factors $\pi_0$ and $\pi_1$, as well as a direct effect on $u$. In this section, the $x_{mis}$ covariate will be assumed to have no role in the model. Its effects will be studied in later sections.

Figure 19.4 GGMM Diagram [path diagram with the repeated outcomes $y$, the growth factors $\pi_0$ and $\pi_1$, the latent class variable $c$, the covariates $x$ and $x_{mis}$, and the distal outcome $u$]

Consider first the prediction of the latent class variable by the covariate $x$ using a multinomial logistic regression model for $K$ classes,

$$P(c_i = k \mid x_i) = \frac{e^{\gamma_{0k} + \gamma_{1k} x_i}}{\sum_{s=1}^{K} e^{\gamma_{0s} + \gamma_{1s} x_i}}, \tag{3}$$

with the standardization $\gamma_{0K} = 0$, $\gamma_{1K} = 0$. With a binary $c$ ($c = 1, 2$), this gives

$$P(c_i = 1 \mid x_i) = \frac{1}{1 + e^{-l_i}}, \tag{4}$$

where $l$ is the logit (i.e., the log odds),

$$\log[P(c_i = 1 \mid x_i)/P(c_i = 2 \mid x_i)] = \gamma_{01} + \gamma_{11} x_i, \tag{5}$$

so that $\gamma_{11}$ is the increase in the log odds of being in the disengaged versus the normative class for a unit increase in $x$. For example, assume that $x$ is dichotomous and scored 0, 1 for females versus males. From (5), it follows that $e^{\gamma_{11}}$ is the odds ratio for being in the disengaged class versus the normative class when comparing males to females. For example, $\gamma_{11} = 1$ implies that the odds of being in the disengaged class versus the normative class are $e^{1} = 2.72$ times higher for males than females.

Generalizing (1) and (2), GMM considers a separate growth model for each of the two latent classes. Key differences across classes are typically found in the fixed effects $\beta_{00}$, $\beta_{10}$, and $\beta_{20}$ in (2). For example, the disengaged class would have lower $\beta_{00}$ and $\beta_{10}$ values (i.e., lower means) than the normative class. Class differences may also be found in the covariate influence, with class-varying $\beta_{01}$, $\beta_{11}$, and $\beta_{21}$. In addition, class-varying variances and covariances for the $r$ residuals may be found. The type of growth function at Level 1 in (1) may also differ across classes. For example, although the disengaged class may be well represented by linear growth, the normative class may show accelerated growth over some of the grades (e.g., calling for a quadratic growth curve). Here, the variance for the $e$ residual may also be class varying.

The basic GMM can be extended in many ways. One important extension is to include an outcome that is predicted from the growth. Such an outcome is often referred to as a distal outcome, whereas in this context, the growth outcomes are referred to as proximal outcomes. Dropping out of high school is an example of such a distal outcome in the math achievement context. Given that the growth is succinctly summarized by the latent trajectory class variable, it is natural to let the latent trajectory class variable predict the distal outcome. With the example of a dichotomous distal outcome $u$ scored 0, 1, this model part is given as a logistic regression with covariates $c$ and $x$,

$$P(u_i = 1 \mid c_i = k, x_i) = \frac{1}{1 + e^{\tau_k - \kappa_k x_i}}, \tag{6}$$

where the main effect of $c$ is captured by the class-varying thresholds $\tau_k$ (an intercept with its sign reversed), and $\kappa_k$ is a class-varying slope for $x$. For each class, the same odds ratio interpretation given above can also be applied here. Model extensions of this type will be referred to as general growth mixture modeling (GGMM).
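The two logistic model parts of this section, the multinomial regression for class membership and the class-specific logistic regression for the distal outcome, can be evaluated numerically as follows. This is a minimal sketch: the function names and all coefficient values are hypothetical and chosen only to reproduce the odds ratio example with $\gamma_{11} = 1$.

```python
import math

def class_probabilities(x, gamma0, gamma1):
    """Multinomial logistic class probabilities for covariate value x.

    gamma0 and gamma1 list the intercepts and slopes for classes 1..K;
    the last class is the reference class with both coefficients set to 0.
    """
    logits = [g0 + g1 * x for g0, g1 in zip(gamma0, gamma1)]
    largest = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - largest) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def distal_probability(x, tau_k, kappa_k):
    """P(u = 1 | c = k, x) with class-specific threshold tau_k and slope kappa_k."""
    return 1.0 / (1.0 + math.exp(tau_k - kappa_k * x))

# Hypothetical coefficients: class 1 = disengaged, class 2 = normative (reference).
gamma0 = [-1.0, 0.0]
gamma1 = [1.0, 0.0]

p_female = class_probabilities(0, gamma0, gamma1)  # x = 0
p_male = class_probabilities(1, gamma0, gamma1)    # x = 1
odds_ratio = (p_male[0] / p_male[1]) / (p_female[0] / p_female[1])
print(round(odds_ratio, 2))  # equals exp(gamma_11)

# Hypothetical thresholds: a lower threshold means a higher probability of u = 1.
print(distal_probability(0, tau_k=-0.8, kappa_k=0.5))
print(distal_probability(0, tau_k=2.5, kappa_k=0.5))
```

The threshold enters with a positive sign in the exponent, so a class with a lower threshold has a higher probability of the distal event at the same covariate value.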

19.3.1.1. Latent Class Growth Analysis

A special type of growth mixture model has been studied by Nagin and colleagues (see, e.g., Nagin, 1999; Nagin & Land, 1993; Roeder, Lynch, & Nagin, 1999) using the SAS procedure PROC TRAJ (Jones, Nagin, & Roeder, 2001). See also the 2001 special issue of Sociological Methods & Research (Land, 2001). The models studied by Nagin are characterized by having zero variances and covariances for $r$ in (2); that is, individuals within a class are treated as


Figure 19.5 Random-Effects Distributions Represented by Mixtures [two panels, each with axes labeled "Nodes" and "Weight"]

membership in each class can be estimated, as well as the individual's score on the growth factors $\pi_{0i}$ and $\pi_{1i}$. Measures of classification quality, such as entropy, can be considered based on the individual class probabilities. This has been implemented in the Mplus program (Muthén & Muthén, 1998–2003). Technical aspects of the modeling, estimation, and testing are given in Technical Appendix 8 of the Mplus User's Guide (Muthén & Muthén, 1998–2003), Muthén and Shedden (1999), and Asparouhov and Muthén (2003a, 2003b). Missing data on $y$ are handled under the missing at random (MAR) assumption. Muthén, Jo, and Brown (2003) discuss nonignorable missing data modeling using missing data indicators. As with mixture modeling in general, local optima are often encountered in the likelihood. This phenomenon is well known, for example, in latent class analysis, particularly in models with many classes and data that carry limited information about the class membership. Because of this, the use of several different sets of starting values is recommended, and this is automated in Mplus.
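An entropy-based classification summary can be computed directly from the individual class probabilities. The excerpt does not state the formula, so the sketch below assumes the normalized entropy commonly reported for mixture models, $1 - \sum_i \sum_k (-p_{ik} \ln p_{ik}) / (n \ln K)$, which is 1 for perfectly clear classification and 0 when every individual is equally likely to belong to each class. The function name and the example probabilities are hypothetical.

```python
import math

def relative_entropy(posteriors):
    """Normalized entropy of a list of per-individual class probability vectors.

    Assumed formula: 1 - sum_i sum_k (-p_ik * ln p_ik) / (n * ln K).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for probs in posteriors:
        for p in probs:
            if p > 0.0:  # the limit of p * ln(p) as p -> 0 is 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior class probabilities for four individuals, two classes.
clear = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.60, 0.40], [0.45, 0.55], [0.70, 0.30], [0.50, 0.50]]
print(relative_entropy(clear) > relative_entropy(fuzzy))
```

A summary of this kind reflects how well separated the classes are, not whether the model is correct, which matches the chapter's later remark that a correctly specified model can still show modest entropy.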

19.3.1.4. The LSAY Example

To conclude this section in a concrete way using the LSAY math achievement data, a brief preview of the analyses in Section 3.5 is of interest. Figure 19.6 shows that three latent trajectory classes are found, including their class probabilities, the mean trajectory and individual variation for each class, and the probability of dropping out of high school for each class. Of the students, 20% are found to belong to a disengaged class with poor math development. Membership in the disengaged class dramatically enhances the risk of dropping out of high school, raising the dropout percentage from 1% and 8% to 69%. Section 3. presents the covariates predicting latent trajectory class membership, and it is found that having low educational expectations and dropout thoughts already by Grade 7 are key predictors.

Before going through the analysis steps for the LSAY math achievement example, model interpretation, estimation, and model selection procedures will be discussed. Latent variable modeling requires good analysis strategies, and this is even more true in the framework of growth mixture modeling, where both continuous and categorical latent variables are used. Many statistical procedures have been suggested within the related statistical area of finite mixture modeling (see, e.g., McLachlan & Peel, 2000), and some key ideas and new extensions will be briefly reviewed. Both substantive and statistical considerations are critical and will be discussed. Early prediction of class membership is also of interest in growth mixture modeling and will be briefly covered. In the LSAY math achievement example, it is clearly of interest to make such early predictions of risk for high school dropout to make interventions possible.

19.3.2. Substantive Theory and Auxiliary Information for Predicting and Understanding Model Results

GGMM should be investigated using substantively based theory and evidence. Auxiliary information can be used to more fully understand model results even at an exploratory stage, when little theory exists. Once substantive theory has been formulated, it can be used to predict a related set of events that can then be tested.


Figure 19.6 LSAY Math Achievement in Grades 7 to 10 and High School Dropout [three panels of trajectories, one per latent class; horizontal axis: Grades 7-10; vertical axis: Math Achievement, 40 to 100]

Poor Development: 20% of students, dropout 69%

Moderate Development: 28% of students, dropout 8%

Good Development: 52% of students, dropout 1%

Substantive theory building typically does not rely on only a single outcome measured repeatedly, accumulating evidence for a theory only by sorting into classes observed trajectories on a single outcome variable. Instead, many different sources of auxiliary information are used to check the theory's plausibility. Mental health research may find that a pattern of a high level of deviant behavior at ages when this is not typical is often accompanied by a variety of negative social consequences, so that there is a distinct subtype. A good education study of failure in school also considers what else is happening in the student's life, involving predictions of accompanying problems. Gene-environment interaction theories may predict the emergence of problems as a response to adverse life events at certain ages. These are the situations when GGMM is particularly useful. GGMM can include the auxiliary information in the model and test if the classes formed have the characteristics on the auxiliary variables that are predicted by theory. Auxiliary information may take the form of antecedents, concurrent events, or consequences. These are briefly discussed in turn below.

19.3.2.1. Antecedents

Auxiliary information in the form of antecedents (covariates) of class membership and growth factors should be included in the set of covariates to correctly specify the model, find the proper number of classes, and correctly estimate class proportions and class membership. The fact that the "unconditional model" without covariates is not necessarily the most suitable for finding the number of classes has not been fully appreciated and will be discussed below.

An important part of GGMM is the prediction of class membership probabilities from covariates. This gives the profiles of the individuals in the classes. The estimated prediction of class membership is a key feature in examining predictions of theory. If classes are not statistically different with respect to covariates that, according to theory, should distinguish classes, crucial support for the model is absent.

Class variation in the influence of antecedents (covariates) on growth factors or outcomes also provides a better understanding of the data. As a caveat, one should note that if a single-class model has generated the data with significant positive influence of covariates on growth factors, GGMM that incorrectly divides up the trajectories in, say, low, medium, and high classes might find that covariates have lower and insignificant influence in the low class due to selection on the dependent variable. If a GGMM has generated the data, however, the selected subpopulation is the relevant one to which to draw the inference. In either case, GGMM provides considerably more flexibility than what can be achieved with conventional growth modeling. As an example, consider Muthén and Curran's (1997) analysis of a preventive intervention with a strong treatment-baseline interaction. The intervention aimed at changing the trajectory slope of aggressive-disruptive behavior of children in Grades 1 through 7. No main effect was found, but Muthén and Curran used multiple-group latent growth curve modeling to show that the initially more aggressive children benefited from the intervention


designs that are not offered in Mplus, externally generated data can be analyzed using the RUNALL utility.^5 An extensive Monte Carlo study of growth mixture and related factor mixture models is given in Lubke and Muthén (2003).

19.3.4. Statistical Aspects of Growth Mixture Modeling: Model Selection Procedures

This section gives an overview of strategies and methods for model selection and testing. An emphasis is placed on practical analysis steps and recent testing developments.

19.3.4.1. Analysis Steps

In conventional growth modeling, a common analysis strategy is to first consider an "unconditional model" (i.e., not introducing covariates for the growth factors). This strategy can lead to confusion with growth mixture modeling. Consider the growth mixture model diagram shown earlier in Figure 19.4. Here the model has covariates $x$ and $x_{mis}$, a latent class variable $c$, repeated continuous outcomes $y$, and a distal dichotomous outcome $u$. The covariate $x$ influences $c$, has direct effects on the growth factors $\pi_0$ and $\pi_1$, and also has direct effects on $u$.

Consider first an analysis of this model without $u$ and without the $x$s. Here, the class formation is based on information from the observed variables $y$, channeled through the growth factors. A distorted analysis is obtained if the $x$s are excluded because they have direct effects on the growth factors. This is because the only observed variables, $y$, are incorrectly related to $c$ if the $x$s are excluded. The distortion can be understood based on the analogy of a misspecified regression analysis: when an important predictor is left out, the slope for the other predictor is distorted. In Figure 19.4, the other predictor is the latent class variable $c$, and the distortion of its effect on the growth factors causes incorrect evaluation of the posterior probabilities in the E step and therefore incorrect class probability estimates and incorrect individual classification. If, on the other hand, the $x$ covariates do not have a direct influence on the growth factors (and no direct influence on $y$), the "unconditional model" without the $x$s would be correct, giving correct class probabilities and growth curves for $y$.

To further explicate the reasoning above, consider a data set generated by the model in Figure 19. without the $x_{mis}$ covariate, using the Monte Carlo feature of Mplus discussed earlier.^6 Analysis of the generated data by the correct model recovers the population parameters well, as expected. The estimated Class 1 probability of 0.26 is close to the true value of 0.27. The entropy, 0.57, is not large despite the correctness of the model, but this is a function of the degree of separation between the classes and the within-class variation. In line with the discussion above, the influence of the covariate $x$ is of special interest. The model that generated the data has a positive slope for the influence of $x$ on being in the smaller Class 1, positive slopes for the influence on the growth factors, and a positive slope for the influence on $u$. The estimated class-specific means and variances of the $x$ covariate are 0.63 and 0.79 for Class 1 and −0.20 and 0.82 for Class 2. The higher mean for Class 1 is expected, given the positive slope for the influence on the Class 1 membership. Being in Class 1, in turn, implies higher means for the growth factors. Within class, the growth factor means are higher due to the direct positive influence of $x$ on the growth factors.

With $x$ left out of the model, the latent class variable alone needs to account for the differences in growth factor values across individuals. As a result, the class probabilities are misestimated. In the generated data example, the Class 1 probability is now misestimated as 0.35.^7

Analyzing the Figure 19.4 model excluding $u$ but correctly including $x$ gives the correct answer in terms of class membership probabilities for $c$ and growth curves for $y$. This is because excluding $u$ does not imply that the observed variables ($y$ or $x$) are incorrectly related to $c$. Excluding $u$ simply makes the standard errors larger and worsens the classification precision (entropy). In the generated data example, the Class 1 probability is well estimated as 0.26, whereas the entropy is now lowered to 0.50.^8

In practice, model estimation with and without a distal outcome $u$ may give different results for the class probabilities and growth curves for two reasons. First, if you include $u$ but misspecify the model by not allowing direct effects from the $x$s to $u$, you get distorted parameter estimates (e.g., incorrect class probabilities) by the same regression misspecification analogy given above. In the generated data example, this misspecification gave a strongly distorted Class 1 probability estimate of 0.40. Second, key covariates may have been left out of the model (i.e., may not have been measured or are missing), causing a model misspecification. The notation $x_{mis}$ in Figure 19.4 refers to such a covariate. Consider two cases, both assuming that $x_{mis}$ is not available. First, if $x_{mis}$ influences only $u$ and not the growth factors, the analysis excluding $u$ gives correct results, but the analysis including $u$ gives incorrect and hence different results. Second, if $x_{mis}$ influences both the growth factors and $u$, the analyses with and without $u$ both give incorrect results, and the two sets of results differ.

In conclusion, the proper choice of covariates is important in growth mixture modeling. Substantive theory and previous analyses are needed to make a choice that is sufficiently inclusive. The covariates should be allowed to influence not only class membership but also the growth factors directly, unless there are well-motivated reasons not to. An analysis without covariates can be useful to study different growth in different trajectory classes. However, it should not be expected that the class distribution or individual classification remains the same when adding covariates. It is the model with covariates properly included that gives the better answer.

It should also be noted that choosing the correct within-class variance structure is important. The data above were generated from a model with class-varying variances for the residuals $e$ in (1). Misspecifying the model by holding these variances equal across class leads to an estimated Class 1 probability of 0.23. Larger distortions would be obtained if the growth factor variances differ across classes.

It is instructive to consider model misspecification results if data generated by the growth mixture model are analyzed by a latent class growth analysis (LCGA). In the generated data example above, LCGA leads to a misspecified model. The misspecification can be studied in two steps, first by restricting the residual (co)variances and second by also not allowing the direct influence from $x$ to the growth factors. In both cases, the distal outcome is $u$. In the first step, the estimated Class 1 probability is found to be 0.42, a value far off from the true probability of 0.27. In the second step, the estimated Class 1 probability is even more strongly distorted, 0.51. It is noteworthy that the misspecification of not letting $x$ have a direct effect on the growth factors cannot be discovered using LCGA. Note that in the last two analyses, the entropy values are strongly overestimated, 0.80 and 0.85. It is also likely that more than two classes are needed to account for the within-class variation. This implies that some of the classes are merely slight variations on a theme and do not have a substantial meaning.

^5 See http://www.statmodel.com/runutil.html.

^6 The Mplus input and output for this analysis are given in Example 2 at www.statmodel.com/mplus/examples/penn.html.

^7 The Mplus input and output for this analysis are given in Example 3 at www.statmodel.com/mplus/examples/penn.html.

^8 The Mplus input and output for this analysis are given in Example 4 at www.statmodel.com/mplus/examples/penn.html.
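The regression analogy used in this section can be made concrete with a tiny noise-free example. This is only an illustration of omitted-predictor distortion in ordinary least squares, with an observed 0/1 predictor standing in for the latent class variable; the data, coefficients, and function names are hypothetical and unrelated to the generated data set discussed above.

```python
def centered_sums(a, b):
    """Sum of cross-products of a and b around their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def slope_one_predictor(y, c):
    """OLS slope of y on c alone (x omitted)."""
    return centered_sums(c, y) / centered_sums(c, c)

def slopes_two_predictors(y, c, x):
    """OLS slopes of y on c and x jointly, solved from the 2x2 normal equations."""
    s_cc, s_xx, s_cx = centered_sums(c, c), centered_sums(x, x), centered_sums(c, x)
    s_cy, s_xy = centered_sums(c, y), centered_sums(x, y)
    det = s_cc * s_xx - s_cx ** 2
    b_c = (s_cy * s_xx - s_xy * s_cx) / det
    b_x = (s_xy * s_cc - s_cy * s_cx) / det
    return b_c, b_x

# Hypothetical data: y = 2*c + 3*x exactly, with x positively related to c.
c = [0, 0, 1, 1]
x = [0, 1, 1, 2]
y = [2 * ci + 3 * xi for ci, xi in zip(c, x)]

print(slopes_two_predictors(y, c, x))  # x included: slopes 2 and 3 are recovered
print(slope_one_predictor(y, c))       # x omitted: the slope for c absorbs part of x's effect
```

With x omitted, the slope for c becomes 5 instead of 2, because c has to carry the part of the effect of x that is correlated with it. The chapter's argument is that the latent class variable is distorted in the same way when a covariate with direct effects on the growth factors is left out.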

19.3.4.2. Equivalent Models

With latent variable models in general and mixture models in particular, the phenomenon of equivalent models may be encountered. Here, equivalent models means that two or more models fit the same data approximately the same so that there is no statistical basis on which to base a model choice. Consider two psychometric examples. First, in exploratory factor analysis, a rotated solution using uncorrelated factors gives the same estimated correlation matrix as a rotated solution with correlated factors. Second, Bartholomew and Knott (1999, pp. 154–155) point out a well-known psychometric fact that a covariance matrix generated by a latent profile model (a latent class model with continuous outcomes) can be perfectly fitted by a factor analysis model. A covariance matrix from a k-class model can be fitted by a factor analysis model with k − 1 factors. Molenaar and von Eye (1994) show that a covariance matrix generated by a factor model can be fitted by a latent class model. This should not be seen as a problem but merely as two ways of looking at the same reality. The factor analysis informs about under- lying dimensions and how they are measured by the items, whereas the latent profile analysis sorts individ- uals into clusters of individuals who are homogeneous with respect to the item responses. The two analyses are not competing but are complementary. The issue of alternative explanations is classic in finite mixture statistics. Mixtures have two separate uses. One is to simply fit a nonnormal distribution without a particular interest in the mixture components. The other is to capture substantively meaningful sub- groups. For a historical overview, see, for instance, McLachlan and Peel (2000, pp. 14–17), who refer to a debate about blood pressure. A classic example con- cerns data from a univariate (single-class) lognormal distribution that are fitted well by a two-class model that assumes within-class normality and has different means. 
Bauer and Curran (2003) consider the analogous multivariate case arising with growth mixture modeling.^9 The authors use a Monte Carlo simulation study to show that a multiclass growth mixture model can be arrived at using conventional Bayesian information criterion (BIC) approaches (see below) to determine the number of classes when data, in fact, have been generated by a nonnormal multivariate

(^9) Multivariate formulas that show equivalence are not given.


The SK procedure needs further investigation but is offered here as an example of the many possibilities of testing a mixture model against data (see also Wang et al., 2002).

19.3.5. The LSAY Math Achievement Example

This section returns to the analysis of the mathematics achievement data from the LSAY mentioned earlier. Based on the educational literature, the following covariates are included: female; Hispanic; Black; mother’s education; home resources; the student’s educational expectations, measured in seventh grade (1 = high school only, 2 = vocational training, 3 = some college, 4 = bachelor’s degree, 5 = master’s degree, 6 = doctorate); the student’s thoughts of dropping out, measured in seventh grade; whether the student has ever been arrested, measured by seventh grade; and whether the student has ever been expelled by seventh grade. Corresponding to individuals with complete data on the covariates, the analyses consider a subsample of 2,757 of the total 3,116 individuals. The analyses were carried out by maximum likelihood estimation using Mplus Version 2.13.

19.3.5.1. Statistical Checking

The univariate skewness and kurtosis sample values in the LSAY data are as follows:

$$\text{Skewness} = (0.168 \quad 0.030 \quad 0.063 \quad -0.077), \tag{8}$$

$$\text{Kurtosis} = (-0.551 \quad -0.338 \quad -0.602 \quad -0.559). \tag{9}$$

In line with the earlier discussion of the LMR LRT, due to the low nonnormality in the outcomes, it is plausible that this test is applicable in the LSAY analysis for testing a one-class model versus more than one class. In the LSAY analysis, this test points to at least two classes with a strong rejection (p = .0000) of the one-class model. The SK tests carried out on the listwise present subsample of 1,538 reject the one-class model (p = .0000 for both multivariate skewness and multivariate kurtosis) but do not reject two classes (p = .4300 and .5800). The LMR LRT for two versus three or more classes obtained a high p-value (.6143) in support of two classes. Taken together, the statistical evidence points to at least two classes. Given that the skewness and kurtosis tests found that two- and three-class GMMs fit the data, the LMR LRT is useful for testing the multiclass alternatives against each other.
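For readers who want to reproduce univariate checks like (8) and (9) on their own data, the following is a minimal sketch. The chapter does not state which estimator was used, so the simple moment-based formulas here are an assumption; kurtosis is returned as excess kurtosis (zero for a normal distribution), which is consistent with the negative values reported above. The function name is illustrative.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from central sample moments.

    Assumption: moment estimators with divisor n; other software may apply
    small-sample corrections and give slightly different values.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    # Hypothetical scores, only to show the call.
    print(skewness_and_kurtosis([48.0, 52.5, 50.1, 55.3, 47.2, 51.8]))
```

Values near zero on both statistics, as in the LSAY outcomes, indicate that the marginal distributions are close to normal.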

19.3.5.2. Substantive Checking and Further Statistical Analysis

This section compares analysis results using a conventional one-class growth model and different forms of GMMs and discusses substantive meaningfulness based on educational theory, auxiliary information, and practical usefulness. Figure 19.7 shows a diagram of the general model.

19.3.5.2.1. Conventional one-class growth modeling. As a first step, the conventional one-class growth model results are considered. Briefly stated, a linear growth model fits reasonably well and has a positive growth rate mean of about 1 standard deviation across the four grades. The covariates with significant influence (sign in parentheses) on the initial status are as follows: female (+), Hispanic (−), Black (−), mother’s education (+), home resources (+), expectations (+), dropout thoughts (−), arrest (−), and expelled (−). The covariates with significant influence (sign in parentheses) on the growth rate are as follows: female (−), Hispanic (−), home resources (+), expectations (+), and expelled (−).

19.3.5.2.2. Two-class GMM. The two-class solution is characterized by a low class of 41%, which, in comparison to the high class, has a lower initial status mean and variance, a lower growth rate mean, and a higher growth rate variance. It is interesting to consider what characterizes these students apart from their poor mathematics achievement development. The multinomial logistic regression for class membership indicates that, relative to the high class, the odds of membership in the low class are significantly increased by being male, being Hispanic, having a mother with a low level of education, having low seventh-grade educational expectations, having had seventh-grade thoughts of dropping out, having been arrested, and having been expelled. The low class appears to be a class of students with problems both in and out of school. The profile of the low class is reminiscent of individuals at risk for dropping out of high school (see, e.g., Rumberger & Larson, 1998, and references therein). Many of these students are “disengaged,” to use language from high school dropout theories. The within-class influence of the covariates on the initial status and growth rate factors varies significantly across class. The low class has no significant predictors of growth rate, whereas the growth rates of the two higher classes are significantly enhanced in well-known ways by being male, having a mother with a high level of education, having high home resources, and having high expectations. To the extent that the low class has substantive meaning, the findings suggest that different processes are in play for students in the low class.

Figure 19.7 GGMM Diagram for LSAY Data

[Path diagram. Covariates: Female, Hispanic, Black, Mother’s Ed., Home Res., Expectations, Drop Thoughts, Arrested, Expelled. Latent class variable: c. Growth factors: i, s. Outcomes: Math7, Math8, Math9, Math. Distal outcome: High School Dropout.]

19.3.5.2.3. Three-class GMM including a distal outcome. To more specifically investigate the data from the high school dropout perspective and further characterize the low class, the distal binary outcome of dropping out of high school, as recorded in Grade 12, was added. The overall dropout rate in the sample is 14.7%, or 458 individuals. Here, class membership in the GMM is, to some extent, also determined by the Grade 12 dropout indicator and not only by the covariates and math achievement development. Adding the distal outcome, the LMR LRT rejected the two-class model in favor of at least three classes (p = .0060). The three-class solution produces a more distinct low class of 19%, a middle class of 28%, and a high class of 52%. Here, the low class (estimated as 536 students) has a lower growth rate mean and lower growth rate variance than in the two-class solution without the distal outcome.^10

(^10) The Akaike information criterion (AIC) points to at least three classes, whereas the Bayesian information criterion (BIC) points to two classes. The one-class log-likelihood, number of parameters, AIC, and BIC values are as follows: −30,021.955, 27, 60,097.909, and 60,257.791. The two-class log-likelihood, number of parameters, AIC, BIC, and entropy values are as follows: −29,676.457, 63, 59,478.914, 59,851.971, and 0.552. The three-class log-likelihood, number of parameters, AIC, BIC, and entropy values are as follows: −29,566.679, 99, 59,331.359, 59,917.591, and 0.620.
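The information criteria in this footnote can be checked from the reported log-likelihoods and parameter counts. The sketch below uses the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p ln n. The sample size n = 2,757 is taken from the subsample size stated in the text and is an assumption about what the software used; with it, the AIC values reproduce to the third decimal and the BIC values to within a few hundredths. All names are illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# number of classes -> (log-likelihood, number of parameters), as reported above
MODELS = {1: (-30021.955, 27), 2: (-29676.457, 63), 3: (-29566.679, 99)}
N_OBS = 2757  # assumption: the analysis subsample size quoted in the text


def best_by(criterion):
    """Return the number of classes with the smallest value of the criterion."""
    return min(MODELS, key=lambda k: criterion(*MODELS[k]))


if __name__ == "__main__":
    for k, (ll, p) in MODELS.items():
        print(k, round(aic(ll, p), 3), round(bic(ll, p, N_OBS), 3))
    print("AIC prefers", best_by(aic), "classes")
    print("BIC prefers", best_by(lambda ll, p: bic(ll, p, N_OBS)), "classes")
```

The heavier BIC penalty (p ln n instead of 2p) is what moves the preferred model from three classes to two here, since each added class costs 36 parameters.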

The class membership regression part of the model indicates that for the low class relative to the highest class, the same covariates as in the two-class solution are significant, except that Hispanic and mother’s education are insignificant, whereas Black and home resources are significant. Interestingly, comparing the middle class to the high class, the disengagement covariates of low educational expectations, seventh-grade dropout thoughts, having been arrested, and having been expelled are no longer significant. This suggests that the low class is now a more distinct class that is more specifically characterized as disengaged and at risk for high school dropout. The two higher classes may or may not make a substantively meaningful distinction among students, but their presence helps to isolate the low class. In a two-class solution including the distal outcome, the low class is not very different from the more unspecific low class of the initial two-class solution without the distal outcome. It is interesting to note that although the LMR LRT does not point to three classes without the distal outcome, the three-class solution without the distal outcome shows a similar low class as in the three-class solution with the distal outcome. As will be shown next, the three-class solution with the distal outcome gets not only statistical support from the LMR LRT but also substantive support from predicting dropout. Further bolstering the notion that the low class is prone to high school dropout, the probability of dropping out, as estimated from the three-class model, is distinctly different in the low class. The probabilities are .692 for the low class, .076 for the middle class,


growth modeling with continuous outcomes discussed in Section 19.2, the Asparouhov-Muthén approach allows $a_{1ti}$ in (11) to be handled as data or as parameters to be estimated. Furthermore, the $\pi_{2ti}$ slopes can be random for the time-varying covariates $a_{2ti}$.^12 The Hedeker-Gibbons model is obtained as a special case with a single latent class. As in (3), the covariate effect on class membership is a multinomial logistic regression,

$$P(c_i = k \mid x_i) = \frac{e^{\gamma_{0k} + \gamma_{1k} x_i}}{\sum_{s=1}^{K} e^{\gamma_{0s} + \gamma_{1s} x_i}}. \tag{13}$$
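The class membership probabilities in (13) can be sketched numerically as follows. This is a minimal illustration, not Mplus code: the coefficient values are hypothetical, the covariates are passed as a list so that $\gamma_{1k} x_i$ becomes a sum of products, and the last class is used as the reference class with coefficients fixed at zero, which is a common identification choice assumed here.

```python
import math


def class_probabilities(gamma0, gamma1, x):
    """P(c = k | x) for k = 1, ..., K under a multinomial logistic regression.

    gamma0: list of K intercepts; gamma1: list of K coefficient lists;
    x: list of covariate values for one individual.
    """
    logits = [g0 + sum(g * xi for g, xi in zip(g1, x))
              for g0, g1 in zip(gamma0, gamma1)]
    top = max(logits)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical three-class example with one covariate; class 3 is the reference.
    gamma0 = [1.0, -0.5, 0.0]
    gamma1 = [[0.8], [0.2], [0.0]]
    print(class_probabilities(gamma0, gamma1, [2.0]))
```

With the reference-class convention, $e^{\gamma_{0k} + \gamma_{1k} x_i}$ is the odds of class k versus the last class, which is how the class membership regressions in this chapter are interpreted.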

The growth mixture extension of (10) is

$$P(u_{ti} = 1 \mid a_{1ti}, a_{2ti}, x_i, c_i = k) = \frac{1}{1 + e^{\tau - \mathrm{logit}(u_{tik})}}, \tag{14}$$

where the added conditioning on c and the subscript k emphasize that the growth model for u, as expressed by the logits, varies across classes. In line with the extension for continuous outcomes, the different latent classes have different growth models (11) and (12), with key differences typically found in the β coefficients but also in the (co)variances of the Level 2 residuals r. Typically, the thresholds τ would be time and class invariant to represent measurement invariance, although class invariance is not necessary. Generalization to include distal outcomes $u_d$, as in (15), is also of interest here:

$$P(u_{di} = 1 \mid c_i = k, x_i) = \frac{1}{1 + e^{\tau_k - \kappa_k x_i}},$$

with coefficients varying across classes k. Model building and testing strategies for categorical outcomes are in line with those discussed earlier for continuous outcomes.

19.5.1. Categorical Outcomes: Latent Class Growth Analysis

Latent class growth analysis (LCGA) for categorical outcomes considers the model in (11) through (13) with the restriction of zero variances and covariances for the residuals r. Background references for LCGA include Nagin (1999), Nagin and Land (1993), and Nagin and Tremblay (2001).

(^12) Threshold parameters are useful with ordered polytomous outcomes, in which case $\beta_{00}$ can be fixed at zero, or, alternatively, the first threshold is fixed at zero.

It is instructive to relate LCGA to latent class analysis (LCA). As in LCGA, LCA considers multiple u variables seen as indicators of c and assumed conditionally independent given c. As in LCGA, there are no continuous latent variables to explain further within-class correlation among the u variables. Typically, all outcomes are categorical. Continuous outcomes are, however, possible, giving rise to latent profile analysis. In LCA, the multiple indicators are cross-sectional measures, not longitudinal. When the multiple indicators correspond to repeated measures over time, latent classes may correspond to different trends, and trend structures can be imposed across the indicators’ probabilities. To clarify this, consider again (14):

$$P(u_{ti} = 1 \mid a_{1ti}, a_{2ti}, x_i, c_i = k) = \frac{1}{1 + e^{\tau - \mathrm{logit}(u_{tik})}}.$$

This means that with, for example, linear growth over T time points, the probabilities of the T u variables are structured according to a logit-linear trend, where the intercept and slope factors have different means across the classes. Note here that τ is held equal across time points. In contrast, LCA considers

$$P(u_{ti} = 1 \mid x_i, c_i = k) = \frac{1}{1 + e^{\tau_{tk}}},$$

where the $\tau_{tk}$ thresholds vary in an unrestricted fashion across the u variables and across the classes. In this way, LCGA gives a more parsimonious description of longitudinal data than LCA.

Models with more than one latent class variable are also of interest. Examples of LCGA with multiple class variables are given in Muthén and Muthén (2000), Muthén (2001a), and Nagin and Tremblay (2001). In this connection, it is useful to consider another important class of growth models, latent transition analysis (LTA). LTA uses time-specific latent class variables measured by multiple indicators at each time point to study class membership change over time. Both LCA and LTA can be generalized to include random effects as in growth mixture modeling (Asparouhov & Muthén, 2003b). All of these model variations can be captured in a general latent variable modeling framework and are included in Mplus.
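The contrast between the LCGA and LCA probability structures described above can be sketched for a single class with a binary outcome. This is a minimal illustration under stated assumptions: the class-specific intercept and slope stand in for the growth factor means of a linear growth model, the time scores 0, 1, 2, 3 and all numerical values are hypothetical, and the function names are not from any software.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def lcga_probabilities(tau, intercept, slope, time_scores):
    """P(u_t = 1 | class) under a logit-linear trend with a time-invariant threshold.

    Equivalent to 1 / (1 + exp(tau - logit_t)) with logit_t = intercept + slope * a_t.
    """
    return [logistic(intercept + slope * a - tau) for a in time_scores]


def lca_probabilities(thresholds):
    """P(u_t = 1 | class) with one free threshold per time point: 1 / (1 + exp(tau_t))."""
    return [logistic(-t) for t in thresholds]


if __name__ == "__main__":
    times = [0, 1, 2, 3]
    # LCGA: two growth parameters and one threshold describe all four probabilities.
    print(lcga_probabilities(tau=0.5, intercept=-1.0, slope=0.4, time_scores=times))
    # LCA: four free thresholds, so any pattern over time can be produced.
    print(lca_probabilities([1.5, -0.2, 0.9, 0.1]))
```

Any LCGA trend can be reproduced by LCA by setting each threshold to τ minus the corresponding logit, but not the other way around, which is one way to see why LCGA is the more parsimonious description when a trend structure holds.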

19.5.2. Categorical Outcomes: Comparing LCGA and GMM on Delinquency Data

Nagin and Land (1993), Nagin (1999), Roeder et al. (1999), and Jones et al. (2001) used PROC TRAJ LCGA to study the development of delinquency over ages 10 to 32 in a sample of 411 boys in a working-class section of London (Farrington & West, 1990). These “Cambridge data” were studied from the substantive perspective of the Moffitt (1993) theory of adolescent-limited versus life course–persistent antisocial behavior. This theory suggests two major trajectory classes. Using different ways to aggregate and model the outcomes, Nagin and Land found four classes, Nagin three classes, Roeder et al. four classes, and Jones et al. three classes. The Nagin (1999) approach of considering 2-year intervals and excluding the 8 boys who died during the study will be used here, resulting in 11 time points and n = 403. The frequency distributions are shown in Figure 19.8. Only ages 11 to 21 will be used here. Given that few individuals have more than two convictions in the 2-year interval, data will be coded as 0, 1, and 2 for zero, one, or two or more convictions in the last 2 years; 65% have the value 0 at all 11 time points. A logistic ordered polytomous response model will be used, and three types of analyses will be illustrated: latent class growth analysis, conventional growth modeling, and growth mixture modeling. The analyses draw on Muthén, Kreuter, and Asparouhov (2003).

Figure 19.8 Frequency Distributions for Cambridge Data

[Four histogram panels: Ages 10 and 11, Ages 18 and 19, Ages 24 and 25, Ages 30 and 31. Horizontal axis: Number of Convictions (0 to 9). Vertical axis: Frequency (0 to 400).]

19.5.2.1. Latent Class Growth Analysis of the Cambridge Data

Latent class growth analysis was performed with two, three, and four classes applying a quadratic growth curve for all classes. The corresponding BIC values were 2,230.014, 2,215.251, and 2,227.976. This points to the three-class model as being the best. This model has a log-likelihood value of −1,071.632, 12 parameters, and an entropy of 0.821. The estimated class percentages are 3%, 21%, and 75%, arranging the curves from high to low. The LMR LRT also points to three classes in that the test of the two-class model against the three-class model has a p-value of .0030, suggesting rejection, whereas the three-class model tested against the four-class model has a p-value of .1554. The estimated three-class growth curves for the


Figure 19.10 Three-Class LCGA for Cambridge Data

[Line plot. Horizontal axis: Age. Vertical axis: Probability of Conviction. Legend: Class 1 (3%), Class 2 (50%), Class 3 (47%).]

low. The intercept variance is nonsignificant for the highest class but significant for the middle class.^15 An interesting finding is that this three-class GMM, which allows within-class variation, has 1 parameter less than the three-class LCGA but a better fit in terms of log-likelihood and BIC values. The zero class is smaller in the GMM than in the LCGA, 47% versus 75%. The fact that 64% of the individuals have observed values at zero throughout, whereas the GMM zero class has only 47% prevalence, is due to the fact that the individuals who are most likely to be in the low class according to the posterior probabilities have a sizable probability of being in the middle class. The estimated three-class growth curves for the probability of having at least one conviction are shown in Figure 19.10. These curves are clearly different from the LCGA curves in Figure 19.9, with Class 1 and Class 2 peaking at different ages for GMM but not for LCGA.

(^15) The Mplus input and output for these analyses are given in Example 8 at www.statmodel.com/mplus/examples/penn.html. This analysis was carried out by Mplus Version 3.

This may lead to different substantive interpretations in the context of Moffitt’s (1993) theory.

19.5.3. Categorical Outcomes: Discrete-Time Survival Analysis

Discrete-time survival analysis (DTSA) uses the categorical variables u to represent events modeled by a logistic hazard function (cf. Muthén & Masyn, in press). For an overview of conventional DTSA, see, for example, Singer and Willett (1993). Consider a set of binary 0/1 variables $u_j$, $j = 1, 2, \ldots, r$, where $u_{ij} = 1$ if individual i experiences the nonrepeatable event in time period j, and define $j_i$ as the last time period in which data were collected for individual i. The hazard is the probability of experiencing the event in time period j given that it was not experienced prior to j. The hazard is written as

$$h_{ij} = \frac{1}{1 + e^{-(-\tau_j + \kappa_j x_i)}},$$


where a proportional-odds assumption is obtained by dropping the j subscript for $\kappa_j$. Discrete-time survival analysis is fitted into the general mixture model above by noting that the likelihood is the same as for u related to c and x in a single-class model. The fact that individual i does not have observations on u after time period $j_i$ is handled as missing data. For example, with five time periods (r = 5), an individual who experiences the event in Period 4 has the data vector

$$u_i' = (0 \quad 0 \quad 0 \quad 1 \quad 999),$$

with 999 representing missing data. An individual who is censored in Period 5 has the data vector

$$u_i' = (0 \quad 0 \quad 0 \quad 0 \quad 0),$$

whereas an individual who is censored in Period 4 has the data vector

$$u_i' = (0 \quad 0 \quad 0 \quad 999 \quad 999).$$
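The data layout and the resulting likelihood contribution can be sketched as follows. This is a minimal illustration with hypothetical function names and parameter values. The vectors are built from $j_i$, the last period with data; reading the text's "censored in Period 4" as data being available through Period 3 is an interpretation based on the vector shown above, not something the chapter states explicitly.

```python
import math

MISSING = 999  # missing-data code used in the examples above


def event_indicators(r, last_period, event_occurred):
    """Build the vector u_i for r periods.

    last_period is j_i, the last period with data; if event_occurred is True,
    the event happens in that period, otherwise the individual is censored.
    """
    u = [0] * last_period + [MISSING] * (r - last_period)
    if event_occurred:
        u[last_period - 1] = 1
    return u


def hazard(tau_j, kappa_j, x):
    """Logistic hazard 1 / (1 + exp(-(-tau_j + kappa_j * x)))."""
    return 1.0 / (1.0 + math.exp(-(-tau_j + kappa_j * x)))


def log_likelihood(u, taus, kappas, x):
    """Sum of Bernoulli log-probabilities over the non-missing periods only."""
    total = 0.0
    for u_j, tau_j, kappa_j in zip(u, taus, kappas):
        if u_j == MISSING:
            continue  # periods after j_i contribute nothing
        h = hazard(tau_j, kappa_j, x)
        total += math.log(h) if u_j == 1 else math.log(1.0 - h)
    return total


if __name__ == "__main__":
    u = event_indicators(r=5, last_period=4, event_occurred=True)
    print(u)
    # Hypothetical thresholds and a common slope (proportional odds).
    print(log_likelihood(u, taus=[2.0] * 5, kappas=[0.5] * 5, x=1.0))
```

Skipping the missing periods is what makes the survival likelihood coincide with that of an ordinary model for binary u variables with missing data, as noted in the text.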

Muthén and Masyn (in press) also propose general discrete-time survival mixture analysis (DTSMA) models, in which different latent classes have different hazard and survival functions. For example, a growth mixture model for y can be combined with a survival model for u.

19.6. Combination of Categorical and Continuous Outcomes: Modeling With Zeros

In the previous section, it was seen that the u variables need not represent conventional categorical outcomes but can be used as indicators of events. In this section, this idea is taken further by using the u variables as indicators of zero values on a continuous and on a count outcome variable. Growth mixture modeling is useful for describing growth in outcomes that can be seen as continuous but nonnormally distributed. A type of nonnormality that cannot be well captured by mixtures of normal distributions arises in studies in which a significant number of individuals are at the lowest value of an outcome, for example, representing absence of a behavior. Applications include alcohol, drug, and tobacco use among adolescents. Censored-normal models are often used for outcomes of this kind, including classic Tobit regression analysis (Amemiya, 1985; Tobin, 1958) and LCGA in the PROC TRAJ program (Jones et al., 2001).
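As a minimal sketch of what using the u variables as indicators of zero values can look like in data terms, the following recodes one individual's repeated measures into a binary part and a continuous part, in the spirit of the two-part approach discussed in this section. The function name, the use of None for missing values, and the optional log transformation of the positive values are illustrative assumptions, not a description of any particular program.

```python
import math


def two_part_recode(values, log_transform=True):
    """Split repeated measures with zeros into (u, y).

    u[t] = 1 if the outcome at time t is nonzero, 0 if it is zero.
    y[t] = the (optionally log-transformed) positive value, or None when
           the outcome is zero, so that zeros are treated as missing in
           the continuous part.
    """
    u = [1 if v > 0 else 0 for v in values]
    y = [(math.log(v) if log_transform else v) if v > 0 else None
         for v in values]
    return u, y


if __name__ == "__main__":
    # Hypothetical alcohol-use amounts over four time points.
    u, y = two_part_recode([0.0, 2.0, 0.0, 1.0])
    print(u)
    print(y)
```

The binary part can then be given a logistic growth model and the continuous part a normal growth model, with zeros in the continuous part handled as missing data.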

A recent article by Olsen and Schafer (2001) gives an excellent overview of several related modeling efforts. Censored-normal models have been criticized (see, e.g., Duan, Manning, Morris, & Newhouse, 1983) because of the limitation of assuming that the same set of covariates influences both the decision to engage in the behavior and the amount observed. A two-part modeling approach proposed in Olsen and Schafer avoids this limitation.

To simplify the discussion, the lowest value will be taken to be zero. It is useful to distinguish between two kinds of zero outcomes. First, individuals may have zero values at a given time point because their behavioral activity is low and is zero during certain periods (“random zeros”). Second, individuals may not engage in the activity at all and therefore have zeros throughout all time points of the study (“structural zeros”). Olsen and Schafer (2001) proposed a two-part model for the case of random zeros, whereas Carlin, Wolfe, Brown, and Gelman (2001) considered the case of structural zeros. In both articles, a random-effects logistic regression was used to express the probabilities of nonzeros versus zeros.

Olsen and Schafer (2001) studied alcohol use in Grades 7 through 11. To capture the changing zero status across time, they expressed the logistic regressions for each time point as a random-effects growth model. The term two-part model refers to having both a logistic model part to model the probability of nonzero versus zero outcomes (Part 1) and a continuous-normal or lognormal model part for the values of the nonzero outcomes (Part 2). In Olsen and Schafer, the two parts have correlated random effects. The two parts are also allowed to have different covariates, avoiding the limitation of censored-normal modeling.

Carlin et al. (2001) studied cigarette smoking among adolescents.
A two-class model was used with a “zero class” (structural zeros) representing individuals not susceptible to regular smoking (also referred to as “immunes”). As pointed out in Carlin et al., an individual with zeros throughout the study does not necessarily belong to the zero class but may show zeros by chance. In their analysis, the estimated proportion of immunes was 69%, whereas the empirical proportion with all zeros was 77%. Because of this, an ad hoc analysis based on deleting individuals with all zeros may lead to distorted results.

Inspired by Olsen and Schafer (2001) and Carlin et al. (2001), Muthén (2001b) proposed a generalization of growth mixture modeling to handle both random and structural zeros in a two-part model. Multiple latent classes are used to represent the growth in the probability of nonzero values in Part 1 as well as