The Use of PROC MIXED for Analyzing Cohort-Sequential Designs

C. Nathan Marti, The University of Texas, Austin, TX

Introduction

Cohort-sequential designs offer an efficient method for measuring change across time. Longitudinal designs are useful for tracking change in the same individual; however, they require substantial amounts of time to complete. Cross-sectional designs allow researchers to compare several age groups at the same time and thus measure change across time, but because this is a between-subjects comparison, it is not possible to measure individual change. Both longitudinal and cross-sectional designs suffer from potential cohort effects: longitudinal designs are limited to a single cohort that may have unique characteristics relevant to the period in which it is being studied; cross-sectional designs potentially confound age differences with cohort differences, where cohort differences may be the result of a unique environmental factor that only affects a particular age group. Cohort-sequential designs measure different age groups across multiple time points. By doing so, change across time can be analyzed for a much greater time period than is required to complete a study. For instance, the example used in the present paper follows children who are nine through thirteen years of age at wave one of a study for three years, thus measuring change across a seven-year period (ages nine through fifteen) in a span of three years.

A single dataset (Hetherington, Clingempeel, Anderson, Deal, Lindner, & Stanley-Hagan, 1992) is used to illustrate the use of PROC MIXED for cohort-sequential designs. The study examined children's involvement in mutual activities with their mothers between 1984 and 1986. Children ranged from nine to thirteen years of age at the onset of the study and were measured at three yearly intervals.

While cohort-sequential designs have several advantages, they present difficulties for ordinary least squares (OLS) analyses. Longitudinal designs can be analyzed as repeated measurements with time points treated as within-subjects factors, and cross-sectional designs can be analyzed as a factorial analysis of variance in which groups at different ages are treated as between-subjects factors. However, neither of these approaches is appropriate for the analysis of cohort-sequential designs, as there are potentially both within- and between-subjects factors, but no subject has complete data at all levels. For instance, in the present example, responses are measured at seven different ages, but each individual is only measured on three occasions and therefore has missing data for the other four time points. In fact, some comparisons confound between- and within-subjects comparisons. Take, for instance, a comparison between nine- and twelve-year-olds in a study with three yearly waves. In this comparison, the subjects who began the study as nine-year-olds would be compared with twelve-year-olds from other cohorts, but the participants who began the study as ten-year-olds would be compared both with themselves at twelve years of age and with children from other cohorts at twelve years of age. Thus, the line between within- and between-subjects comparisons is blurred in cohort-sequential designs.

Historically, researchers have analyzed cohort-sequential designs longitudinally (Anderson, 1995). This is likely because cases with missing data are dropped in OLS analyses: if every case were assumed to have a value at every age, all cases in a cohort-sequential design would have missing data, and all would be dropped. To analyze cohort-sequential designs longitudinally, researchers essentially collapse all age groups within each wave of the study. In the example data used here, this would be a comparison between three waves in which the average age is eleven years at wave one, twelve years at wave two, and thirteen years at wave three. Considering the range of ages at each of these time points, it is apparent that a large amount of information is lost in this comparison. For example, at wave one, ages range from nine to thirteen years.

Given the disadvantages of OLS approaches, it is prudent to consider what a better form of analysis might be. PROC MIXED's ability to handle missing data makes it an ideal procedure for analyzing planned patterns of missing data, as will be described in this paper.

Preparing Data for Analysis Using PROC MIXED

Using PROC MIXED to analyze cohort-sequential designs is largely a matter of creating an appropriately structured dataset. Before using PROC MIXED, there are some general issues to consider as well as issues specific to analyzing cohort-sequential designs. A common general consideration is that datasets are often organized in a multivariate format, with a single row for each case and a column for each data point. PROC MIXED requires that data be organized in a univariate format, with a single row for each measurement occasion. In addition to the general dataset requirements of PROC MIXED, cohort-sequential designs require that there is a variable for every possible age in the study.

END;
DROP age eaf1 eaf2 eaf3;
RUN;

In the syntax shown above, seven variables are created: age09, age10, age11, age12, age13, age14, and age15. The values of these variables are assigned according to the cohort to which a participant belongs. For example, a participant who was nine years of age at the beginning of the study has a value of 9 for the variable age, and therefore the first IF-THEN statement is used to calculate the dependent variables. Such a participant would be assigned the value of the dependent measure on the first wave for the age09 variable, the value of the second wave for the age10 variable, and the value of the third wave for the age11 variable. Values for age12, age13, age14, and age15 are missing, as a participant who was nine years of age at the beginning of the study would be eleven at the end of the study and therefore would not have data points for ages twelve through fifteen. The dataset with the new variables calculated is shown below.

[Dataset listing: one row per participant, with famid and the variables age09 through age15; the values are not recoverable from the source.]
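The cohort-to-age assignment described above can be sketched as follows. This is a minimal illustration rather than the paper's own syntax: the dataset names work.wide_by_wave and work.wide_by_age are hypothetical, the three input rows are made-up values, and eaf1 through eaf3 are assumed to hold the wave one through wave three scores.

```sas
/* Illustrative input only: made-up values, one row per child.         */
/* famid = family id, age = age at wave one, eaf1-eaf3 = wave scores.  */
/* The dataset names used here are hypothetical.                       */
DATA work.wide_by_wave;
  INPUT famid age eaf1 eaf2 eaf3;
  DATALINES;
1  9 52 54 59
2 11 60 58 55
3 13 57 50 48
;
RUN;

DATA work.wide_by_age;
  SET work.wide_by_wave;
  /* One IF-THEN block per cohort: the score at wave w is stored in    */
  /* the variable for age (onset age + w - 1). Variables that are not  */
  /* assigned for a given child remain missing.                        */
  IF age = 9 THEN DO;
    age09 = eaf1; age10 = eaf2; age11 = eaf3;
  END;
  ELSE IF age = 10 THEN DO;
    age10 = eaf1; age11 = eaf2; age12 = eaf3;
  END;
  ELSE IF age = 11 THEN DO;
    age11 = eaf1; age12 = eaf2; age13 = eaf3;
  END;
  ELSE IF age = 12 THEN DO;
    age12 = eaf1; age13 = eaf2; age14 = eaf3;
  END;
  ELSE IF age = 13 THEN DO;
    age13 = eaf1; age14 = eaf2; age15 = eaf3;
  END;
  DROP age eaf1 eaf2 eaf3;
RUN;

PROC PRINT DATA=work.wide_by_age;
RUN;
```

Each child ends up with exactly three non-missing values among the seven age variables, which is the planned pattern of missing data that the later steps rely on.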

As can be seen in the dataset above, each participant has three responses, the first of which corresponds to their age at the first wave of the study. For example, compare the first case with the previous dataset. This case was nine years of age when the study began and therefore has values for the age09, age10, and age11 variables. Following the creation of the variables representing all possible ages in the study, the next step in preparing the data is to transpose the data from a multivariate to a univariate format. To do this, we use PROC TRANSPOSE in the present example. The syntax below illustrates the use of PROC TRANSPOSE to convert the data into a univariate dataset.

PROC TRANSPOSE DATA=mixed.two OUT=mixed.two NAME=age PREFIX=score_;
BY famid;
VAR age09-age15;
RUN;

In the PROC TRANSPOSE above, the variables age09 through age15 are transposed so that they are in a single column. The NAME argument creates a new variable named age that stores the name of the variable being transposed, which in this case is one of the variables age09 through age15. The PREFIX argument provides a prefix for the names of the transposed variables. Thus, the dataset that is created contains the identification variable, famid; the new variable age, which contains the names of the transposed variables as values; and score_1, the variable containing the values of the scores. The dataset used in the present example is shown below:

In the present example, the small F value indicates that there is not a main effect of age. Although there is not a main effect of age, examination of the data may indicate that particular ages differ from the others. In the present example, ages nine through twelve all have similar scores, whereas the older ages show a decline in the dependent variable. To analyze specific comparisons between ages, you might consider using the CONTRAST statement to construct custom hypothesis tests. The following example illustrates two uses of the CONTRAST statement: the first compares twelve-year-olds with fourteen-year-olds, and the second compares nine- through twelve-year-olds with fourteen-year-olds.

PROC MIXED DATA=mixed.two;
CLASS famid age;
MODEL score_1 = age;
REPEATED age / SUBJECT=famid TYPE=un;
CONTRAST '12 versus 14' age 0 0 0 1 0 -1 0;
CONTRAST '9-12 versus 14' age 1 1 1 1 0 -4 0;
RUN;

The custom contrasts can be examined in the Contrasts table in the output.

Label             Num DF    Den DF    F Value    Pr > F
12 versus 14      1         158       2.44       0.
9-12 versus 14    1         158       4.07       0.

This table lists each contrast separately. The first contrast, 12 versus 14, compares the mean values of twelve- and fourteen-year-olds and shows that there is not a significant difference between the two age groups. The second contrast, 9-12 versus 14, compares fourteen-year-olds with nine- through twelve-year-olds and indicates that the scores of fourteen-year-olds differ significantly from those of the four ages with which they are being compared.
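It may help to write the two contrasts as hypotheses about the age means. The notation below is introduced here for illustration: \mu_a denotes the mean score at age a, and the seven coefficients in each CONTRAST statement are assumed to apply to the levels age09 through age15 in that order (the default sort order of the CLASS variable). The coefficient vectors are 0 0 0 1 0 -1 0 for the first contrast and 1 1 1 1 0 -4 0 for the second.

```latex
\begin{align*}
\text{12 versus 14:}\quad & H_0:\ \mu_{12} - \mu_{14} = 0 \\
\text{9-12 versus 14:}\quad & H_0:\ \mu_{9} + \mu_{10} + \mu_{11} + \mu_{12} - 4\mu_{14} = 0
  \;\Longleftrightarrow\; \tfrac{1}{4}\left(\mu_{9} + \mu_{10} + \mu_{11} + \mu_{12}\right) = \mu_{14}
\end{align*}
```

The weight of -4 on age fourteen makes the coefficients sum to zero, so the second test compares the average of the four younger ages with the fourteen-year-old mean.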

Individual Growth Curve Models in a Cohort-Sequential Design

Individual growth curve models allow researchers to explicitly model individual growth and present many advantages over repeated measures analyses (Bryk & Raudenbush, 1992). To do so, a multilevel model is constructed in which time points are level-1 units and individuals are level-2 units. Thus, time points are nested within individuals. By constructing such a model, you can first examine whether it is appropriate to use a single regression model for all subjects in your dataset. If you have a significant effect for the level-2 error terms, it is not appropriate to model your data with a single regression equation, as a single intercept and slope are not sufficient for individuals who vary on these parameters. One advantage of using a multilevel model is that it includes error terms from both levels, and therefore the effects for variables are not confounded with the variance due to individual variation.

The present example also illustrates the use of a continuous predictor variable, which provides output resembling a regression analysis. A critical difference between the previous example and regression analyses is that when age is treated as a categorical variable, PROC MIXED makes contrast comparisons between ages or between a group of ages and other ages, as seen in the contrast example above. In contrast, when the predictor is continuous, PROC MIXED measures the change in the dependent variable that can be attributed to each unit of the independent variable. In the present example, the change would be the amount of increase or decrease in the scores measuring children's involvement that can be accounted for by their age. Prior to using PROC MIXED to perform a regression-style analysis, the independent variable in the example dataset needs to be converted to a continuous variable.
This is done using the following DATA step, in which a new numeric variable, age_2, is created using the SELECT statement.

DATA mixed.three;
SET mixed.two;
SELECT (age);
WHEN ('age09') age_2 = 9;
WHEN ('age10') age_2 = 10;
WHEN ('age11') age_2 = 11;
WHEN ('age12') age_2 = 12;
WHEN ('age13') age_2 = 13;
WHEN ('age14') age_2 = 14;
WHEN ('age15') age_2 = 15;
END;
RUN;
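A growth model of the kind described in this section is commonly specified with a RANDOM statement. The following is a minimal sketch, not the paper's own syntax. It assumes random intercepts and random age slopes with an unstructured covariance matrix, and it runs on simulated data (the hypothetical dataset work.growth_demo, generated with made-up parameter values) so that it is self-contained. The variable names famid, age_2, and score_1 follow the text.

```sas
/* Simulated data for illustration only: 60 children, five cohorts     */
/* (onset ages 9-13), three yearly waves each. All parameter values    */
/* below are made up and do not come from the paper.                   */
DATA work.growth_demo;
  CALL STREAMINIT(2023);
  DO famid = 1 TO 60;
    start = 9 + MOD(famid, 5);        /* age at wave one: 9 to 13      */
    u0 = RAND('NORMAL', 0, 5);        /* child-specific intercept part */
    u1 = RAND('NORMAL', 0, 1);        /* child-specific slope part     */
    DO age_2 = start TO start + 2;
      score_1 = 60 + u0 + (-1 + u1) * (age_2 - 9) + RAND('NORMAL', 0, 3);
      OUTPUT;
    END;
  END;
  KEEP famid age_2 score_1;
RUN;

/* Linear growth model with age as a continuous predictor.             */
/* COVTEST requests tests of the covariance parameters (level-2        */
/* variation); SOLUTION prints the fixed-effects estimates.            */
PROC MIXED DATA=work.growth_demo COVTEST;
  CLASS famid;
  MODEL score_1 = age_2 / SOLUTION;
  RANDOM INTERCEPT age_2 / SUBJECT=famid TYPE=UN;
RUN;
```

With real data, the simulation step would be dropped and DATA= would point to the dataset containing age_2. Whether an unstructured covariance matrix is estimable depends on the data, so a simpler structure may be needed if the model fails to converge.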

Estimate    Standard Error    Z Value    Pr Z
-48.5677    47.1352           -1.03      0.
4.3959      3.9571            1.11       0.
125.37      10.9282           11.47      <.

The Solution for Fixed Effects table contains information about the effect of age on individuals' scores. The coefficients can be interpreted as a standard regression equation, as there are no level-2 covariates. Examining the table, it can be seen that there was a significant effect for age in this model, as the p value associated with the t value is smaller than .05.
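In multilevel notation, a linear growth model with no level-2 covariates can be written as follows. The symbols are assumptions introduced here, loosely following the conventions of Bryk and Raudenbush (1992): Y_{ti} is the score of individual i at occasion t, \mathrm{age}_{ti} is the continuous age predictor, e_{ti} is the level-1 error, and r_{0i} and r_{1i} are the level-2 error terms for the intercept and slope.

```latex
\begin{align*}
\text{Level 1:}\quad & Y_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{age}_{ti} + e_{ti} \\
\text{Level 2:}\quad & \pi_{0i} = \beta_{00} + r_{0i}, \qquad \pi_{1i} = \beta_{10} + r_{1i} \\
\text{Combined:}\quad & Y_{ti} = \beta_{00} + \beta_{10}\,\mathrm{age}_{ti} + r_{0i} + r_{1i}\,\mathrm{age}_{ti} + e_{ti}
\end{align*}
```

Under this reading, the fixed effects \beta_{00} and \beta_{10} are the intercept and age slope of the average regression line, while the variances of r_{0i} and r_{1i} describe how much individuals vary around that line.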

Conclusions

The ability of PROC MIXED to handle missing data makes it an ideal procedure for analyzing cohort-sequential designs, which present analytic difficulties for OLS methods by forcing analysts into using either between- or within-subjects comparisons. By constructing a planned pattern of missing data and treating responses as repeated measurements, PROC MIXED can be used to analyze cohort-sequential designs. While the present discussion has focused on cohort-sequential designs, it also has applications to other designs that employ repeated measurement with missing data. Most notably, it can easily be applied to longitudinal designs in which data are missing because participants failed to take part in all waves of a study or dropped out. Deleting these cases from analyses could bias results, as participants who miss some measurement occasions are likely to differ from those who complete all phases of the study. Thus, employing the approach used in the present paper could improve analyses of longitudinal data in addition to the improvements already discussed.

References

Anderson, E. R. (1995). Accelerating and maximizing information from short-term longitudinal research. In J. M. Gottman (Ed.), The Analysis of Change (pp. 139-163). Mahwah, NJ: Lawrence Erlbaum Associates.

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical Linear Models. Newbury Park, CA: Sage.

Hetherington, E. M., Clingempeel, W. G., Anderson, E. R., Deal, J. E., Lindner, M. S., & Stanley-Hagan, M. (1992). Coping with marital transitions: A family systems perspective. Monographs of the Society for Research in Child Development, 57 (2-3, Serial No. 227).

Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger, R. D. (1996). SAS system for mixed models. Cary, NC: SAS Institute, Inc.

Singer, J. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 24, 323-355.