Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Material Type: Notes; Professor: Jennings; Class: Applied Regression Analysis; Subject: STAT-Statistics; University: Purdue University - Main Campus; Term: Unknown 1989;
Typology: Study notes
1 / 16
This topic will cover
Read and check the data
data interview; infile ’h:\System\Desktop\CH24TA01.DAT’; input rating officer; proc print data=interview;
Obs rating officer 1 76 1 2 65 1 3 85 1 4 74 1 5 59 2 6 75 2 7 81 2 8 67 2 9 49 3 10 63 3 11 61 3 12 46 3 13 74 4 14 71 4 15 85 4 16 89 4 17 66 5 18 84 5
19 80 5 20 79 5
title1 ’Plot of the data’; symbol1 v=circle i=none c=black; proc gplot data=interview; plot rating*officer; run;
Find and plot the means
proc means data=interview; output out=a2 mean=avrate; var rating; by officer; title1 ’Plot of the means’; symbol1 v=circle i=join c=black; proc gplot data=a2; plot avrate*officer; run;
This model is also called
Yi,j = μi + i,j
Now the μi are random variables with a common mean. The question of “are they all the same” can now be addressed by considering whether the variance of their distribution, σ^2 A, is zero. Of course, the estimated means will likely be different from each other; the question is whether the difference can be explained by error (σ^2 ) alone.
The text uses the symbol σ^2 μ instead of σ^2 A; they are the same thing. I prefer the latter nota- tion because it generalizes more easily to more than one factor, and also to the factor effects model.
Two Sources of Variation
Observations with the same i (e.g. the same interviewer) are dependent, and their covariance is σ A^2. The components of variance are σ^2 A and σ^2. We want to get an idea of the relative magnitudes of these variance components.
Same basic idea as before... μi = μ + αi. The model is Yi,j = μ + αi + i,j.
αi ∼ N (0, σ^2 A) i,j ∼ N (0, σ^2 ) Yi,j ∼ N (μ, σ A^2 + σ^2 )
The book uses σ^2 α instead of σ^2 A here. Despite the different notations, σ^2 α and σ μ^2 are really the same thing, because μi and αi differ only by an additive constant (μ), so they have the same variance. That is why in these notes I’m using the same symbol σ^2 A to refer to both. (With two factors we will have to distinguish between these.)
Parameters
There are two important parameters in these models: σ A^2 and σ^2 (also μ in the F.E.M.). The cell means μi,j are random variables, not parameters.
We are sometimes interested in estimating σ
(^2) A σ^2 A+σ^2 =^
σ^2 A σ^2 Y^. In some applications it is called the intraclass correlation coefficient. It is the correlation between two observations with the same i.
EMS and parameter estimates
E(M SE) = σ^2 as usual. We use M SE to estimate σ^2. E(M SA) = σ^2 + nσ A^2. Note that this is different from before. From this you can see that we should
use (M SA− n M SE)to estimate σ A^2.
Hypotheses
H 0 : σ A^2 = 0 H 1 : σ A^2 6 = 0
The test statistic is F = M SA/M SE with r − 1 and r(n − 1) degrees of freedom (since this ratio is 1 when the null hypothesis is true); reject when F is large, and report the p-value. Note that in the one factor analysis, the test is the same it was before. This WILL NOT be the case as we add more factors.
run proc glm with a random statement
proc glm data=interview; class officer; model rating=officer; random officer;
Sum of Source DF Squares Mean Square F Value Pr > F Model 4 1579.700000 394.925000 5.39 0. Error 15 1099.250000 73. Corrected Total 19 2678.
Random statement output
Source Type III Expected Mean Square officer Var(Error) + 4 Var(officer)
This is SAS’s way of saying E(M SA) = σ^2 + 4σ A^2 (note n = 4 replicates).
proc varcomp
This procedure gets the “variance components”.
proc varcomp data=interview; class officer; model rating=officer;
MIVQUE(0) Estimates Variance Component rating Var(officer) 80. Var(Error) 73.
(Other methods are available for estimation; mivque is the default.) SAS is now saying
Var(Error) = σˆ^2 = 73.28333 (notice this is just M SE)
Var(of f icer) = σˆ μ^2 = 80.41042 =
n
As an alternative to using proc glm with a random statement, and proc varcomp, you could instead use proc mixed, which has some options specifically for mixed models.
proc mixed
proc mixed data=interview cl; class officer; model rating=; random officer/vcorr;
Proc mixed gives a huge amount of output. Here are some pieces of it.
Covariance Parameter Estimates Cov Parm Estimate Alpha Lower Upper officer 80.4104 0.05 24.4572 1498. Residual 73.2833 0.05 39.9896 175.
The estimated intraclass correlation coefficient is σˆ
(^2) A σ ˆ A^2 +ˆσ^2 =^
σˆ A^2 σ ˆ^2 Y^ =^
Output from vcorr option
This gives the intraclass correlation coefficient.
Row Col1 Col2 Col3 Col 1 1.0000 0.5232 0.5232 0. 2 0.5232 1.0000 0.5232 0. 3 0.5232 0.5232 1.0000 0. 4 0.5232 0.5232 0.5232 1.
(^2) A+σ (^2) ) rn which may be estimated by^ s
rn. See page 1038 for derivation if you like. To get a CI we use a t critical value with r − 1 degrees of freedom.
Applications
Read and check the data
data efficiency; infile ’h:\System\Desktop\CH24PR15.DAT’; input mpg driver car; proc print data=efficiency;
Obs mpg driver car 1 25.3 1 1 2 25.2 1 1 3 28.9 1 2 4 30.0 1 2 5 24.8 1 3 6 25.1 1 3
7 28.4 1 4 8 27.9 1 4 9 27.1 1 5 10 26.6 1 5 ...
Prepare the data for a plot, and plot the data
data efficiency; set efficiency; dc = driver10 + car; title1 ’Plot of the data’; symbol1 v=circle i=none c=black; proc gplot data=efficiency; plot mpgdc;
Find and plot the means
proc means data=efficiency; output out=effout mean=avmpg; var mpg; by driver car; title1 ’Plot of the means’; symbol1 v=’A’ i=join c=black; symbol2 v=’B’ i=join c=black; symbol3 v=’C’ i=join c=black; symbol4 v=’D’ i=join c=black; symbol5 v=’E’ i=join c=black; proc gplot data=effout; plot avmpg*driver=car;
Random cell means model
Yi,j,k = μi,j + i,j,k
Dependence among the Yi,j,k can be most easily described by specifying the covariance matrix of the vector (Yi,j,k)
Random factor effects model
Yi,j,k = μ + αi + βj + (αβ)i,j + i,j,k, where
αi ∼ N (0, σ^2 A) βj ∼ N (0, σ^2 B ) (αβ)i,j ∼ N (0, σ^2 AB ) σ Y^2 = σ^2 A + σ^2 B + σ^2 AB + σ^2
Now the component σ^2 μ from the cell means model can be divided up into three components
Parameters
ANOVA Table
EM S and parameter estimates
E(M SA) = σ^2 + bnσ^2 A + nσ^2 AB E(M SB) = σ^2 + anσ^2 B + nσ AB^2 E(M SAB) = σ^2 + nσ^2 AB E(M SE) = σ^2
Estimates of the variance components can be obtained from these equations or other meth- ods.
Note the patterns in the EMS: (these hold for balanced data). They all contain σ^2. For M SA, it also contains all the σ^2 ’s that have an A in the subscript (σ A^2 and σ AB^2 ); similarly for the other M S terms. The coefficient of each term (except the first) is the product of n and all letters not repre- sented in the subscript. It is also the total number of observations at each fixed level of the level corresponding to the subscript (e.g. there are nb observations for each level of A)
Hypotheses
H 0 A : σ^2 A = 0; H 1 A : σ^2 A 6 = 0 H 0 B : σ^2 B = 0; H 1 B : σ^2 B 6 = 0 H 0 AB : σ AB^2 = 0; H 1 AB : σ^2 AB 6 = 0
Hypothesis H 0 A
Hypothesis H 0 B
Hypothesis H 0 AB
Run proc glm
proc glm data=efficiency; class driver car; model mpg=driver car drivercar; random driver car drivercar/test;
Model and error output
Sum of Source DF Squares Mean Square F Value Pr > F Model 19 377.4447500 19.8655132 113.03 <. Error 20 3.5150000 0. Corrected Total 39 380.
Factor effects output
Source DF Type I SS Mean Square F Value Pr > F driver 3 280.2847500 93. car 4 94.7135000 23. driver*car 12 2.4465000 0.2038750 1.16 0.
Only the interaction test is valid here: the test for interaction is M SAB/M SE, but the tests for main effects should be M SA/M SAB and M SB/M SAB which are done with the test statement, not /M SE as is done here. (However, if you do this the main effects are significant as shown below.)
Lesson: just because SAS spits out a p-value, doesn’t mean it is for a meaningful test!
Random statement output
Source Type III Expected Mean Square driver Var(Error) + 2 Var(drivercar) + 10 Var(driver) car Var(Error) + 2 Var(drivercar) + 8 Var(car) drivercar Var(Error) + 2 Var(drivercar)
Random/test output
The GLM Procedure Tests of Hypotheses for Random Model Analysis of Variance
Dependent Variable: mpg Source DF Type III SS Mean Square F Value Pr > F driver 3 280.284750 93.428250 458.26 <. car 4 94.713500 23.678375 116.14 <. Error 12 2.446500 0. Error: MS(driver*car)
This last line says the denominator of the F -tests is the M SAB.
Source DF Type III SS Mean Square F Value Pr > F driver*car 12 2.446500 0.203875 1.16 0. Error: MS(Error) 20 3.515000 0.
For the interaction term, this is the same test as was done above.
proc varcomp
proc varcomp data=efficiency; class driver car; model mpg=driver car drivercar; MIVQUE(0) Estimates Variance Component mpg Var(driver) 9. Var(car) 2. Var(drivercar) 0. Var(Error) 0.
Two way mixed model has
Tests
See Table 25.5 (page 1052) and below for the EM S that justify these statements. Also see Table 25.6 for the tests (page 1053).
Y , the response variable A, the fixed effect (a levels) B, the random effect (b levels) We’ll stick to balanced designs (ni,j = n)
Factor effects parameterization
Yi,j,k = μ + αi + βj + αβi,j + i,j,k
Where
i αi^ = 0,
Randomness is “catching” so the interaction between a fixed and a random effect is considered random and has a distribution. However, the interactions are also subject to constraints kind of like fixed effects. αβi,j ∼ N
0 , a− a 1 σ AB^2
subject to the constraint
i(αβ)i,j^ = 0 for each^ j.^ Because of the constraints, αβi,j having the same j (but different i) are negatively correlated, with
covariance Cov(αβi,j , αβi′,j ) = − σ AB^2 a.
Expected Mean Squares
E(M SA) = σ^2 +
nb a − 1
i
α^2 i + nσ αβ^2
E(M SB) = σ^2 + naσ β^2 E(M SAB) = σ^2 + nσ αβ^2 E(M SE) = σ^2
SAS (proc glm) writes these out for you but it uses the notation Q(A) to denote the fixed quantity (^) anb− 1
i α
2 i. It uses the names Var(Error) =^ σ
(^2) , Var(B) = σ 2 B , and Var(A^ ×^ B) = σ^2 AB. (It doesn’t actually use the names A and B; it uses the variable names.)
Looking at these EM S, we can see that different denominators will be needed to test for the various effects.
H 0 A : all αi = 0 is tested by F = (^) M SABM SA H 0 B : σ^2 B = 0 is tested by F = M SBM SE H 0 AB : σ^2 AB = 0 is tested by F = M SABM SE.
So, though it seems counterintuitive at first, the fixed effect is tested by the interaction, and the random effect is tested by the error.
(knnl1080a.sas) Y - service time for disk drives A - make of drive (fixed, with a = 3 levels) B - technician performing service (random, with b = 3 levels) The three technicians for whom we have data are selected at random from a large number of technicians who work at the company.
data service; infile ’h:\stat512\datasets\ch19pr16.dat’; input time tech make k; mt = make10+tech; proc print data=service; proc glm data=service; class make tech; model time = make tech maketech; random tech make*tech/test;
The GLM Procedure Dependent Variable: time Sum of Source DF Squares Mean Square F Value Pr > F Model 8 1268.177778 158.522222 3.05 0.
Error 36 1872.400000 52. Corrected Total 44 3140.
R-Square Coeff Var Root MSE time Mean 0.403804 12.91936 7.211873 55.
Source DF Type I SS Mean Square F Value Pr > F make 2 28.311111 14. tech 2 24.577778 12.288889 0.24 0. make*tech 4 1215.288889 303.822222 5.84 0.
We have M SA = 14.16, M SB = 12.29, M SAB = 303.82, and M SE = 52.01.
The GLM Procedure Source Type III Expected Mean Square make Var(Error) + 5 Var(maketech) + Q(make) tech Var(Error) + 5 Var(maketech) + 15 Var(tech) maketech Var(Error) + 5 Var(maketech)
To test the fixed effect make we must use the interaction: FA = M SA/M SAB = 14. 16 / 303 .82 = 0. 05... with 2,4 df (p = 0.955) To test the random effect tech and the interaction, we use error: FB = M SB/M SE = 12. 29 / 52 .01 = 0. 24... with 2, 36 df (p = 0.7908) FAB = M SAB/M SE = 303. 82 / 52 .01 = 5. 84... with 4, 36 df (p = 0.001)
The GLM Procedure Tests of Hypotheses for Mixed Model Analysis of Variance Dependent Variable: time Source DF Type III SS Mean Square F Value Pr > F make 2 28.311111 14.155556 0.05 0. tech 2 24.577778 12. Error: MS(make*tech) 4 1215.288889 303.
Source DF Type III SS Mean Square F Value Pr > F make*tech 4 1215.288889 303.822222 5.84 0. Error: MS(Error) 36 1872.400000 52.
We can have zero, one, two, or three random effects (etc) EM S indicate how to do tests In some cases the situation is complicated and we need approximations, e.g. when all are random, use M S(AB) + M S(AC) − M S(ABC) to test A.