



This document shows how to integrate categorical predictors into linear regression models by constructing dummy variables. It explains how to interpret the coefficients in a simple model with one covariate and one dummy variable, and works through an example using apple tree size and yield data with a categorical pruning-method variable. It also discusses the significance of the dummy variable as a regressor and the creation of plots that distinguish the pruning methods.
22s:152 Applied Linear Regression
Chapter 7: Dummy Variable Regression

So far, we've only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial variables (known as dummy variables or indicator variables). We'll illustrate here with a binary predictor (e.g., Male/Female).
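A dummy variable simply records, for each observation, whether it belongs to one chosen level (coded 1) or to the baseline level (coded 0). The notes build dummies in R; the short Python sketch below is only an illustration, and both the helper name `make_dummy` and the observations are hypothetical.

```python
# Hypothetical helper: code a two-level categorical variable as a 0/1 dummy.
def make_dummy(labels, level):
    """Return 1 where the label equals `level`, otherwise 0 (the baseline)."""
    return [1 if label == level else 0 for label in labels]


sex = ["Male", "Female", "Female", "Male"]  # hypothetical observations
female = make_dummy(sex, "Female")          # "Male" is the baseline here
print(female)  # [0, 1, 1, 0]
```

Which level is coded 1 is a choice; swapping it flips the sign of the dummy's coefficient but does not change the fitted values.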
[Figure: scatterplot of Bushels against Diameter]

There does appear to be a linear relationship between tree diameter and yield in bushels.

The simple linear regression:
```r
lm.out = lm(Bushels ~ Diameter)
summary(lm.out)
```

```
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.18886    0.75898  -2.884  0.00988 ** 
Diameter     0.62361    0.05185  12.028 4.86e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.133 on 18 degrees of freedom
Multiple R-Squared: 0.8894,  Adjusted R-squared: 0.
F-statistic: 144.7 on 1 and 18 DF,  p-value: 4.86e-
```
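For reference, the estimates in this output are the ordinary least-squares solutions for simple regression, $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. A minimal Python sketch of those formulas follows; the five data points are hypothetical and are not the apple tree data, and `simple_ols` is an illustrative helper.

```python
def simple_ols(x, y):
    """Closed-form least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Sxy and Sxx: centered cross-product and centered sum of squares
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical diameters and yields (NOT the data from the notes)
diameter = [6.0, 8.0, 10.0, 12.0, 14.0]
bushels = [1.5, 2.9, 4.1, 5.6, 6.9]
b0, b1 = simple_ols(diameter, bushels)
print(round(b0, 3), round(b1, 3))  # -2.55 0.675
```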
```r
abline(lm.out)  # add the fitted least-squares line to the scatterplot
```

[Figure: scatterplot of Bushels against Diameter with the fitted regression line]

Does pruning method also have a significant impact on yield? First, we'll create the dummy variable:
```r
# Start with a vector of zeros, one entry per tree
pruning.dummy = rep(0, nrow(botany.data))
pruning.dummy
```

```
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
```

```r
# Code Pyramid-pruned trees as 1; Flattop remains the baseline (0)
pruning.dummy[Pruning == "Pyramid"] = 1
pruning.dummy
```

```
 [1] 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
```

```r
data.frame(Pruning, pruning.dummy)
```

```
   Pruning pruning.dummy
1  Pyramid             1
2  Pyramid             1
3  Pyramid             1
4  Pyramid             1
5  Pyramid             1
6  Pyramid             1
7  Pyramid             1
8  Pyramid             1
9  Pyramid             1
10 Pyramid             1
11 Pyramid             1
12 Flattop             0
13 Flattop             0
14 Flattop             0
15 Flattop             0
16 Flattop             0
17 Flattop             0
18 Flattop             0
19 Flattop             0
20 Flattop             0
```

Fit a model with both Diameter and Pruning (as a dummy variable):

```r
lm.out.2 = lm(Bushels ~ Diameter + pruning.dummy)
summary(lm.out.2)
```

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -1.90616    0.75416  -2.528   0.0217 *  
Diameter       0.63352    0.05038  12.574 4.91e-10 ***
pruning.dummy -0.76259    0.49468  -1.542   0.
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.092 on 17 degrees of freedom
Multiple R-Squared: 0.9029,  Adjusted R-squared: 0.
F-statistic: 79.06 on 2 and 17 DF,  p-value: 2.458e-
```
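In this additive model the dummy only shifts the intercept, so the two pruning methods get parallel fitted lines with the common slope on Diameter. The Python sketch below (an illustration; the notes themselves use R) plugs in the estimates printed by summary(lm.out.2) above. The helper `predict` is hypothetical, and the diameter of 12 is an arbitrary value chosen for illustration.

```python
# Coefficients copied from summary(lm.out.2) (additive model, no interaction)
b0 = -1.90616   # intercept for Flattop trees (dummy = 0)
b1 = 0.63352    # common slope on Diameter
b2 = -0.76259   # intercept shift for Pyramid trees (dummy = 1)


def predict(diameter, pyramid):
    """Fitted bushels; `pyramid` is the 0/1 dummy (1 = Pyramid, 0 = Flattop)."""
    return b0 + b1 * diameter + b2 * pyramid


flattop_intercept = b0          # line for dummy = 0
pyramid_intercept = b0 + b2     # line for dummy = 1, about -2.66875

# At any diameter the vertical gap between the two lines equals b2
gap = predict(12, 1) - predict(12, 0)
print(round(gap, 5))  # -0.76259
```

Holding diameter fixed, the fitted yield for Pyramid trees is about 0.76 bushels lower than for Flattop trees; whether that difference is significant is judged from the pruning.dummy t statistic (-1.542 on 17 degrees of freedom).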
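Inclusion of Interaction

Returning to the earlier model with a male/female binary variable, a slightly more complicated model also includes the product of the covariate and the dummy:

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i + \varepsilon_i$$

where $x_i$ is the quantitative covariate and $z_i$ is the 0/1 dummy variable.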
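Substituting the two possible values of the dummy $z_i$ into this model shows what each coefficient means:

```latex
\begin{aligned}
z_i = 0:\quad & E(Y_i) = \beta_0 + \beta_1 x_i \\
z_i = 1:\quad & E(Y_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_i
\end{aligned}
```

So $\beta_2$ is the difference in intercepts and $\beta_3$ the difference in slopes between the two groups; setting $\beta_3 = 0$ recovers the parallel-lines model without interaction.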
Fitting this model to the apple tree data:

```r
# Diameter * pruning.dummy expands to
# Diameter + pruning.dummy + Diameter:pruning.dummy (main effects plus interaction)
lm.out.3 = lm(Bushels ~ Diameter * pruning.dummy)
summary(lm.out.3)
```

```
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -2.00130    0.98989  -2.022   0.
Diameter                0.64078    0.06988   9.170 9.05e-08 ***
pruning.dummy          -0.53930    1.52761  -0.353   0.
Diameter:pruning.dummy -0.01618    0.10434  -0.155   0.
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.124 on 16 degrees of freedom
Multiple R-Squared: 0.9031,  Adjusted R-squared: 0.
F-statistic: 49.69 on 3 and 16 DF,  p-value: 2.487e-
```
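```r
lm.out.3$coefficients
```

```
           (Intercept)               Diameter          pruning.dummy Diameter:pruning.dummy
           -2.00129614             0.64077682            -0.53929972                    -0.
```

These four entries are the estimates of $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$, respectively.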
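For the Flattop group (dummy = 0) the fitted line has intercept $\hat\beta_0$ and slope $\hat\beta_1$; for the Pyramid group (dummy = 1) it has intercept $\hat\beta_0 + \hat\beta_2$ and slope $\hat\beta_1 + \hat\beta_3$. Taking all four estimates from the interaction model lm.out.3:

```r
# Flattop (dummy = 0): intercept beta0, slope beta1
intercept.Flattop = lm.out.3$coefficients[1]
slope.Flattop = lm.out.3$coefficients[2]

# Pyramid (dummy = 1): intercept beta0 + beta2, slope beta1 + beta3
intercept.Pyramid = lm.out.3$coefficients[1] + lm.out.3$coefficients[3]
slope.Pyramid = lm.out.3$coefficients[2] + lm.out.3$coefficients[4]
```

From the estimates above, intercept.Flattop = -2.00130 and slope.Flattop = 0.64078, while intercept.Pyramid = -2.00130 + (-0.53930) = -2.54060 and slope.Pyramid = 0.64078 + (-0.01618) = 0.62460.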
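With a full interaction, the single model gives the same fitted lines as running a separate simple regression within each group. The Python sketch below checks this numerically on hypothetical data (not the apple tree data); `solve`, `ols`, and `group_fit` are illustrative helpers that solve the normal equations by Gaussian elimination. R's lm uses a QR decomposition instead, which is numerically more robust, but the estimates agree for well-conditioned data like this.

```python
def solve(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef


def ols(rows, y):
    """Least squares via the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: z = 1 for Pyramid, z = 0 for Flattop
x = [6, 8, 10, 12, 14, 7, 9, 11, 13, 15]
z = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [1.2, 2.6, 3.5, 5.1, 6.2, 2.3, 3.4, 4.9, 6.1, 7.5]

# Interaction model: columns 1, x, z, x*z
b0, b1, b2, b3 = ols([[1, xi, zi, xi * zi] for xi, zi in zip(x, z)], y)


def group_fit(group):
    """Separate simple regression using only the rows in one group."""
    idx = [i for i, zi in enumerate(z) if zi == group]
    return ols([[1, x[i]] for i in idx], [y[i] for i in idx])


flat_int, flat_slope = group_fit(0)   # should match b0, b1
pyr_int, pyr_slope = group_fit(1)     # should match b0 + b2, b1 + b3
```

Only the point estimates coincide: the interaction model pools the residual variance across both groups, so its standard errors differ from those of the separate fits. In R, a separate fit could be obtained with, e.g., lm(Bushels ~ Diameter, subset = Pruning == "Flattop").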
The separate fitted line for each group:
```r
plot(Diameter, Bushels, type = "n")  # set up the axes without plotting the points
# Rows 1-11 are Pyramid trees, rows 12-20 are Flattop trees
points(Diameter[1:11], Bushels[1:11], pch = 1, col = 1)
points(Diameter[12:20], Bushels[12:20], pch = 9, col = 4)
legend(8, 10, c("Pyramid", "Flattop"), col = c(1, 4), pch = c(1, 9))
# Overlay each group's fitted line in its matching color
abline(intercept.Flattop, slope.Flattop, col = 4)
abline(intercept.Pyramid, slope.Pyramid, col = 1)
```

[Figure: Bushels against Diameter, with Pyramid and Flattop trees in different symbols and colors, each group with its own fitted line]

You can't see much difference, but the fitted line for Flattop has a slightly steeper slope.