Homework 5 with 3 Practice Problems on Applied Linear Regression | 22S 152, Assignments of Statistics

Material Type: Assignment; Professor: DeCook; Class: 22S - Applied Linear Regression; Subject: Statistics and Actuarial Science; University: University of Iowa; Term: Unknown 1989;

Uploaded on 03/11/2009

22s: Homework 5

Assigned Wednesday, October 8. Due Wednesday, October 15, at class time.

Dummy variables, one-way ANOVA

Turn in your homework with handwritten or typed responses, and include any relevant plots you describe.

  1. The Angell data set in the car package contains information on the moral integration of U.S. cities. It includes three quantitative variables (moral, hetero, and mobility) and one categorical variable (region).

(a) Draw scatterplots of each pair of quantitative variables, and give the correlation matrix. Here, moral is the dependent variable in our model. Comment on the relationship between each quantitative predictor and the dependent variable (positive or negative, strong or weak), and on the relationship between the two quantitative predictors.
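One way to produce part (a) in R is sketched below, assuming the carData package (installed alongside car) is available to supply `Angell`. `quant_summary()` is a hypothetical helper name: it draws a scatterplot matrix with base R's `pairs()` and returns the correlation matrix rounded to three decimals. The final `if` block only runs when carData is installed.

```r
# Hypothetical helper for 1(a): scatterplot matrix plus correlation matrix
# of the quantitative variables (moral is the response).
quant_summary <- function(df, vars = c("moral", "hetero", "mobility")) {
  pairs(df[, vars], main = "Angell data: quantitative variables")
  round(cor(df[, vars]), 3)
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  print(quant_summary(Angell))
}
```

Read the moral row of the returned matrix for the predictor-response relationships, and the hetero/mobility entry for the relationship between the two predictors.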

(b) Construct dummy variables to represent the four regions East, Midwest, West, and South (use South as the baseline group). Regress moral on the three other predictors in the data set. Provide the summary() output from the model fit.
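A minimal sketch of part (b), building the three 0/1 dummies by hand so that South is the baseline (a South city has all three dummies equal to 0). It assumes `region` uses the carData labels `E`, `MW`, `S`, `W`; the helper `fit_region_model()` and the column names `D_E`, `D_MW`, `D_W` are hypothetical.

```r
# Hypothetical helper for 1(b): hand-built dummies with South as baseline.
# Assumes region is coded with the labels "E", "MW", "S", "W".
fit_region_model <- function(df) {
  df$D_E  <- as.numeric(df$region == "E")   # 1 for East, else 0
  df$D_MW <- as.numeric(df$region == "MW")  # 1 for Midwest, else 0
  df$D_W  <- as.numeric(df$region == "W")   # 1 for West, else 0
  lm(moral ~ hetero + mobility + D_E + D_MW + D_W, data = df)
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  print(summary(fit_region_model(Angell)))
}
```

Letting R build the dummies instead, `lm(moral ~ hetero + mobility + relevel(region, ref = "S"), data = Angell)` should give the same fit with different coefficient labels.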

(c) Perform a partial F-test of whether region is a significant predictor of moral after hetero and mobility have been accounted for. Report the relevant findings of the test.
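The partial F-test in part (c) compares the model without region to the model that adds it. A sketch, assuming the same `Angell` columns; `partial_f_region()` is a hypothetical name, and `anova()` applied to two nested `lm` fits performs the extra-sum-of-squares F-test. With four regions, region contributes 3 numerator degrees of freedom and the full model has 6 coefficients.

```r
# Hypothetical helper for 1(c): partial F-test for region given hetero and
# mobility. With four regions, F = [(RSS_reduced - RSS_full) / 3] /
# [RSS_full / (n - 6)].
partial_f_region <- function(df) {
  reduced <- lm(moral ~ hetero + mobility, data = df)
  full    <- lm(moral ~ hetero + mobility + region, data = df)
  anova(reduced, full)
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  print(partial_f_region(Angell))
}
```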

  2. Pulmonary function data were collected on both smokers and non-smokers. The volume of air expelled after one second of constant effort was recorded, along with each individual's age. A model was fitted to the data with Volume as the response and Age and smoking category as predictors.

A smoking dummy variable $D_i$ was created, with $D_i = 0$ for non-smokers and $D_i = 1$ for smokers.

Consider the common-slope model (this is an additive model):

$$Y_i = \beta_0 + \beta_{age} x_i + \beta_D D_i + \epsilon_i$$

The parameter estimates from the fitted model are $\hat\beta_0 = 0.3673$, $\hat\beta_{age} = 0.2306$, and $\hat\beta_D = -0.2090$.

(a) What is the fitted model for smokers? (Write it using the estimates above.)

(b) What is the fitted model for non-smokers?

(c) Based on the fitted values, do smokers or non-smokers have a higher expelled volume at every age? On what did you base your decision?

(d) The individuals in this study ranged in age from 6 to 22 years. Does the volume of expelled air go up or down with age in this age group? On what did you base your decision?
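The two fitted lines follow from setting $D_i = 0$ or $D_i = 1$: both groups share the slope $\hat\beta_{age}$, and the smokers' intercept is $\hat\beta_0 + \hat\beta_D$. The short R sketch below evaluates both lines over the observed age range using only the three estimates stated above; the variable names are illustrative.

```r
# Estimates from the fitted common-slope model.
b0 <- 0.3673; b_age <- 0.2306; b_D <- -0.2090
age <- 6:22                                # observed age range

fit_nonsmoker <- b0 + b_age * age          # D = 0
fit_smoker    <- (b0 + b_D) + b_age * age  # D = 1: intercept shifts by b_D

# diff is the same at every age (it equals b_D) because the slopes are equal.
print(data.frame(age, fit_nonsmoker, fit_smoker,
                 diff = fit_smoker - fit_nonsmoker))
```

Because the model is additive, the sign of $\hat\beta_D$ alone determines which group sits higher at every age, and the sign of $\hat\beta_{age}$ determines the direction of change with age.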

  3. Perform a one-way ANOVA of geographic mobility by region, using Angell's data on 43 U.S. cities (the same data set as in problem 1).

(a) Find the means of the four groups, and draw parallel boxplots of mobility by region. Perform Levene's test for constant variance:

> leveneTest(mobility ~ region, data = Angell)

(Older versions of car name this function levene.test(mobility, region).) Comment on the plot and on the test, and include the plot.

(b) As we have discussed, the effects model for ANOVA is over-parameterized, so we need to impose a constraint on the parameters before estimating them. The constraint chosen affects the interpretation of the parameters.
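A base-R sketch of part (a), assuming the carData `Angell` data. `mobility_by_region()` is a hypothetical helper. Rather than calling car, it computes Levene's test by hand as a one-way ANOVA on the absolute deviations of mobility from each region's median. This is the median-centred (Brown-Forsythe) form, which we believe is the default centring in car's `leveneTest()`, so its F statistic should agree with that function's output.

```r
# Hypothetical helper for 3(a): group means, parallel boxplots, and a
# median-centred Levene test computed by hand.
mobility_by_region <- function(df) {
  means <- tapply(df$mobility, df$region, mean)
  boxplot(mobility ~ region, data = df, xlab = "Region", ylab = "Mobility")
  # Absolute deviation of each city from its own region's median.
  absdev <- abs(df$mobility - ave(df$mobility, df$region, FUN = median))
  group <- df$region
  # Levene's statistic is the one-way ANOVA F on these absolute deviations.
  levene <- anova(lm(absdev ~ group))
  list(means = means, levene = levene)
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  print(mobility_by_region(Angell))
}
```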

I want you to confirm that the following three computational methods produce identical sums of squares. For each, provide the coding you used, the RegSS, and the RSS.

(i) Set $\alpha_4 = 0$. (Setting $\alpha_4 = 0$ is the same as using South as the baseline group, which you already did in problem 1.) Set up your dummy variables accordingly, and fit the one-way ANOVA model using the lm() function.

(ii) Set $\alpha_4 = -(\alpha_1 + \alpha_2 + \alpha_3)$. You'll use -1, 0, 1 coding for your dummy variables; pp. 145-146 show an example of this commonly used coding system, called deviation regressors. Set up your dummy variables accordingly, and fit the one-way ANOVA model using the lm() function.

(iii) Use the formulas in the first column of Table 8.1 on p. 147 to calculate the sums of squares directly.
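The three computations in part (b) can be checked side by side. In this sketch (all helper names are hypothetical) the factor's last level is the reference group, so the levels are reordered to E, MW, W, S to make South group 4. `make_dummies()` builds either 0/1 treatment dummies (method i) or -1/0/1 deviation regressors (method ii), and `ss_direct()` uses the standard one-way formulas $\text{RegSS} = \sum_j n_j(\bar Y_j - \bar Y)^2$ and $\text{RSS} = \sum_j \sum_i (Y_{ij} - \bar Y_j)^2$, which we assume match the first column of Table 8.1.

```r
# Hypothetical helper: k - 1 dummy columns for factor g, whose LAST level
# is the reference group.
make_dummies <- function(g, coding = c("treatment", "deviation")) {
  coding <- match.arg(coding)
  lev <- levels(g)
  k <- length(lev)
  # One 0/1 column for each of the first k - 1 groups.
  X <- sapply(lev[-k], function(l) as.numeric(g == l))
  # Deviation regressors: the reference group gets -1 in every column,
  # which imposes alpha_k = -(alpha_1 + ... + alpha_{k-1}).
  if (coding == "deviation") X[g == lev[k], ] <- -1
  X
}

# RegSS and RSS from a fitted lm object.
ss_from_fit <- function(fit, y) {
  c(RegSS = sum((fitted(fit) - mean(y))^2), RSS = sum(residuals(fit)^2))
}

# RegSS and RSS from the one-way ANOVA formulas.
ss_direct <- function(y, g) {
  n_j <- tapply(y, g, length)
  ybar_j <- tapply(y, g, mean)
  c(RegSS = sum(n_j * (ybar_j - mean(y))^2),
    RSS = sum((y - ave(y, g))^2))  # ave() gives each case its group mean
}

three_ways <- function(y, g) {
  fit_i  <- lm(y ~ make_dummies(g, "treatment"))  # (i)  alpha_4 = 0
  fit_ii <- lm(y ~ make_dummies(g, "deviation"))  # (ii) sum-to-zero
  rbind(i = ss_from_fit(fit_i, y), ii = ss_from_fit(fit_ii, y),
        iii = ss_direct(y, g))
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  region4 <- factor(Angell$region, levels = c("E", "MW", "W", "S"))
  print(three_ways(Angell$mobility, region4))
}
```

All three rows of the printed matrix should agree; only the coefficient estimates, and hence their interpretation, change with the coding.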

(c) Summarize the sums of squares in an ANOVA table like the one at the bottom of p. 148.

(d) Test the significance of region in the one-way ANOVA model using the following:

lm.out = lm(mobility ~ region, data = Angell)
anova(lm.out)
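For parts (c) and (d), the ANOVA table can be assembled directly from RegSS and RSS, with $F = \dfrac{\text{RegSS}/(k-1)}{\text{RSS}/(n-k)}$ for $k$ groups and $n$ observations. A sketch using the hypothetical helper `anova_table()`: its F statistic should match the one printed by `anova(lm.out)`. The row labels and layout are our own choice and may differ from the table on p. 148.

```r
# Hypothetical helper for 3(c)-(d): one-way ANOVA table built from the
# sums of squares, with the F test on (k - 1, n - k) degrees of freedom.
anova_table <- function(y, g) {
  g <- factor(g)
  n <- length(y)
  k <- nlevels(g)
  regss <- sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
  rss <- sum((y - ave(y, g))^2)
  dfs <- c(k - 1, n - k)
  ms <- c(regss, rss) / dfs
  f <- ms[1] / ms[2]
  data.frame(Source = c("Groups", "Residuals", "Total"),
             SS = c(regss, rss, regss + rss),
             df = c(dfs, n - 1),
             MS = c(ms, NA),
             F = c(f, NA, NA),
             p = c(pf(f, dfs[1], dfs[2], lower.tail = FALSE), NA, NA))
}

# Runs only if carData (which supplies the Angell data) is installed.
if (requireNamespace("carData", quietly = TRUE)) {
  data("Angell", package = "carData")
  print(anova_table(Angell$mobility, Angell$region))
}
```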