Statistics Problem Set 6: Regression Analysis with SAS - Prof. Kristofer Jennings, Assignments of Statistics

Instructions for completing Problem Set 6 in a Statistics 512 course, which involves using SAS software to perform regression analysis on given data sets. The problems cover creating a new variable, running a regression, selecting the best subset of variables using the Cp criterion, checking assumptions, and comparing regression lines for two different populations.


Uploaded on 07/31/2009

Statistics 512: Problem Set No. 6
Due October 17, 2008
For the following 3 problems use the computer science data that we have been discussing
in class. You can get a copy of the data set csdata.dat from the class website.
The variables are: id, a numerical identifier for each student; GPA, the grade point
average after three semesters; HSM, HSS, HSE, SATM, and SATV, which were all explained in
class; and GENDER, coded as 1 for men and 2 for women.
1. In a data step, create a new variable GENDERW that has values 1 for women and 0 for men
(use arithmetic on the original variable GENDER). Run a regression to predict GPA using the
explanatory variables HSM, HSS, HSE, SATM, SATV, and GENDERW. (Do not include any interaction
terms.)
(a) Give the equation of the fitted regression line using all six explanatory variables.
(b) Give the fitted regression line for women (use part a).
(c) Give the fitted regression line for men (use part a).
DO NOT attempt to run proc reg on a subset of the data to answer this question.
2. Use the Cp criterion to select the best subset of variables for this problem (i.e., use the options
“/ selection = cp b;”). Use only the original six explanatory variables, not HS or SAT,
and use either GENDER or GENDERW, not both. Summarize the results and explain your choice
of the best model.
3. Check the assumptions of this “best” model using all the usual plots (you know what they
are by now). Explain in detail whether or not each assumption appears to be substantially
violated.
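The three problems above might be sketched in SAS roughly as follows. This is an illustration under assumptions, not the course's reference solution: the file location, the column order on the `input` statement, the data set names `cs` and `diag`, and the two-variable model used in the diagnostics step are all placeholders that the problem set does not specify. Substitute whichever subset the Cp criterion actually favors. The plotting step also assumes SAS/GRAPH is available.

```sas
/* Assumption: csdata.dat is in the working directory and its columns
   appear in the order listed in the problem statement. */
data cs;
  infile 'csdata.dat';
  input id gpa hsm hss hse satm satv gender;
  genderw = gender - 1;   /* GENDER is 1 = men, 2 = women, so 0 = men, 1 = women */
run;

/* Problem 1: full model, all six explanatory variables, no interactions */
proc reg data=cs;
  model gpa = hsm hss hse satm satv genderw;
run;
quit;

/* Problem 2: all-subsets selection ranked by Mallows' Cp;
   the B option prints the coefficient estimates for each subset */
proc reg data=cs;
  model gpa = hsm hss hse satm satv genderw / selection=cp b;
run;
quit;

/* Problem 3: residual diagnostics. The model below is a PLACEHOLDER,
   not a claim about which subset is best. */
proc reg data=cs;
  model gpa = hsm hse;
  output out=diag r=resid p=pred;   /* save residuals and fitted values */
run;
quit;

/* Residuals against fitted values: look for curvature or changing spread */
proc gplot data=diag;
  plot resid*pred / vref=0;
run;
quit;

/* Normal quantile plot of the residuals */
proc univariate data=diag noprint;
  qqplot resid / normal(mu=est sigma=est);
run;
```

Because GENDERW is a 0/1 indicator and the model has no interaction terms, the fitted lines for women and men in Problem 1 share every slope and differ only in the intercept, by the amount of the GENDERW coefficient.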
A testing laboratory with equipment that simulates highway driving studies, for two
makes (A, B) of a certain type of truck tire, the relation between operating cost per
mile ($Y$) and cruising speed ($X_1$). The observations are given in CH11PR15.DAT, where
the columns are ordered $(Y_i, X_{i,1}, X_{i,2})$, with $X_{i,2} = 1$ for Make A and
$X_{i,2} = 0$ for Make B. An engineer now wishes to decide whether or not the regression
of operating cost on cruising speed is the same for the two makes of tires. Assume the
error variances for the two makes are the same and that an interaction-effect regression
model is appropriate.
4. Plot the data for the two populations on the same graph, using different symbols (v=) and
lines. Does the relationship between speed and operating cost appear to be the same for the
two makes of tire?
5. Examine the question of whether or not the two lines are the same. Write a model that allows
the two makes of tires to have different intercepts and slopes. Then, perform the general linear
test to determine whether the two lines are equal. State the null and alternative hypotheses,
the test statistic with degrees of freedom, the p-value and your conclusion.
6. Using the model that fits two different lines, give a 95% confidence interval for the difference
in slopes. (Hint: what parameter represents the difference between the slopes?)
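For the tire problems, one common way to write a model with separate intercepts and slopes is $Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \beta_3 X_{i,1} X_{i,2} + \varepsilon_i$. The $\beta$ notation, the data set name `tires`, the product variable `x1x2`, the test label `sameline`, and the file location are assumptions of this sketch, not part of the assignment. Under this parameterization Make B ($X_{i,2} = 0$) has intercept $\beta_0$ and slope $\beta_1$, while Make A ($X_{i,2} = 1$) has intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$. The plotting step assumes SAS/GRAPH is available.

```sas
/* Assumption: CH11PR15.DAT is in the working directory with columns Y, X1, X2. */
data tires;
  infile 'CH11PR15.DAT';
  input y x1 x2;
  x1x2 = x1 * x2;   /* interaction term: lets the slope differ by make */
run;

/* Problem 4: one plot, with a different symbol and fitted line per make.
   v= sets the plotting symbol; i=rl overlays a least-squares line. */
symbol1 v=circle i=rl c=black;   /* used for x2 = 0 (Make B) */
symbol2 v=square i=rl c=red;     /* used for x2 = 1 (Make A) */
proc gplot data=tires;
  plot y*x1 = x2;
run;
quit;

/* Problems 5 and 6: full model with separate intercepts and slopes.
   TEST carries out the general linear test of H0: beta2 = beta3 = 0;
   CLB prints 95% confidence limits for every coefficient. */
proc reg data=tires;
  model y = x1 x2 x1x2 / clb;
  sameline: test x2 = 0, x1x2 = 0;
run;
quit;
```

The `test` statement reports an F statistic with 2 numerator degrees of freedom and $n - 4$ denominator degrees of freedom, where $n$ is the number of observations. The confidence limits printed for `x1x2` apply to $\beta_3$, which in this parameterization is the slope for Make A minus the slope for Make B.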

