Questions with Answer Key for Applied Regression Analysis | STAT 51200, Exams of Statistics

Material Type: Exam; Professor: Zhang; Class: Applied Regression Analysis; Subject: STAT-Statistics; University: Purdue University - Main Campus; Term: Spring 2009;

Typology: Exams

Pre 2010

Uploaded on 07/30/2009

koofers-user-tz9

  1. Short Answer

a. List the three major assumptions of a simple linear regression model. For each assumption, give one way that you might check whether that assumption is satisfied and identify one remedy that might be used to adjust for the problem if it exists.

Assumption #1: Normality of Error Terms

Diagnostic: Examine a normal Q-Q plot of the residuals for linearity; examine a histogram of the residuals; Shapiro-Wilk test

Remedy: Transformation on Y (may use Box-Cox to find one)

Assumption #2: Constancy of Variance (among residuals)

Diagnostic: Examine residual plots (vs. Y-hat or the X's) for different "vertical spreads"; modified Levene (Brown-Forsythe) or Breusch-Pagan tests

Remedy: Transformations on Y may help, or use Weighted Least Squares

Assumption #3: Independence of Error Terms (you can also say Linearity)

Diagnostic: Plot the residuals against time or sequence

Remedy: Account for dependency in the model (perhaps by including time or see Ch12)
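As an illustration of the constant-variance diagnostic named above, here is a minimal sketch of the two-group modified Levene (Brown-Forsythe) t statistic using only the Python standard library. The residual values and the way they are split into a low-X and a high-X group are hypothetical, and the function name `brown_forsythe_t` is ours, not part of the exam or of any library.

```python
from math import sqrt
from statistics import mean, median


def brown_forsythe_t(resid_low, resid_high):
    """Two-group modified Levene (Brown-Forsythe) t statistic.

    resid_low / resid_high: residuals from the low-X and high-X halves.
    Assumes the absolute deviations are not all identical (pooled variance > 0).
    """
    med1, med2 = median(resid_low), median(resid_high)
    # Absolute deviation of each residual from its own group's median.
    d1 = [abs(e - med1) for e in resid_low]
    d2 = [abs(e - med2) for e in resid_high]
    n1, n2 = len(d1), len(d2)
    m1, m2 = mean(d1), mean(d2)
    # Pooled variance of the absolute deviations, with n1 + n2 - 2 df.
    pooled = (sum((d - m1) ** 2 for d in d1)
              + sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (m1 - m2) / (sqrt(pooled) * sqrt(1 / n1 + 1 / n2))


# Hypothetical residuals: the high-X group has visibly larger spread.
low_x_resid = [-1.0, 0.0, 1.0]
high_x_resid = [-3.0, 0.0, 3.0]
t_bf = brown_forsythe_t(low_x_resid, high_x_resid)
print(round(t_bf, 4))
```

A large |t| relative to the t distribution with n − 2 degrees of freedom would suggest non-constant variance; with only six hypothetical residuals the example is meant to show the arithmetic, not a realistic test.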

b. (10 points) You are designing an experiment in which you want to relate corn yield (Y, in bushels per acre) to the amount of fertilizer used on the plot (X, in pounds per acre). Based on previous experiments, you believe that the ideal fertilizer amount is between 10 and 12 pounds per acre. You would like to obtain prediction intervals at X = 10.5, X = 11, and X = 11.5. (a) In terms of experimental design, explain what you could do to try to minimize the widths of these intervals. (b) Explain what you would do to obtain these three intervals with a family confidence level of 95%.

For part (a), note that

$$s^2\{\text{pred}\} = MSE\left[1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2}\right].$$

So there are three things we can do: (1) Add observations to increase n, (2) Make $\bar{X} \approx 11$ so that the third term has a small numerator, and (3) Increase the spread in the X's so that SSX in the denominator is increased.

For part (b), we would want to use a Bonferroni correction, taking $\alpha = 0.05/3 \approx 0.0167$ for each interval.
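The design advice in part (a) and the Bonferroni split in part (b) can be checked numerically. The sketch below compares two hypothetical designs with the same number of plots; the fertilizer levels, the value MSE = 1.0, and the helper names are assumptions made for illustration only.

```python
from math import sqrt


def design_summary(xs):
    """Return n, the mean of X, and SSX = sum of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    return n, x_bar, ssx


def s_pred(mse, xs, x_h):
    """Standard error of prediction at X = x_h for a simple linear regression."""
    n, x_bar, ssx = design_summary(xs)
    return sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / ssx))


MSE = 1.0  # hypothetical error variance estimate

# Two hypothetical designs, both with 12 plots centered at X = 11.
narrow = [10.5, 11.0, 11.5] * 4   # X's clustered together: small SSX
wide = [9.0, 11.0, 13.0] * 4      # X's spread out: large SSX

for x_h in (10.5, 11.0, 11.5):
    print(x_h, round(s_pred(MSE, narrow, x_h), 4), round(s_pred(MSE, wide, x_h), 4))

# Bonferroni: split the family level 0.05 across the g = 3 intervals.
family_alpha, g = 0.05, 3
per_interval_alpha = family_alpha / g
print(round(per_interval_alpha, 4))
```

With these numbers the wider design gives a smaller standard error at X = 10.5 and X = 11.5 and the same value at X = 11, where the third term vanishes. Each interval would then use the t critical value at level 1 − 0.05/(2·3) with n − 2 degrees of freedom.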

  1. Omit
  2. Refer to the SAS output marked OUTPUT FOR PROBLEM 2. The data are from a study of company executives. The response variable is annual salary (dollars), and the two explanatory variables used are gender (0 = female, 1 = male) and exper (experience in years). The variable expgen is the product of gender and exper.

a. Write down the linear model used in this analysis, including the distributional assumption.

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i,$$

where $X_1$ = gender, $X_2$ = exper, $X_3 = X_1 X_2$, and $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.

b. Write the estimated regression equation (2 pts). Then, write two separate fitted lines predicting salary from exper: one for females (2 pts) and one for males (2 pts).

(5)

$$\hat{Y} = 58050 + 7799 X_1 + 2045 X_2 + 864 X_1 X_2$$

females: $\hat{Y} = 58050 + 2045 X_2$

males: $\hat{Y} = (58050 + 7799) + (2045 + 864) X_2 = 65849 + 2909 X_2$

c. Estimate the mean salary for women with 5 years of experience.

$$\hat{Y} = 58050 + 2045 \times 5 = 58050 + 10225 = 68275$$

d. How do you justify that either gender or experience or their product, or any combination of those three variables, is useful to predict salary?

The overall F test in the ANOVA table tests $H_0: \beta_1 = \beta_2 = \beta_3 = 0$. It gives F = 98.09 with p-value < 0.0001, so we reject $H_0$ and conclude that at least one of the three variables has a linear relationship with salary.

e. Do the residuals show anything strange? Explain why or why not.

The residuals appear to be normal from the normal quantile plot, the variance seems to be constant from the residual scatter plots, and the residuals are approximately equally distributed above and below the zero line. So nothing seems strange with the residuals.

  1. Suppose we have an incomplete ANOVA table from a study that uses a linear model to predict salary from quality, experience, and publications.

a. Complete the ANOVA table.

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3        627.81700     209.27233     68.12   <.
Error             20         61.44300            3.
Corrected Total   23             689.
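Several entries in the table above are cut off in this copy, but every cell of an ANOVA table is determined by the degrees of freedom and two of the sums of squares. A minimal sketch of that arithmetic, using only the values that are fully legible (the variable names are ours):

```python
# Fully legible entries from the ANOVA table.
ss_model, df_model = 627.81700, 3
ss_error, df_error = 61.44300, 20

# Remaining cells follow from the definitions.
ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error (MSE)
ss_total = ss_model + ss_error   # corrected total sum of squares
df_total = df_model + df_error   # corrected total degrees of freedom
f_value = ms_model / ms_error    # overall F statistic

print(round(ms_model, 5), round(ms_error, 5))
print(round(ss_total, 5), df_total)
print(round(f_value, 2))
```

The computed model mean square and F value agree with the legible 209.27233 and 68.12, which is a useful check that the table was completed consistently.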

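b. Write down the linear model and the hypothesis for the F test. What is your conclusion from this test?

The model is $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$ with $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, where $X_1$ = quality, $X_2$ = experience, and $X_3$ = publications. The F test has $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_a$: not all of $\beta_1, \beta_2, \beta_3$ are zero.

The conclusion from this test is that at least one of the three variables (quality, experience, and publications) has a linear relationship with salary.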

Suppose that the last three columns (t Value, Pr > |t|, and 95% Confidence Limits) are missing from the following table.

c. Fill in the blanks for experience based on the information from the ANOVA table and the above table.

Parameter Estimates

Variable       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept       1             17.84693          2.00188      8.92     <.0001   13.67109   22.
quality         1              1.10313          0.32957      3.35     0.0032    0.41565   1.
experience      1              0.32152          0.03711      8.66     <.0001    0.24411   0.
publications    1              1.28894          0.29848      4.32     0.0003    0.66632   1.
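The blanks for experience can be filled from the estimate, its standard error, and the error degrees of freedom (20) in the ANOVA table. The sketch below assumes the tabled critical value t(0.975; 20) ≈ 2.086; the function name is ours.

```python
# Two-sided 95% critical value for the t distribution with 20 df
# (looked up from a t table; 20 is the error df in the ANOVA table).
T_CRIT_20 = 2.086


def t_and_ci(estimate, std_error, t_crit=T_CRIT_20):
    """Return the t statistic and the lower/upper 95% confidence limits."""
    t_value = estimate / std_error
    half_width = t_crit * std_error
    return t_value, estimate - half_width, estimate + half_width


# Row for experience from the parameter estimates table.
t_value, lower, upper = t_and_ci(0.32152, 0.03711)
print(round(t_value, 2), round(lower, 5), round(upper, 5))
```

The t value and lower limit reproduce the legible 8.66 and 0.24411 in the table. The same function applied to the other rows gives their limits, up to rounding of the critical value.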

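d. Write down the hypothesis for the t test in c. What is your conclusion from this test?

Writing $\beta_2$ for the coefficient of experience, the hypotheses are $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$. The conclusion from the t test is that, given that quality and publications are included in the model, experience is still significantly useful for predicting salary.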