Regression Analysis: Significance and Interpretation, Study notes of Statistics

Solutions to various regression analysis problems, including testing the significance of regression models, interpreting the results of t-tests for individual coefficients, and examining the impact of multicollinearity. The examples cover different regression models and datasets, including game attendance, salary versus experience, and crest sales versus income and advertising budget.

Typology: Study notes

2010/2011

Uploaded on 04/29/2011

rollercoaster-101
rollercoaster-101 🇨🇦

4.6

(6)

41 documents

1 / 14

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
ECON 2202 - LONG ANSWER QUESTIONS
TOPIC 6
1. .
a.
321
7360.01152.12109.25835.39
ˆxxxy
b.
1. Use F0 =MSR/MSE= (SSR/k)/(SSE/(n-k-1), α=0.05, n=20, k=3.
2. Ho: regression is not significant, Ha: regression is significant.
3. Reject Ho if F0 > Fα, =3.239
Fα, (k, n-k-1)k, n-k-1)) = F0.05, (k, n-k-1)3,1)6) = 3.239
4. F0 =MSR/MSE = 146.541
5. Since F0 > Fα (146.541> 18.513), reject Ho and conclude the regression is significant.
c. .
Use
bi
i
s
b
t
0
, α=0.05, n=20, k=3, i=1,2, 3
Ho: βi=1 Ha: βi≠1
Reject Ho if t0 > tα,/2, n-k-1,= t0.025,16 =2.1199 or t0 <- tα/2,=- 2.1199
-tα/2 =-2.1199 tα/2 =2.1199
2880.4
5156.0
2109.2
1
1
0
b
s
b
t
8091.2
3970.0
1152.1
3
2
0
b
s
b
t
6703.5
1298.0
7360.0
1
1
0
b
s
b
t
Since t0 > tα,/2 (4.2880>2.1199)
reject H0. Conclude test score
on vitality and drive matters at
the 5% significance level.
Since t0 > tα,/2 (2.8091>2.1199)
reject H0. Conclude test score
on numeric and verbal
reasoning ability matters at the
5% significance level.
Since t0 > tα,/2 (5.6703>2.1199)
reject H0. Conclude test score
on sociability and leadership
matters at the 5% significance
level.
d. Since all test scores matter and are positive, we rate the importance based on the magnitude of their
coefficients (which explain increases in salary): we rank vitality and drive first, numeric and verbal
reasoning ability second, and sociability and leadership third.
e. If the regression was not significant, we could not conclude that the independent variables explained
salaries, so it would be likely that none of the qualities measured by the tests led to higher salaries.
Econ 2202, Topic 6 Solutions – © S. Dubey, 2011 1
α=0.05
α/2=0.025
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe

Partial preview of the text

Download Regression Analysis: Significance and Interpretation and more Study notes Statistics in PDF only on Docsity!

ECON 2202 - LONG ANSWER QUESTIONS

TOPIC 6

a. 1 2 3

ˆ y  39. 5835  2. 2109 x  1. 1152 x  0. 7360 x

b.

  1. Use F 0

=MSR/MSE= (SSR/k)/(SSE/(n-k-1), α=0.05, n=20, k=3.

  1. Ho: regression is not significant, Ha: regression is significant.
  2. Reject Ho if F 0

> F

α,

F

α, (k, n-k-1)k, n-k-1))

= F

0.05, (k, n-k-1)3,1)6)

4. F

0

=MSR/MSE = 146.

  1. Since F 0

> F

α

(146.541> 18.513), reject Ho and conclude the regression is significant.

c..

Use

bi

i

s

b

t 

0

, α=0.05, n=20, k=3, i=1,2, 3

Ho: β

i

=1 Ha: β

i

Reject Ho if t 0

> t α,/2, n-k-1,

= t 0.025,

=2.1199 or t 0

<- t α/2,

-t α/

=-2.1199 t α/

1

1

0

b

s

b

t 2. 8091

3

2

0

b

s

b

t 5. 6703

1

1

0

b

s

b

t

Since t 0

> t α,/

reject H 0

. Conclude test score

on vitality and drive matters at

the 5% significance level.

Since t 0

> t α,/

reject H 0

. Conclude test score

on numeric and verbal

reasoning ability matters at the

5% significance level.

Since t 0

> t α,/

reject H 0

. Conclude test score

on sociability and leadership

matters at the 5% significance

level.

d. Since all test scores matter and are positive, we rate the importance based on the magnitude of their

coefficients (which explain increases in salary): we rank vitality and drive first, numeric and verbal

reasoning ability second, and sociability and leadership third.

e. If the regression was not significant, we could not conclude that the independent variables explained

salaries, so it would be likely that none of the qualities measured by the tests led to higher salaries.

α=0.

α/2=0.

a. (1) ε~iidN(0, σiidN(0, σ

2

); (2) regression is linear in the coefficients; (3) ε is independent of each of the

explanatory variables; and (4) explanatory variables are independent of each other. Test assumption

(2) through visual inspection of population model; or test (4) through VIF test of multicollinearity.

b. y = β 0

+ β 1

x 1

+ β 2

x 2

+ β 3

x 3

+ ε ; ŷ = -269.8384 +4.9528* x 1

  • 0.8340* x 2

  • 0.0887* x 3

For every unit increase in x1 (ads placed the previous week), calls received this week rise by 4.9528.

SUMMARY OUTPUT

Regression Statistics

R Square 0.

Adjusted R Square 0.

Standard Error 86.

Observations 12

ANOVA df SS MS F

Significance

F

Regression 3 85904.2786 28634.7595 3.8550 0.

Residual 8 59423.9714 7427.

Total 11 145328.

Coefficient

s Stand. Error t Stat P-value Lower 95%

Intercept -269.8384 216.5686 -1.2460 0.2480 -769.

x1 4.9528 20.6990 0.2393 0.8169 -42.

x2 0.8340 0.5239 1.5921 0.1500 -0.

x3 0.0887 0.0548 1.6170 0.1445 -0.

c..

  1. Use p-value (=Significance F), and α=0.
  2. Ho: β 1

= β 2

=β 3

=0, Ha: Regression Model is significant (or not all slope coefficient are zero)

  1. Reject Ho if p-value < α=0.
  2. p-value = 0.
  3. Since p-value> α (0.0564> 0.05),do not reject Ho, and conclude overall regression model is insignificant

d..

Use t= β i

/s bi

, i=1, 2, 3 and α=0.

H

o

: β 1

=0; H

a

: β 1

≠0 H

o

: β 2

=0; H

a

: β 2

≠0 H

o

: β 3

=0; H

a

: β 3

Reject Ho if t 0

> t α,/2, n-k-1,

= t 0.025,

=2.3060or t 0

<- t α/2,

-t α/

=-2.3060 t α/

t 0

= β 1

/s b

= 0.2393 t o

= β 2

/s b

= 1.5921 t o

= β 3

/s b

Don’t reject H 0

as -t α/

d..

Use p-values, α=0.10, n=10, k=3, i=1,2, 3

Ho: β

i

=1 Ha: β

i

Reject Ho if p-value<α=0.

p-value= 0.0179 p-value= 0.3836 p-value= 0.

Since p-value <α (0.0179<0.10)

reject H 0

. Conclude age

matters in explaining annual

restaurant meals at the 5%

significance level.

Since p-value>α (0.3836<0.10)

reject H 0

. Conclude income

does not matter in explaining

annual restaurant meals at the

5% significance level.

Since p-value>α (0. 7556>0.10)

reject H 0

. Conclude education

level does not matter in

explaining annual restaurant

meals at the 5% significance

level.

e. Since only age mattered in explaining annual restaurant meals in this fast food restaurant, I would build

and test a simple linear regression model that used age as the only explanatory variable.

  1. The athletic director of State University is interested in developing a multiple regression model that

might be used to explain the variation in attendance at football games at his school. A sample of 16

games is selected from home games played in the last 10 seasons:

y = Game attendance;

x 1

= Team win/loss percentage to date;

x

2

= opponent win/loss percentage to date;

x 3

= games played this season; and

x 4

= temperature at game time.

a. Population model:

     

1 1 2 2 3 3 4 4

y x x x x

o

Estimated sample model:

1 2 3 4

y  14 , 122. 2409  63. 1533 x  10. 0958 x  31. 5062 x  55. 4609 x

b. R

2

= 0.7753. This means that 77.53% of the total variation in the dependent variable is explained by

the combination of the 4 independent variables, and changes in one or more of the independent

variables can explain changes in game attendance in a linear fashion.

c..

Use p-value (=Significance F) and α = 0.05.

Ho: regression is not significant ( or β 1

= β 2

= β 3

= β 4

=0); Ha: regression is significant (or at least one β i

Reject Ho if p-value <α = 0.05.

p-value = 0.0014.

α/2=0.

α = 0.

Since p-value < α (0.0014 < 0.05), reject Ho. This means that the regression is significant and collectively,

changes in team win/loss percentage, opponent win/loss percentage, games played this season, and

temperature at game time explain changes in game attendance..

d..

Use p-values, α=0.08, n=16, k=4, i=1,2, 3, 4

Ho: β i

=1 Ha: β i

Reject Ho if p-value<α=0.

p-value= 0.0014 (i=1) p-value= 0.4953 (i=2) p-value= 0.8621 (i=3) p-value= 0.3909 (i=4)

Since p-value <α

(0.0014<0.08) reject

H

0

. Conclude team

win/loss percentage to

date matters in

explaining game

attendance at the 8%

significance level.

Since p-value>α

(0.4953>0.08) do not

reject H

0

. Conclude

opponent win/loss

percentage to date does

not matter in explaining

game attendance at the

8% significance level.

Since p-value>α

(0.8621>0.08) do not

reject H

0

. Conclude

games played this

season do not matter in

explaining game

attendance at the 8%

significance level.

Since p-value>α

(0.3909>0.08), do not

reject H

0

. Conclude

temperature at game time

does not matter in

explaining game

attendance at the 8%

significance level.

Thus, only x1, the team’s win/loss percentage to date, has a significant impact on y, game attendance.

e. s ε

=Standard Error = sqrt(MSE)=1184.1247. Since y (game attendance) is in the tens of thousands, this

might be a very large value for prediction, as it translates to 1,184,124.7 people (over 1 million).

f. If there is multicollinearity, any one or more of the four can result: When you add to a significant

regression model an x variable that is correlated with existing independent variables, in the new

model compared to the old,

  1. Significant coefficients become insignificant
  2. Values of coefficients change a great deal
  3. Incorrect signs on coefficients
  4. The model standard error becomes larger

To test for multicollinearity, choose the x-variable that you suspect is correlated with the others, and

regress this against all the other independent variables (y is not part of this regression model). Use the R-

square value from the new regression to calculate the Variance Inflation Factor (VIF), VIF = 1/ (1-R

2

). If

VIF is less than 5, multicollinearity is not a problem; if it is 5 or larger, there is multicollinearity.

α/2=0.

Salary versus Experience

0

10

20

30

40

50

60

25 30 35 40 45 50 55

Experience (months)

Salay ($1000s)

From the graphs, it appears experience and salary have the strongest linear relationship, though it is still

relatively weak; average grade and salary have a fairly weak linear relationship; and post-secondary

education and salary have virtually no linear relationship (a horizontal relationship indicates changes in x do

not change y). Remember to label graphs.

b. Generate Excel output to get:

0 1 1 2 2 3 3 1 2 3

0 1 1 2 2 3 3

y ˆ b bx bx bx 3 , 389. 4433 40. 3758 x 324. 0260 x 474. 5868 x

y x x x

Regression Statistics

Multiple R 0.

R Square 0.

Adjusted R Square 0.

Standard Error 5384.

Observations 15

ANOVA df SS MS F Significance F

Regression 3 401,835,496.0583 133,945,165.3528 4.6195 0.

Residual 11 318,948,503.9417 28,995,318.

Total 14 720,784,000.

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept -3389.4433 13057.1926 -0.2596 0.8000 -32128.1450 25349.

Post-secondary Education

(in years) 40.3758 682.0247 0.0592 0.9539 -1460.7513 1541.

Average Grade (%) 324.0260 170.9849 1.8951 0.0847 -52.3093 700.

Experience (months) 474.5868 197.8779 2.3984 0.0353 39.0603 910.

c. Each 1 year increase in post-secondary education raises salary by $40.38. Each 1% increase in average

grade raises salary by $324.03. Each 1 month increase in work experience raises salary by $474.59.

d.

  1. Use p-value (=Significance F), α=0.05, n=15, k=3.
  2. Ho: regression is not significant, Ha: regression is significant.
  3. Reject Ho if p-value<α ,
  1. p-value=0.
  2. Since p-value<α (0.0252<0.05), reject Ho and conclude the regression is significant.

To use the classical approach, F α,(k, n-k-1)

= F

0.05,(3,11)

= 3.587, and F 0

= MSR/MSE=4.6195.

α=0.

e..

Use p-values, α=0.05, n=15, k=3, i=1,2, 3

Ho: β

i

=1 Ha: β

i

Reject Ho if p-value<α=0.

p-value = 0.9539 p-value= 0.0847 p-value= 0.

Since p-value >α (0.9539>0.05)

do not reject H 0

. Conclude PSE

does not matter in explaining

salary at the 5% significance

level.

Since p-value>α (0.0847>0.05) do

not reject H 0

. Conclude average

grade does not matter in explaining

salary at the 5% significance level.

Since p-value <α (0.0353<0.05)

reject H 0

. Conclude experience

matters in explaining salary at

the 5% significance level.

f. Only β

3

is statistically significant (different than zero). This means only changes in x

3

(experience, in

months) matters in explaining changes in salary.

g..

Salary (k, n-k-1)$)

Post-secondary

Education (k, n-k-1)years)

Average

Grade (k, n-k-1)%)

Experience

(k, n-k-1)months)

Salary ($) 1

Post-secondary Education (years) 0.102822872 1

Average Grade (%) 0.570401335 0.226778804 1

Experience (months) 0.633833527 -0.005384538 0.307764594 1

From this matrix, we see that experience matters most in explaining salary, then average grade, then

post-secondary education, if we believe that the relationships to salary are all linear.

h. I would only need a simple linear regression model with experience as the explanatory variable, since

it is the only significant variable in the multiple regression model. If testing was at the 10% level, then

I would use a multiple linear regression with both experience and average grade as explanatory

variables, since both would likely be statistically significant given my previous results (p-values).

a. In the theory of Demand, we utilize the income effect, impact on increasing preferences, and the

substitution effect. In general, Crest sales should rise as personal disposal income (PDI) rises, due to

the income effect. If advertising increases consumer preferences for a product, we should see Crest

sales rise as the Crest Advertising Budget rises, due to an increase in preference. A positive

relationship between Crest Sales and the Crest/Colgate advertising ratio might signal a substitution

effect. All three relationships should be positive.

b. The relationship between Crest Sales and PDI looks to be a positive, linear and relatively strong

relationship, confirming the possible presence of an income effect.

The relationship between Crest Sales and Crest Advertising Budget also appears to be positive, linear

and relatively strong, until we magnify the range of advertising budget (see third graph), after which

α/2=0.

Crest Sales versus Crest Advertising Budget, 1967-

0

50

100

150

200

250

0.9 1 1.1 1.2 1.3 1.4 1.

Crest/Colgate Advertising Ratio

Crest Sales ($millions)

c. This requires first generating Excel Output, which is shown below. It shows that as Crest Advertising

expenditures rise by $1, Crest Sales rise by $3.8932 (both are measured in 1000s); as the

Crest/Colgate Advertising Ratio rises by 1 unit, Crest Sales rise by $29.6073 million; and as personal

disposable income rises by $1billion, Crest sales rise by $86.5187 thousand. This is consistent with

the theories of increasing preferences, the substitution effect, and the income effect.

1 2 3

y  30625. 9073  3. 8932 x  29607. 3153 x  86. 5187 x

Regression Statistics

Multiple R 0.

R Square 0.

Adjusted R Square 0.

Standard Error 9969.

Observations 13

ANOVA df SS MS F Significance F

Regression 3 20156028160 6718676053 67.59468186 1.70869E-

Residual 9 894568667.4 99396518.

Total 12 21050596827

Coefficients

Standard

Error t Stat P-value Lower 95% Upper 95%

Intercept 30625.90727 19808.00932 1.546137564 0.156473449 -14182.95704 75434.

Crest Budget 3.893187989 2.081204031 1.870642153 0.0942021 -0.814826205 8.

Crest /Colgate Ad Ratio -29607.31531 23822.08725 -1.242851434 0.245326546 -83496.66169 24282.

Disposable Income 86.51869161 18.69329932 4.628326446 0.001239614 44.23147842 128.

d..

  1. Use p-value (=Significance F), α=0.05, n=13, k=3.
  2. Ho: regression is not significant, Ha: regression is significant.
  3. Reject Ho if p-value<α ,
  1. p-value=0.
  2. Since p-value<α (0.0000<0.05), reject Ho and conclude the regression is significant.

To use the classical approach, F α,(k, n-k-1)

= F

0.05,(3,9)

= 3.863, and F 0

= MSR/MSE=67.5947.

α=0.

e..

Use p-values, α=0.08, n=13, k=3, i=1,2, 3

Ho: β

i

=1 Ha: β

i

Reject Ho if p-value<α=0.

p-value = 0.0942 p-value= 0.0847 p-value= 0.

Since p-value >α (0.0942>0.08)

do not reject H 0

. Conclude

amount spend on advertising

Crest does not matter in

explaining Crest Sales at the 8%

significance level.

Since p-value>α (0.2453>0.08) do

not reject H 0

. Conclude the

Crest/Colgate advertising ratio does

not matter in explaining Crest Sales

at the 8% significance level.

Since p-value <α (0.0012<0.08)

reject H 0

. Conclude personal

disposable income matters in

explaining Crest Sales at the 8%

significance level.

These results show that only personal disposable income matters in explaining Crest Sales at the 8%

significance level, so only the income effect is present.

f. Since only personal disposable income mattered in explaining Crest Sales, at the 5% level, I would

draw several conclusions. First, graphs are an imperfect way to establish if relationships are linear or

not, and strong or not … further statistical tests are required to verify the intuition that results.

Second, since personal disposable income goes up over time, and population also grows over time, it

is unclear if the increase in Crest Sales is due to income growth or population growth. I might,

therefore, want to run three simple linear regressions using as explanatory variables in each one,

personal disposable income, personal disposable income per capita, and population. I would then

compare these three models to determine which was best in explaining Crest Sales.

α/2=0.

e. For the critical value approach, critical values are ± t 0.025,

= ±2.3646, and the test statistic is t 0

Step 2 is: H 0

: β = 0; H A

: β ≠ 0. With these critical values and test statistics, we would reject H 0

that the slope

coefficient is zero, and conclude that there is evidence at the 5% significance level to suggest revenues affect

profits. The same result would apply using p-values, where Excel will provide a p-value of 0.0008.

f.

Residuals and Revenue

-1.

-0.

-0.

-0.

-0.

0

1

0.000 5.000 10.000 15.000 20.000 25.000 30.000 35.000 40.000 45.000 50.

Revenue

Residual

There seems to be a pattern of decreasing residuals with increased revenues suggesting that there may be a

failure of the assumptions of constant variance and independence between residuals and the independent

variable, revenues.

g.

If the revenue is $35.2 billion, profits would be predicted to be $2.02404 billion.

ŷ=-0.1830 + 0.0627 x = -0.1830 + 0.0627 (35.2) = 2.

  1. Let x1=ShipCost; x2=PrintAds; x3=Rebate%; and x4=WebAds.

a.

  1. 1218 4. 8308 * 1 2. 8122 * 2 16. 7399 * 3

ˆ Equation 1 : y   xxx

Equation 2 : y ˆ  74. 0280  4. 8587 * x 1  3. 5347 * x 3  0. 8648 * x 4

Equation 3 : ˆ y  74. 2219  19. 0292 * x 3  2. 8211 * x 4

Equation 4 : y ˆ  4. 3061  0. 0819 * x 1  2. 2651 * x 2  16. 6966 * x 3  2. 4978 * x 4

b. All equations are linear in coefficients, and significant overall. In addition:.

c. All equations are linear in coefficients, and significant overall. In addition:

Equation 1) has PrintAds and Rebate% significant at the 5% level (k, n-k-1)ShipCost is not).

Equation 2 has Rebate% and WebAds significant at the 5% level (k, n-k-1)ShipCost is not).

Equation 3 has both Rebate% and WebAds siginificant at the 5% level.

Equation 4 has PrintAds, Rebate% and WebAds significant at the 5% level (k, n-k-1)ShipCost is not).

d. Strictly following the four evaluation criteria, we should conclude that Equation 4 is best with the

smallest standard error, highest adjusted R

2

and R

2

, and third highest F 0

Equation 3 is second best with the second smallest standard error, the second highest adjusted R

2

and third highest R

2

, and the highest F 0

However, since ShipCost is insignificant of the four explanatory variables in Equation 4, a better

model is likely a multiple linear regression model with only PrintAds, Rebate% and WebAds as

explanatory variables.

EQUATION 1 EQUATION 2 EQUATION 3 EQUATION 4

Multiple R 0.667897899 0.698636512 0.698636248 0.

R Square 0.446087603 0.488092975 0.488092607 0.

Adjusted R Square 0.409962882 0.454707735 0.466309314 0.

Standard Error 76.33825264 73.386676 72.6017953 70.

Observations 50 50 50 50

F 0

12.34854096 14.62002263 22.4067408 12.