








Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Solutions to various regression analysis problems, including testing the significance of regression models, interpreting the results of t-tests for individual coefficients, and examining the impact of multicollinearity. The examples cover different regression models and datasets, including game attendance, salary versus experience, and crest sales versus income and advertising budget.
Typology: Study notes
1 / 14
This page cannot be seen from the preview
Don't miss anything!









a. 1 2 3
ˆ y 39. 5835 2. 2109 x 1. 1152 x 0. 7360 x
b.
=MSR/MSE= (SSR/k)/(SSE/(n-k-1), α=0.05, n=20, k=3.
α,
α, (k, n-k-1)k, n-k-1))
0.05, (k, n-k-1)3,1)6)
0
α
(146.541> 18.513), reject Ho and conclude the regression is significant.
c..
Use
bi
i
0
, α=0.05, n=20, k=3, i=1,2, 3
Ho: β
i
=1 Ha: β
i
Reject Ho if t 0
> t α,/2, n-k-1,
= t 0.025,
=2.1199 or t 0
<- t α/2,
-t α/
=-2.1199 t α/
1
1
0
b
3
2
0
b
1
1
0
b
Since t 0
> t α,/
reject H 0
. Conclude test score
on vitality and drive matters at
the 5% significance level.
Since t 0
> t α,/
reject H 0
. Conclude test score
on numeric and verbal
reasoning ability matters at the
5% significance level.
Since t 0
> t α,/
reject H 0
. Conclude test score
on sociability and leadership
matters at the 5% significance
level.
d. Since all test scores matter and are positive, we rate the importance based on the magnitude of their
coefficients (which explain increases in salary): we rank vitality and drive first, numeric and verbal
reasoning ability second, and sociability and leadership third.
e. If the regression was not significant, we could not conclude that the independent variables explained
salaries, so it would be likely that none of the qualities measured by the tests led to higher salaries.
α=0.
α/2=0.
a. (1) ε~iidN(0, σiidN(0, σ
2
); (2) regression is linear in the coefficients; (3) ε is independent of each of the
explanatory variables; and (4) explanatory variables are independent of each other. Test assumption
(2) through visual inspection of population model; or test (4) through VIF test of multicollinearity.
b. y = β 0
+ β 1
x 1
+ β 2
x 2
+ β 3
x 3
+ ε ; ŷ = -269.8384 +4.9528* x 1
0.8340* x 2
0.0887* x 3
For every unit increase in x1 (ads placed the previous week), calls received this week rise by 4.9528.
SUMMARY OUTPUT
Regression Statistics
R Square 0.
Adjusted R Square 0.
Standard Error 86.
Observations 12
ANOVA df SS MS F
Significance
F
Regression 3 85904.2786 28634.7595 3.8550 0.
Residual 8 59423.9714 7427.
Total 11 145328.
Coefficient
s Stand. Error t Stat P-value Lower 95%
Intercept -269.8384 216.5686 -1.2460 0.2480 -769.
x1 4.9528 20.6990 0.2393 0.8169 -42.
x2 0.8340 0.5239 1.5921 0.1500 -0.
x3 0.0887 0.0548 1.6170 0.1445 -0.
c..
= β 2
=β 3
=0, Ha: Regression Model is significant (or not all slope coefficient are zero)
d..
Use t= β i
/s bi
, i=1, 2, 3 and α=0.
o
: β 1
a
: β 1
o
: β 2
a
: β 2
o
: β 3
a
: β 3
Reject Ho if t 0
> t α,/2, n-k-1,
= t 0.025,
=2.3060or t 0
<- t α/2,
-t α/
=-2.3060 t α/
t 0
= β 1
/s b
= 0.2393 t o
= β 2
/s b
= 1.5921 t o
= β 3
/s b
Don’t reject H 0
as -t α/
d..
Use p-values, α=0.10, n=10, k=3, i=1,2, 3
Ho: β
i
=1 Ha: β
i
Reject Ho if p-value<α=0.
p-value= 0.0179 p-value= 0.3836 p-value= 0.
Since p-value <α (0.0179<0.10)
reject H 0
. Conclude age
matters in explaining annual
restaurant meals at the 5%
significance level.
Since p-value>α (0.3836<0.10)
reject H 0
. Conclude income
does not matter in explaining
annual restaurant meals at the
5% significance level.
Since p-value>α (0. 7556>0.10)
reject H 0
. Conclude education
level does not matter in
explaining annual restaurant
meals at the 5% significance
level.
e. Since only age mattered in explaining annual restaurant meals in this fast food restaurant, I would build
and test a simple linear regression model that used age as the only explanatory variable.
might be used to explain the variation in attendance at football games at his school. A sample of 16
games is selected from home games played in the last 10 seasons:
y = Game attendance;
x 1
= Team win/loss percentage to date;
x
2
= opponent win/loss percentage to date;
x 3
= games played this season; and
x 4
= temperature at game time.
a. Population model:
1 1 2 2 3 3 4 4
y x x x x
o
Estimated sample model:
1 2 3 4
y 14 , 122. 2409 63. 1533 x 10. 0958 x 31. 5062 x 55. 4609 x
b. R
2
= 0.7753. This means that 77.53% of the total variation in the dependent variable is explained by
the combination of the 4 independent variables, and changes in one or more of the independent
variables can explain changes in game attendance in a linear fashion.
c..
Use p-value (=Significance F) and α = 0.05.
Ho: regression is not significant ( or β 1
= β 2
= β 3
= β 4
=0); Ha: regression is significant (or at least one β i
Reject Ho if p-value <α = 0.05.
p-value = 0.0014.
α/2=0.
α = 0.
Since p-value < α (0.0014 < 0.05), reject Ho. This means that the regression is significant and collectively,
changes in team win/loss percentage, opponent win/loss percentage, games played this season, and
temperature at game time explain changes in game attendance..
d..
Use p-values, α=0.08, n=16, k=4, i=1,2, 3, 4
Ho: β i
=1 Ha: β i
Reject Ho if p-value<α=0.
p-value= 0.0014 (i=1) p-value= 0.4953 (i=2) p-value= 0.8621 (i=3) p-value= 0.3909 (i=4)
Since p-value <α
(0.0014<0.08) reject
0
. Conclude team
win/loss percentage to
date matters in
explaining game
attendance at the 8%
significance level.
Since p-value>α
(0.4953>0.08) do not
reject H
0
. Conclude
opponent win/loss
percentage to date does
not matter in explaining
game attendance at the
8% significance level.
Since p-value>α
(0.8621>0.08) do not
reject H
0
. Conclude
games played this
season do not matter in
explaining game
attendance at the 8%
significance level.
Since p-value>α
(0.3909>0.08), do not
reject H
0
. Conclude
temperature at game time
does not matter in
explaining game
attendance at the 8%
significance level.
Thus, only x1, the team’s win/loss percentage to date, has a significant impact on y, game attendance.
e. s ε
=Standard Error = sqrt(MSE)=1184.1247. Since y (game attendance) is in the tens of thousands, this
might be a very large value for prediction, as it translates to 1,184,124.7 people (over 1 million).
f. If there is multicollinearity, any one or more of the four can result: When you add to a significant
regression model an x variable that is correlated with existing independent variables, in the new
model compared to the old,
To test for multicollinearity, choose the x-variable that you suspect is correlated with the others, and
regress this against all the other independent variables (y is not part of this regression model). Use the R-
square value from the new regression to calculate the Variance Inflation Factor (VIF), VIF = 1/ (1-R
2
). If
VIF is less than 5, multicollinearity is not a problem; if it is 5 or larger, there is multicollinearity.
α/2=0.
Salary versus Experience
0
10
20
30
40
50
60
25 30 35 40 45 50 55
Experience (months)
Salay ($1000s)
From the graphs, it appears experience and salary have the strongest linear relationship, though it is still
relatively weak; average grade and salary have a fairly weak linear relationship; and post-secondary
education and salary have virtually no linear relationship (a horizontal relationship indicates changes in x do
not change y). Remember to label graphs.
b. Generate Excel output to get:
0 1 1 2 2 3 3 1 2 3
0 1 1 2 2 3 3
y ˆ b bx bx bx 3 , 389. 4433 40. 3758 x 324. 0260 x 474. 5868 x
y x x x
Regression Statistics
Multiple R 0.
R Square 0.
Adjusted R Square 0.
Standard Error 5384.
Observations 15
ANOVA df SS MS F Significance F
Regression 3 401,835,496.0583 133,945,165.3528 4.6195 0.
Residual 11 318,948,503.9417 28,995,318.
Total 14 720,784,000.
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -3389.4433 13057.1926 -0.2596 0.8000 -32128.1450 25349.
Post-secondary Education
(in years) 40.3758 682.0247 0.0592 0.9539 -1460.7513 1541.
Average Grade (%) 324.0260 170.9849 1.8951 0.0847 -52.3093 700.
Experience (months) 474.5868 197.8779 2.3984 0.0353 39.0603 910.
c. Each 1 year increase in post-secondary education raises salary by $40.38. Each 1% increase in average
grade raises salary by $324.03. Each 1 month increase in work experience raises salary by $474.59.
d.
To use the classical approach, F α,(k, n-k-1)
0.05,(3,11)
= 3.587, and F 0
α=0.
e..
Use p-values, α=0.05, n=15, k=3, i=1,2, 3
Ho: β
i
=1 Ha: β
i
Reject Ho if p-value<α=0.
p-value = 0.9539 p-value= 0.0847 p-value= 0.
Since p-value >α (0.9539>0.05)
do not reject H 0
. Conclude PSE
does not matter in explaining
salary at the 5% significance
level.
Since p-value>α (0.0847>0.05) do
not reject H 0
. Conclude average
grade does not matter in explaining
salary at the 5% significance level.
Since p-value <α (0.0353<0.05)
reject H 0
. Conclude experience
matters in explaining salary at
the 5% significance level.
f. Only β
3
is statistically significant (different than zero). This means only changes in x
3
(experience, in
months) matters in explaining changes in salary.
g..
Salary (k, n-k-1)$)
Post-secondary
Education (k, n-k-1)years)
Average
Grade (k, n-k-1)%)
Experience
(k, n-k-1)months)
Salary ($) 1
Post-secondary Education (years) 0.102822872 1
Average Grade (%) 0.570401335 0.226778804 1
Experience (months) 0.633833527 -0.005384538 0.307764594 1
From this matrix, we see that experience matters most in explaining salary, then average grade, then
post-secondary education, if we believe that the relationships to salary are all linear.
h. I would only need a simple linear regression model with experience as the explanatory variable, since
it is the only significant variable in the multiple regression model. If testing was at the 10% level, then
I would use a multiple linear regression with both experience and average grade as explanatory
variables, since both would likely be statistically significant given my previous results (p-values).
a. In the theory of Demand, we utilize the income effect, impact on increasing preferences, and the
substitution effect. In general, Crest sales should rise as personal disposal income (PDI) rises, due to
the income effect. If advertising increases consumer preferences for a product, we should see Crest
sales rise as the Crest Advertising Budget rises, due to an increase in preference. A positive
relationship between Crest Sales and the Crest/Colgate advertising ratio might signal a substitution
effect. All three relationships should be positive.
b. The relationship between Crest Sales and PDI looks to be a positive, linear and relatively strong
relationship, confirming the possible presence of an income effect.
The relationship between Crest Sales and Crest Advertising Budget also appears to be positive, linear
and relatively strong, until we magnify the range of advertising budget (see third graph), after which
α/2=0.
Crest Sales versus Crest Advertising Budget, 1967-
0
50
100
150
200
250
0.9 1 1.1 1.2 1.3 1.4 1.
Crest/Colgate Advertising Ratio
Crest Sales ($millions)
c. This requires first generating Excel Output, which is shown below. It shows that as Crest Advertising
expenditures rise by $1, Crest Sales rise by $3.8932 (both are measured in 1000s); as the
Crest/Colgate Advertising Ratio rises by 1 unit, Crest Sales rise by $29.6073 million; and as personal
disposable income rises by $1billion, Crest sales rise by $86.5187 thousand. This is consistent with
the theories of increasing preferences, the substitution effect, and the income effect.
1 2 3
y 30625. 9073 3. 8932 x 29607. 3153 x 86. 5187 x
Regression Statistics
Multiple R 0.
R Square 0.
Adjusted R Square 0.
Standard Error 9969.
Observations 13
ANOVA df SS MS F Significance F
Regression 3 20156028160 6718676053 67.59468186 1.70869E-
Residual 9 894568667.4 99396518.
Total 12 21050596827
Coefficients
Standard
Error t Stat P-value Lower 95% Upper 95%
Intercept 30625.90727 19808.00932 1.546137564 0.156473449 -14182.95704 75434.
Crest Budget 3.893187989 2.081204031 1.870642153 0.0942021 -0.814826205 8.
Crest /Colgate Ad Ratio -29607.31531 23822.08725 -1.242851434 0.245326546 -83496.66169 24282.
Disposable Income 86.51869161 18.69329932 4.628326446 0.001239614 44.23147842 128.
d..
To use the classical approach, F α,(k, n-k-1)
0.05,(3,9)
= 3.863, and F 0
α=0.
e..
Use p-values, α=0.08, n=13, k=3, i=1,2, 3
Ho: β
i
=1 Ha: β
i
Reject Ho if p-value<α=0.
p-value = 0.0942 p-value= 0.0847 p-value= 0.
Since p-value >α (0.0942>0.08)
do not reject H 0
. Conclude
amount spend on advertising
Crest does not matter in
explaining Crest Sales at the 8%
significance level.
Since p-value>α (0.2453>0.08) do
not reject H 0
. Conclude the
Crest/Colgate advertising ratio does
not matter in explaining Crest Sales
at the 8% significance level.
Since p-value <α (0.0012<0.08)
reject H 0
. Conclude personal
disposable income matters in
explaining Crest Sales at the 8%
significance level.
These results show that only personal disposable income matters in explaining Crest Sales at the 8%
significance level, so only the income effect is present.
f. Since only personal disposable income mattered in explaining Crest Sales, at the 5% level, I would
draw several conclusions. First, graphs are an imperfect way to establish if relationships are linear or
not, and strong or not … further statistical tests are required to verify the intuition that results.
Second, since personal disposable income goes up over time, and population also grows over time, it
is unclear if the increase in Crest Sales is due to income growth or population growth. I might,
therefore, want to run three simple linear regressions using as explanatory variables in each one,
personal disposable income, personal disposable income per capita, and population. I would then
compare these three models to determine which was best in explaining Crest Sales.
α/2=0.
e. For the critical value approach, critical values are ± t 0.025,
= ±2.3646, and the test statistic is t 0
Step 2 is: H 0
: β = 0; H A
: β ≠ 0. With these critical values and test statistics, we would reject H 0
that the slope
coefficient is zero, and conclude that there is evidence at the 5% significance level to suggest revenues affect
profits. The same result would apply using p-values, where Excel will provide a p-value of 0.0008.
f.
Residuals and Revenue
-1.
-0.
-0.
-0.
-0.
0
1
0.000 5.000 10.000 15.000 20.000 25.000 30.000 35.000 40.000 45.000 50.
Revenue
Residual
There seems to be a pattern of decreasing residuals with increased revenues suggesting that there may be a
failure of the assumptions of constant variance and independence between residuals and the independent
variable, revenues.
g.
If the revenue is $35.2 billion, profits would be predicted to be $2.02404 billion.
ŷ=-0.1830 + 0.0627 x = -0.1830 + 0.0627 (35.2) = 2.
a.
ˆ Equation 1 : y x x x
Equation 2 : y ˆ 74. 0280 4. 8587 * x 1 3. 5347 * x 3 0. 8648 * x 4
Equation 3 : ˆ y 74. 2219 19. 0292 * x 3 2. 8211 * x 4
Equation 4 : y ˆ 4. 3061 0. 0819 * x 1 2. 2651 * x 2 16. 6966 * x 3 2. 4978 * x 4
b. All equations are linear in coefficients, and significant overall. In addition:.
c. All equations are linear in coefficients, and significant overall. In addition:
Equation 1) has PrintAds and Rebate% significant at the 5% level (k, n-k-1)ShipCost is not).
Equation 2 has Rebate% and WebAds significant at the 5% level (k, n-k-1)ShipCost is not).
Equation 3 has both Rebate% and WebAds siginificant at the 5% level.
Equation 4 has PrintAds, Rebate% and WebAds significant at the 5% level (k, n-k-1)ShipCost is not).
d. Strictly following the four evaluation criteria, we should conclude that Equation 4 is best with the
smallest standard error, highest adjusted R
2
and R
2
, and third highest F 0
Equation 3 is second best with the second smallest standard error, the second highest adjusted R
2
and third highest R
2
, and the highest F 0
However, since ShipCost is insignificant of the four explanatory variables in Equation 4, a better
model is likely a multiple linear regression model with only PrintAds, Rebate% and WebAds as
explanatory variables.
EQUATION 1 EQUATION 2 EQUATION 3 EQUATION 4
Multiple R 0.667897899 0.698636512 0.698636248 0.
R Square 0.446087603 0.488092975 0.488092607 0.
Adjusted R Square 0.409962882 0.454707735 0.466309314 0.
Standard Error 76.33825264 73.386676 72.6017953 70.
Observations 50 50 50 50
F 0
12.34854096 14.62002263 22.4067408 12.