






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
FOUNDATIONAL CONCEPTS Model Specification: What is SLR? y = β₀ + β₁x + ε Response vs. predictor variable What does the line represent? (conditional mean) Slope interpretation. Intercept interpretation.
Typology: Slides
1 / 11
This page cannot be seen from the preview
Don't miss anything!







Professor
School of Industrial and Systems Engineering
Learning Objectives:
regression model
for simple linear regression
data example using R
Our goal is to find the “best” line that describes a linear relationship; that is, find
0
1
) where
0
1
x + ε
Equivalently, estimating:
0
Intercept
1
Slope
ε is the (random) deviance of the data from the linear model
Which line to choose?
Ø The line that fits the data “best”
where ‘best” is in reference to a
given criterion.
What do we mean by the “best” line?
Ø The regression line is defined by
0
1
), which need to be
estimated_._
Ø The line that makes the errors as
small as possible given a criterion.
What criterion is commonly used?
To estimate (β $
, β %
) , we find values
that minimize sum of squared
errors:
To estimate (β 0
, β 1
), we find values that minimize squared error:
!"#
$
!"#
$
%
used but other criteria can be used also. The estimated regression line
would be different, depending on the criterion used.
see next pages
min
β
0
,β
1
)
i= 1
n
y i
− β !
x i
2
∂
∂β !
)
i= 1
n
y i
− β !
+β "
x i
2
= 0
∂
∂β "
)
i= 1
n
y i
− β !
+β "
x i
2
= 0
Ø Result into a system of linear equation
in β !
and β "
Ø Solve using linear algebra
Ø Solutions to the system are
1 2 0
and
1 2 1
1
xy
xx
∑ #$%
& y
(x
( x)
∑ #$%
& (x
( x)
'
0
= y 5 −
1
x 5
To solve, take the first order derivatives of
the function to be minimized and equate to
0:
Begin with the minimization problem:
Estimating σ
2 Sample variance
2
(chi-squared distribution with n-2 degrees of freedom)
What is the sample variance estimation?
Basic statistic concept:
Consider Z 1
,…,Z n
~ N(μ, σ
2 ) with μ and 5
unknown
The sample variance estimator :
S
2
=
∑ (^) Z i
−
9 Z
2
n−
n−1 S
2
σ
2
~ χ
n−
2
Why n-1?
We lose a degree of freedom because we replace μ ←
9 Z
Now, going back to
This looks like the sample variance estimates except we use n-2 degrees of freedom.
Why?
>σ
∑ ϵ? i
2
n−
~ χ
n−
2
Recall that
ϵ i
= y i
− β 0
+β 1
x i
Replaced by >ϵ i
= y i
0
1
x i
We lose two degrees
of freedom because
β 0
0
β 1
1
Thus, assuming that ϵ i
~ N 0, σ
(This is called the sampling distribution of P,
)
Pσ
= MSE ~ χ
n−
Commonly interested in the behavior of β 1
Ø A positive value of β %
is consistent with a direct relationship between x and y;
e.g. , higher values of height are associated with higher values of weight, or
lower values of revenue are associated with lower values of profit;
Ø A negative value of β %
is consistent with an inverse relationship between x
and y ; e.g. , higher price of a product is associated with lower demand, or a
lower inflation rate is associated with a higher savings rate;
Ø A close-to-zero value of β %
means that there is not a significant association
between x and y.
A.Fit a linear regression. What are the estimated regression coefficients and
the estimated regression line?
B.Interpret the coefficients.
C.What does the model predict sales as the advertising expenditure increases
for an additional $1,000?
D.What sales would you predict for an advertising expenditure of $30,000?
E.What is the estimate of the error variance?
F. What could you say about the sales for an advertising expenditure of
## Read Data in R
data = read.table(“meddcor.txt", sep="", header = FALSE)
## Response & Predicting Variable
sales = data[,1]
adv = data[,2]
## Fit a linear regression model
model = lm(sales ~ adv)
summary(model)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) - 157.3301 145.1912 - 1.084 0.
adv 2.7721 0.2794 9.921 8.87e-10 ***
Residual standard error: 101.4 on 23 degrees of freedom
Multiple R-squared: 0.8106, Adjusted R-squared: 0.
F-statistic: 98.43 on 1 and 23 DF, p-value: 8.873e- 10
Estimated Model Parameters:
/
.
A.Fit a linear regression. What are the estimated regression coefficients and
the estimated regression line?
Solution: Estimates (% 0
1
) are (-157.33, 2.77) and the regression
equation is:
Sales = - 157.33 + 2.77 Adv Expenditure
B.Interpret the coefficients.
Solution: The sales increase by $2770 with each $100 additional
expenditure in advertisement. Or the sales increase with $27.7 with each
dollar invested in advertisement expenditure.
C.What does the model predict as the advertising expenditure increases for
an additional $1,000?
Solution: The increase in sales is 10×2.77 = 27.7 thousands.
D. What sales would you predict for an
advertisement expenditure of $30,000?
Solution: The predicted sales is
E. What is the estimate of the error variance?
Solution: Estimate σ
2 with MSE = 10,281.
F. What could you say about the sales for an
advertising expenditure of $100,000?
Solution: An advertisement expenditure of
$100,000 or 1000 units is outside of the observed
range and thus we cannot predict the sales since
this is extrapolation.