Simple Linear Regression, Slides of Statistics

Foundational concepts. Model specification: What is SLR? y = β₀ + β₁x + ε. Response vs. predictor variable. What does the line represent (the conditional mean)? Slope interpretation. Intercept interpretation.

Typology: Slides

2024/2025

Uploaded on 06/26/2026

Regression Analysis

Simple Linear Regression & ANOVA

Nicoleta Serban, Ph.D.

Professor

School of Industrial and Systems Engineering

Simple Linear Regression: Estimation

About This Lesson

Learning Objectives:

  • Review the simple linear regression model

  • Examine the estimation approach for simple linear regression

  • Apply the estimation approach to a data example using R

Simple Linear Regression

Our goal is to find the “best” line that describes a linear relationship; that is, find $(\beta_0, \beta_1)$ where

$$Y = \beta_0 + \beta_1 x + \epsilon$$

Equivalently, estimating:

  • $\beta_0$: Intercept

  • $\beta_1$: Slope

$\epsilon$ is the (random) deviation of the data from the linear model

Simple Linear Regression: Which line?

Which line to choose?

  • The line that fits the data “best”, where “best” is in reference to a given criterion.

What do we mean by the “best” line?

  • The regression line is defined by $(\beta_0, \beta_1)$, which need to be estimated.

  • The line that makes the errors as small as possible given a criterion.

What criterion is commonly used?

Estimation Approach: “Best” Line

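To estimate $(\beta_0, \beta_1)$, we find the values that minimize the sum of squared errors:

$$\sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$

Model Estimation: Approach

Minimizing this sum of squared errors over $(\beta_0, \beta_1)$ gives the estimates:

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$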
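The closed-form estimates can be checked numerically. The sketch below is illustrative only: it uses simulated data (the true values 2 and 0.5, the sample size, and the seed are assumptions, not from the lesson) and compares the $S_{xy}/S_{xx}$ formulas with the coefficients returned by R's `lm()`.

```r
# Illustrative simulated data; all numeric values here are assumptions.
set.seed(1)
n <- 50
x <- runif(n, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)

# Closed-form least squares estimates
s_xy <- sum(y * (x - mean(x)))          # S_xy = sum of y_i * (x_i - xbar)
s_xx <- sum((x - mean(x))^2)            # S_xx = sum of (x_i - xbar)^2
b1_hat <- s_xy / s_xx                   # slope estimate
b0_hat <- mean(y) - b1_hat * mean(x)    # intercept estimate

# The same estimates from lm()
fit <- lm(y ~ x)
manual <- c(b0_hat, b1_hat)
from_lm <- unname(coef(fit))
print(rbind(manual, from_lm))
```

The two rows should agree up to floating-point error, since `lm()` minimizes the same sum of squared errors.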

Model Estimation: Approach

In standard regression models, the least squares criterion is used, but other criteria can also be used. The estimated regression line would be different, depending on the criterion used.
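To illustrate how the choice of criterion changes the fitted line, here is a minimal sketch that fits the same data twice: once by least squares and once by minimizing the sum of absolute errors. The absolute-error criterion, the simulated data, and the single outlier are assumptions chosen for illustration; the slides do not name a specific alternative criterion.

```r
# Illustrative simulated data with one outlier; all values are assumptions.
set.seed(2)
x <- seq(1, 20)
y <- 1 + 2 * x + rnorm(20, sd = 1)
y[20] <- y[20] + 30   # a single large outlier

# Criterion 1: sum of squared errors (least squares), via lm()
sum_sq_err <- function(b) sum((y - (b[1] + b[2] * x))^2)
ols <- coef(lm(y ~ x))

# Criterion 2: sum of absolute errors, minimized numerically with optim(),
# starting the search from the least squares fit
sum_abs_err <- function(b) sum(abs(y - (b[1] + b[2] * x)))
lad <- optim(par = ols, fn = sum_abs_err)   # default method: Nelder-Mead

print(rbind(least_squares = unname(ols), least_abs = unname(lad$par)))
```

Each fit is at least as good as the other under its own criterion, so the two lines will generally not coincide when the data contain outliers.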

see next pages

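Model Estimation: Approach (cont’d)

Begin with the minimization problem:

$$\min_{(\beta_0, \beta_1)} \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$

To solve, take the first-order derivatives of the function to be minimized and set them equal to 0:

$$\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2 = 0$$

$$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2 = 0$$

  • This results in a system of linear equations in $\beta_0$ and $\beta_1$.

  • Solve using linear algebra.

  • The solutions to the system are $\hat{\beta}_0$ and $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$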
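For reference, the two first-order conditions can be written out explicitly. This is the standard least squares derivation, sketched here in the lesson's notation; the intermediate steps are not taken verbatim from the slides.

```latex
\begin{aligned}
\frac{\partial}{\partial \beta_0}:\quad
  & -2 \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  \;\Longrightarrow\;
  n \beta_0 + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\[4pt]
\frac{\partial}{\partial \beta_1}:\quad
  & -2 \sum_{i=1}^{n} x_i \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0
  \;\Longrightarrow\;
  \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \\[4pt]
\text{First equation:}\quad
  & \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
\text{Substitute into the second:}\quad
  & \hat{\beta}_1 \Bigl(\sum_{i=1}^{n} x_i^2 - n \bar{x}^2\Bigr)
    = \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y} \\[4pt]
  & \hat{\beta}_1
    = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
    = \frac{S_{xy}}{S_{xx}}
\end{aligned}
```

The last step uses the identities $\sum x_i^2 - n\bar{x}^2 = \sum (x_i - \bar{x})^2$ and $\sum x_i y_i - n\bar{x}\bar{y} = \sum y_i (x_i - \bar{x})$.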

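Estimating $\sigma^2$

Sample variance of the residuals $\hat{\epsilon}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{\epsilon}_i^2}{n-2}$$

Assuming $\epsilon_i \sim N(0, \sigma^2)$:

$$\frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

(chi-squared distribution with n-2 degrees of freedom)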

Variance Sampling Distribution

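What is the sample variance estimation?

Basic statistical concept:

Consider independent $Z_1, \ldots, Z_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ unknown.

The sample variance estimator:

$$S^2 = \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})^2}{n-1}, \qquad \frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1}$$

Why n-1?

We lose a degree of freedom because we replace $\mu$ with $\bar{Z}$.

Now, going back to

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{\epsilon}_i^2}{n-2}, \qquad \frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

This looks like the sample variance estimator, except that we use n-2 degrees of freedom.

Why?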

Variance Sampling Distribution (cont’d)

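Recall that

$$\epsilon_i = y_i - (\beta_0 + \beta_1 x_i)$$

is replaced by

$$\hat{\epsilon}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$$

We lose two degrees of freedom because we replace $\beta_0$ with $\hat{\beta}_0$ and $\beta_1$ with $\hat{\beta}_1$.

Thus, assuming that $\epsilon_i \sim N(0, \sigma^2)$,

$$\hat{\sigma}^2 = \text{MSE} = \frac{\sum_{i=1}^{n} \hat{\epsilon}_i^2}{n-2}, \qquad \frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$$

(This is called the sampling distribution of $\hat{\sigma}^2$.)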
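The n-2 divisor can be seen directly in R. The following sketch uses simulated data (the sample size, coefficients, and error standard deviation are assumptions for illustration) and checks that the residual sum of squares divided by n-2 matches the squared "Residual standard error" that `summary()` reports.

```r
# Illustrative simulated data; the true sigma = 2 is an assumption.
set.seed(3)
n <- 25
x <- runif(n, min = 0, max = 10)
y <- 1 + 3 * x + rnorm(n, sd = 2)

fit <- lm(y ~ x)
res <- residuals(fit)   # estimated errors: y_i - (b0_hat + b1_hat * x_i)

# Error variance estimate: residual sum of squares over n - 2 degrees of freedom
mse <- sum(res^2) / (n - 2)

# summary() reports the square root of this as the residual standard error
rse <- summary(fit)$sigma
print(c(mse = mse, rse_squared = rse^2, df = df.residual(fit)))
```

Dividing by n instead of n-2 would give a smaller value, which tends to underestimate $\sigma^2$.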

Model Parameter Interpretation

We are commonly interested in the behavior of $\beta_1$:

  • A positive value of $\beta_1$ is consistent with a direct relationship between x and y; e.g., higher values of height are associated with higher values of weight, or lower values of revenue are associated with lower values of profit.

  • A negative value of $\beta_1$ is consistent with an inverse relationship between x and y; e.g., a higher price of a product is associated with lower demand, or a lower inflation rate is associated with a higher savings rate.

  • A close-to-zero value of $\beta_1$ suggests that there is not a significant linear association between x and y.

Example in R: Estimation

A. Fit a linear regression. What are the estimated regression coefficients and the estimated regression line?

B. Interpret the coefficients.

C. What change in sales does the model predict when the advertising expenditure increases by an additional $1,000?

D. What sales would you predict for an advertising expenditure of $30,000?

E. What is the estimate of the error variance?

F. What could you say about the sales for an advertising expenditure of $100,000?

Example in R (cont’d)

## Read data in R (whitespace-separated file with no header row)

data <- read.table("meddcor.txt", sep = "", header = FALSE)

## Response and predicting variables

sales <- data[, 1]

adv <- data[, 2]

## Fit a linear regression model and print the fit summary

model <- lm(sales ~ adv)

summary(model)

Coefficients:

             Estimate Std. Error t value Pr(>|t|)

(Intercept) -157.3301   145.1912  -1.084 0.

adv            2.7721     0.2794   9.921 8.87e-10 ***

Residual standard error: 101.4 on 23 degrees of freedom

Multiple R-squared: 0.8106, Adjusted R-squared: 0.

F-statistic: 98.43 on 1 and 23 DF, p-value: 8.873e-10

Estimated Model Parameters:

$\hat{\beta}_0 = -157.33$, $\hat{\beta}_1 = 2.77$, $\hat{\sigma}^2 = 101.4^2$

A. Fit a linear regression. What are the estimated regression coefficients and the estimated regression line?

Solution: The estimates $(\hat{\beta}_0, \hat{\beta}_1)$ are (-157.33, 2.77) and the regression equation is:

Sales = -157.33 + 2.77 × Adv Expenditure

B. Interpret the coefficients.

Solution: Sales are measured in thousands of dollars and advertising expenditure in units of $100, so the estimated sales increase by $2,770 with each additional $100 spent on advertising. Equivalently, sales increase by $27.70 with each additional dollar spent on advertising.

C. What does the model predict as the advertising expenditure increases by an additional $1,000?

Solution: The increase in sales is 10 × 2.77 = 27.7 thousand dollars.

Example in R (cont’d)

D. What sales would you predict for an advertising expenditure of $30,000?

Solution: The predicted sales are -157.33 + 300 × 2.77 = 673.67 thousand dollars.

E. What is the estimate of the error variance?

Solution: Estimate $\sigma^2$ with MSE = $101.4^2 \approx 10{,}282$.

F. What could you say about the sales for an advertising expenditure of $100,000?

Solution: An advertising expenditure of $100,000, or 1,000 units, is outside the observed range, so we cannot reliably predict the sales: this would be extrapolation.
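The arithmetic behind solutions C, D, and E can be reproduced with a few lines of R. This sketch uses only the rounded estimates quoted in the solutions; the helper `adv_units()` is a hypothetical name introduced here, and it assumes advertising is recorded in units of $100, as the solutions imply.

```r
# Rounded estimates as reported in the solutions
b0_hat <- -157.33   # intercept
b1_hat <- 2.77      # slope
rse <- 101.4        # residual standard error from summary(model)

# Assumed units: advertising in hundreds of dollars, sales in thousands of dollars
adv_units <- function(dollars) dollars / 100   # hypothetical helper

# C. Change in predicted sales for an additional $1,000 of advertising
delta_sales <- adv_units(1000) * b1_hat

# D. Predicted sales at an advertising expenditure of $30,000
pred_30k <- b0_hat + b1_hat * adv_units(30000)

# E. Estimate of the error variance (MSE)
mse <- rse^2

print(c(delta_sales = delta_sales, pred_30k = pred_30k, mse = mse))
```

With the fitted `model` object available, `predict(model, newdata = data.frame(adv = 300))` should give the corresponding prediction from the unrounded coefficients, which may differ slightly from the hand calculation.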

Example in R (cont’d)