Simple Linear Regression, Slides of Statistics

FOUNDATIONAL CONCEPTS Model Specification: What is SLR? y = β₀ + β₁x + ε Response vs. predictor variable What does the line represent? (conditional mean) Slope interpretation. Intercept interpretation.

Typology: Slides

2024/2025

Uploaded on 06/26/2026

Regression Analysis
Simple Linear Regression & ANOVA
Nicoleta Serban, Ph.D.
Professor
Simple Linear Regression:
Regression Line and Prediction
School of Industrial and Systems Engineering
About This Lesson

Learning Objectives:

  • Explore the difference between estimation and prediction
  • Derive statistical confidence and prediction intervals for the regression mean response
  • Apply statistical inference on the mean response to a data example using R
Estimation vs. Prediction

Interpretation of the estimated mean response:

  • If x* is one of the observed values of the predicting variable, then we use estimation. The estimated regression line at x* is interpreted as the estimated mean response over all settings under which the predicting variable is equal to x*.

  • If x* is a new observation of the predicting variable, then we use prediction. The predicted regression line at x* is interpreted as the estimated mean response for one setting under which the predicting variable is equal to x*.
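The distinction can be made concrete in R. The sketch below uses simulated data (the variables `x`, `y` and the value x* = 5 are hypothetical, not taken from the lesson) and asks `predict()` for both kinds of interval at the same x*. The point value is identical in both calls; only the interval differs.

```r
# Simulated data for illustration only; not the lesson's data set.
set.seed(1)
x <- runif(40, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(40, sd = 2)
fit <- lm(y ~ x)

x_star <- data.frame(x = 5)

# Interval for the mean response at x* (estimation)
ci <- predict(fit, x_star, interval = "confidence", level = 0.95)
# Interval for a single new response at x* (prediction)
pint <- predict(fit, x_star, interval = "prediction", level = 0.95)

rbind(confidence = ci[1, ], prediction = pint[1, ])
```

Both rows share the same `fit` column, while the `lwr` and `upr` limits of the prediction row lie further from it.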

Estimating the Regression Line

At some selected value of x (say x*), we estimate the “mean response” of y (or the regression line) via

$$\hat{y} \mid x^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$$

Because the estimators of $\beta_0$ and $\beta_1$ are normally distributed, so is $\hat{y} \mid x^*$. That means we can draw inference using $\hat{y} \mid x^*$ if we know its expected value and variance.
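For reference, the expected value and variance take the following standard form under the usual simple linear regression assumptions (independent normal errors with constant variance $\sigma^2$). The symbol $S_{xx}$ is notation introduced here for the sum of squared deviations of the predictor.

```latex
\[
E\left[\hat{y} \mid x^*\right] = \beta_0 + \beta_1 x^*,
\qquad
\mathrm{Var}\left(\hat{y} \mid x^*\right)
  = \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right),
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
\]

% Replacing sigma^2 by its estimate (the MSE) gives a 100(1 - alpha)%
% confidence interval for the mean response at x*:
\[
\hat{y} \mid x^* \;\pm\; t_{\alpha/2,\, n-2}
  \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)}
\]
```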

Predicting a New Response

One of the primary motivations for regression is to use the regression equation to predict future responses. The prediction is the same as the estimator for the “mean response”, which is $\hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$.

But the prediction contains two sources of uncertainty:

  1. Due to the new (n+1)th observation
  2. Due to the parameter estimates (of $\beta_0$ and $\beta_1$)

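Predicting a New Response

  1. Variation of the estimated regression line: $\sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)$
  2. Variation of a new measurement: $\sigma^2$

The new observation is independent of the regression data, so the total variation in predicting y | x* is

$$\sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)$$

where $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.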
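The two variance components can be checked numerically. The sketch below is a minimal illustration on simulated data (the data and the value x* = 7 are hypothetical). It computes each component by hand, with the MSE standing in for $\sigma^2$, and compares the total with what `predict()` reports through its `se.fit` and `residual.scale` components.

```r
# Simulated data for illustration only; not the lesson's data set.
set.seed(2)
x <- runif(30, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(30, sd = 1.5)
fit <- lm(y ~ x)

n      <- length(x)
x_star <- 7
mse    <- sum(residuals(fit)^2) / (n - 2)   # estimate of sigma^2
sxx    <- sum((x - mean(x))^2)

var_line <- mse * (1 / n + (x_star - mean(x))^2 / sxx)  # estimated regression line
var_new  <- mse                                          # new measurement
var_pred <- var_line + var_new                           # total, by independence

# Cross-check: se.fit is the standard error of the estimated line,
# residual.scale is the residual standard error (square root of the MSE).
p <- predict(fit, data.frame(x = x_star), se.fit = TRUE)
c(by_hand = var_pred, from_predict = unname(p$se.fit^2 + p$residual.scale^2))
```

The two printed values should agree up to floating-point error.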

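Predicting a New Response

A 100(1 − α)% prediction interval for a future y* (at x*) is

$$\hat{y}^* \pm t_{\alpha/2,\, n-2} \sqrt{\hat{\sigma}^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)}$$

where $\hat{\sigma}^2$ is the MSE and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. The point prediction $\hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$ is the same as the line estimate, but the interval is wider than the confidence interval for the mean response.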
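As a sketch of how this interval could be computed by hand, the hypothetical helper `pred_interval()` below builds it from the MSE, $S_{xx}$ and a t quantile, then compares the result with `predict()`. The data are simulated and the helper name and arguments are illustrative assumptions, not part of the lesson.

```r
# Simulated data for illustration only; not the lesson's data set.
set.seed(3)
x <- runif(25, min = 0, max = 10)
y <- 4 + 1.5 * x + rnorm(25, sd = 2)
fit <- lm(y ~ x)

# Hypothetical helper: prediction interval at x_star for a simple linear fit.
pred_interval <- function(fit, x, x_star, level = 0.95) {
  n      <- length(x)
  mse    <- sum(residuals(fit)^2) / (n - 2)
  sxx    <- sum((x - mean(x))^2)
  y_hat  <- unname(coef(fit)[1] + coef(fit)[2] * x_star)
  se     <- sqrt(mse * (1 + 1 / n + (x_star - mean(x))^2 / sxx))
  t_crit <- qt(1 - (1 - level) / 2, df = n - 2)
  c(fit = y_hat, lwr = y_hat - t_crit * se, upr = y_hat + t_crit * se)
}

by_hand <- pred_interval(fit, x, x_star = 6, level = 0.99)
from_r  <- predict(fit, data.frame(x = 6), interval = "prediction", level = 0.99)
rbind(by_hand, from_r = from_r[1, ])
```

The two rows should match, which confirms that `interval = "prediction"` uses the variance with the extra 1 inside the parentheses.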

Confidence Bands: Estimation vs Prediction

[Figure: plot showing the observations, the regression line, the confidence interval, and the prediction interval]

  • In both cases we get the narrowest intervals at $x^* = \bar{x}$ or close to that. Both intervals get wider as we move away from $\bar{x}$.
  • Interval lengths are symmetric around $\bar{x}$.
  • Intervals in both cases are longer when $\hat{\sigma}^2$ is greater.
  • Both intervals get narrower as n increases.
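These properties can be seen by tabulating interval widths over a grid of x* values. The sketch below uses simulated data and a hypothetical grid of offsets around the sample mean of `x`. It is an illustration under those assumptions, not the plot from the lesson.

```r
# Simulated data for illustration only; not the lesson's data set.
set.seed(4)
x <- runif(50, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(50, sd = 3)
fit <- lm(y ~ x)

# Grid centered on xbar so that points come in symmetric pairs around it.
offsets <- seq(-4, 4, by = 0.5)
grid <- data.frame(x = mean(x) + offsets)

ci   <- predict(fit, grid, interval = "confidence")
pint <- predict(fit, grid, interval = "prediction")

widths <- data.frame(
  offset   = offsets,
  ci_width = ci[, "upr"] - ci[, "lwr"],
  pi_width = pint[, "upr"] - pint[, "lwr"]
)
widths
```

In the resulting table both width columns are smallest at offset 0, take equal values at offsets of equal size and opposite sign, and the prediction width exceeds the confidence width in every row.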

Linear Regression: Example in R

A company that sells medical supplies to hospitals, clinics, and doctors' offices has considered the effectiveness of a new advertising program. Management wants to know whether advertising expenditure is related to sales. The company intends to increase sales with an effective advertising program.

What inferences can be made on the prediction of sales given a targeted advertisement expenditure?

a. What sales would you predict for an advertisement expenditure of $30,000?

b. What is the variance estimate of the predicted sales for an advertisement expenditure of $30,000?

c. What are the lower and upper limits of predicted sales for an advertisement expenditure of $30,000 at the 99% confidence level? How will the limits change if we lower the confidence level to 95%?

d. Compare the confidence intervals of the estimated regression line versus the predicted regression line. Interpret.

Example in R: Estimating Regression Line & Prediction

summary(model)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -157.3301   145.1912  -1.084 0.
adv            2.7721     0.2794   9.921 8.87e-10

Residual standard error: 101.4 on 23 degrees of freedom

xbar = mean(ADV)
n = 23+2                # residual degrees of freedom are n - 2
mse = 101.4^2           # MSE is the squared residual standard error
var.beta1 = 0.2794^2    # squared standard error of the slope
sxx = mse/var.beta1     # because Var(beta1 hat) = MSE/Sxx
pred.var = mse*(1+1/n+(xbar-300)^2/sxx)
pred.var
[1] 14286.16

a. For an advertising expenditure of $30,000, the predicted sales is −157.33 + 300 × 2.77 = 673.67 thousand.

b. The variance of the predicted sales is

$$\hat{\sigma}^2 \left( 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right) = 14286.16$$

Example in R

new = data.frame(adv = 300)

predict.lm(model, new, interval = "predict", level = 0.99)
       fit     lwr      upr
1 674.3047 338.712 1009.897

predict.lm(model, new, interval = "predict", level = 0.95)
       fit      lwr     upr
1 674.3047 427.0146 921.594

predict.lm(model, new, interval = "confidence", level = 0.99)
       fit      lwr     upr
1 674.3047 496.6497 851.959

predict.lm(model, new, interval = "confidence", level = 0.95)
       fit     lwr     upr
1 674.3047 543.395 805.214

c. A 99% prediction interval at an advertisement expenditure of $30,000 is (338.712, 1009.897). A 95% prediction interval is (427.015, 921.594). Lowering the confidence level to 95% narrows the interval.

d. A 99% confidence interval at an advertisement expenditure of $30,000 is (496.650, 851.959). A 95% confidence interval is (543.395, 805.214).

The confidence intervals are narrower than the prediction intervals because the prediction intervals include the additional variance of a new measurement.
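The link between the two kinds of interval can be checked from the printed numbers alone. The sketch below copies the fitted value, residual standard error, degrees of freedom and prediction variance from the output above. It subtracts the MSE to obtain the variance of the estimated line and rebuilds all four intervals with `qt()`. The helper `limits()` and the variable names are illustrative assumptions.

```r
# Values copied from the printed R output in this example.
fit_val  <- 674.3047   # predicted sales at adv = 300
rse      <- 101.4      # residual standard error
resid_df <- 23         # residual degrees of freedom
pred_var <- 14286.16   # variance of the predicted sales

mse      <- rse^2              # variance of a new measurement
line_var <- pred_var - mse     # variance of the estimated regression line

# Hypothetical helper: fitted value plus/minus t quantile times standard error.
limits <- function(variance, level) {
  half <- qt(1 - (1 - level) / 2, df = resid_df) * sqrt(variance)
  c(lwr = fit_val - half, upr = fit_val + half)
}

res <- rbind(
  prediction_99 = limits(pred_var, 0.99),
  prediction_95 = limits(pred_var, 0.95),
  confidence_99 = limits(line_var, 0.99),
  confidence_95 = limits(line_var, 0.95)
)
res
```

Because the inputs are rounded values from the printed summary, the rebuilt limits should agree with the `predict.lm` output only approximately. The difference in width between the prediction rows and the confidence rows comes entirely from the MSE term.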

Example in R (cont’d)