Simple Linear Regression, Slides of Statistics

FOUNDATIONAL CONCEPTS Model Specification: What is SLR? y = β₀ + β₁x + ε Response vs. predictor variable What does the line represent? (conditional mean) Slope interpretation. Intercept interpretation.

Typology: Slides

2024/2025

Uploaded on 06/26/2026

az-fin
Regression Analysis

Simple Linear Regression & ANOVA

Nicoleta Serban, Ph.D.
Professor

Simple Linear Regression: Assumptions, Diagnostics and Model Performance

School of Industrial and Systems Engineering

About This Lesson

Learning Objectives:

  • Examine diagnostics to evaluate the model assumptions and to identify outliers
  • Differentiate between goodness-of-fit and linear model performance

Simple Linear Regression: Model

Data: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$

Model: $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n$

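Assumptions:

  • Linearity/Mean Zero Assumption: $E(\varepsilon_i) = 0$
  • Constant Variance Assumption: $\mathrm{Var}(\varepsilon_i) = \sigma^2$
  • Independence Assumption: $\{\varepsilon_1, \ldots, \varepsilon_n\}$ are independent random variables
  • (Later we assume $\varepsilon_i \sim \mathrm{Normal}$)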
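To make the assumptions concrete, here is a minimal sketch that simulates data from the model with independent, mean-zero, constant-variance Normal errors. The function name `simulate_slr`, the parameter values, and the seed are all hypothetical choices for illustration; they do not come from the slides.

```python
import random

def simulate_slr(x, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ Normal(0, sigma^2).

    Each error is drawn independently with the same mean (zero) and the
    same standard deviation, matching the assumptions listed above.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

# Illustrative (made-up) settings: 100 equally spaced x values.
x = [i / 10 for i in range(100)]
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma=0.5)
```

Setting `sigma=0.0` removes the error term, so the simulated responses fall exactly on the line.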

Residual Analysis

Residual Values: $\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$

Graphical display: Plot of the residuals $\hat{\varepsilon}_i$

If the scatter of $\hat{\varepsilon}_i$ is not random around the zero line, it could be that

  • The relationship between X and Y is not linear
  • Variances of error terms are not equal
  • Response data are not independent

→ Goodness-of-fit (GOF): Use diagnostics to evaluate assumptions.
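As a rough illustration of where the residuals come from, the sketch below fits the line by ordinary least squares and returns observed minus fitted values. The helper names `fit_slr` and `residuals` and the toy data are hypothetical; in practice a statistics package would do this for you.

```python
def fit_slr(x, y):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y):
    """Residuals: observed response minus fitted value, one per data point."""
    b0, b1 = fit_slr(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up toy data, roughly linear.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_slr(x, y)
res = residuals(x, y)
```

With an intercept in the model, least-squares residuals always sum to zero, so what matters in the plot is the pattern of the scatter around the zero line rather than its average level.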

  • Using residual analysis, we check for uncorrelated errors but not independence.

  • Independence is a more complicated matter. If the data are from a randomized trial, then independence is established, but most data are from observational studies.

Independence Assumption:

There are clusters of residuals: the independence assumption does not hold.

Checking Assumptions: Residual Analysis

Checking the Assumption of Normality

One way to check this assumption in a regression is to use a Normal Probability Plot:

x-axis: $\Phi^{-1}\left(\dfrac{r_i - 3/8}{n + 1/4}\right)$

y-axis: $\hat{\varepsilon}_i$

where $r_i$ = rank of $\hat{\varepsilon}_i$ (between 1 and n) and $\Phi$ = CDF of the standard Normal distribution.

  • Let the R statistical software do this for you!
  • A straight line in the normal probability plot implies that the assumption of normality is valid
  • Curvature (especially at the ends) shows non-normality
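The plotting positions can be computed by hand to see what the software is doing. This is a minimal sketch, assuming the x-axis uses the inverse standard Normal CDF of (rank − 3/8)/(n + 1/4) as on the slide; the function name `normal_plot_points` and the sample residuals are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(res):
    """Return (theoretical quantile, residual) pairs for a normal probability plot.

    Residuals are sorted so that the k-th smallest has rank k (ties simply
    get consecutive ranks in this sketch).
    """
    n = len(res)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, e in enumerate(sorted(res), start=1):
        p = (rank - 3 / 8) / (n + 1 / 4)
        points.append((std_normal.inv_cdf(p), e))
    return points

# Made-up residuals for illustration.
pts = normal_plot_points([0.3, -1.2, 0.0, 0.8, -0.4])
```

Plotting the second coordinate against the first should give roughly a straight line when the residuals are approximately Normal.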

Assumption of Normality: Examples

Checking the Assumption of Normality

A complementary approach to checking the normality assumption is to plot the histogram of the residuals.

Normality Assumption:

The residuals should have an approximately symmetric, unimodal distribution with no gaps in the data.

Outliers in Regression

A data point far from the majority of the data (in y and/or x) may be called an outlier, especially if it does not follow the general trend of the rest of the data.

  • Data points that are far from the mean of the x’s are called leverage points.
  • A data point that is far from the mean of the x’s and/or the y’s is an influential point if it influences the fit of the regression.
  • An outlier may or may not impact the regression fit significantly, thus it may or may not be an influential point.

The upshot: Sometimes there are good reasons for excluding subsets (there were errors in the data entry; there were errors in the experiment). Sometimes the outlier belongs in the data. Outliers should always be examined.

Checking for Outliers

Look at the standardized residuals.

Compare the standardized residuals to the −2 to +2 band (or −1 to +1).

  • Standardized residuals larger than 1 in absolute value are large.
  • Standardized residuals larger than 2 in absolute value are extremely large.

Most statistics packages will calculate these automatically.
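The band check can be sketched as follows. The slide's formula for the standardized residuals did not survive extraction, so this example assumes the simplest version: each residual divided by the estimated error standard deviation, sqrt(SSE / (n − 2)), with no leverage adjustment. Statistics packages may use a leverage-adjusted definition instead. The function name and the data are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Residuals divided by sigma_hat = sqrt(SSE / (n - 2)).

    Assumption: no leverage adjustment is applied (see the note above).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma_hat = math.sqrt(sum(e * e for e in res) / (n - 2))
    return [e / sigma_hat for e in res]

# Made-up data: y = 2x exactly, except the fifth point, which is shifted up by 15.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi for xi in x]
y[4] = 25.0

std_res = standardized_residuals(x, y)
# Indices of points outside the -2 to +2 band.
flagged = [i for i, r in enumerate(std_res) if abs(r) > 2]
```

Here only the shifted point falls outside the −2 to +2 band; every other point stays inside −1 to +1.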

Effect of Outliers: Examples

y = 2.6 + 0.07 x

y = 1.34 + 0.13 x

y = 1.29 + 0.17 x

y = 4.9 – 0.07 x

Coefficient of Determination

A statistic that efficiently summarizes how well X can be used to predict Y is the R-square:

$R^2 = 1 - \dfrac{SSE}{SST}$

where

$SSE = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$

which is interpreted as:

$R^2$ = Proportion of total variability in Y that can be explained by the regression (that uses X)
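A short sketch of the computation, assuming SSE is the sum of squared residuals from the least-squares line and SST is the sum of squared deviations of Y from its mean. The function name `r_squared` and the toy data are hypothetical.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE / SST for a simple linear regression of y on x.

    Note: SST is zero when every y is identical, in which case R^2 is undefined.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

# Made-up examples: an exact line, and a noisier relationship.
r2_exact = r_squared([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
r2_noisy = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0])
```

When the points lie exactly on a line, SSE is zero and R-square equals 1; scatter around the line pulls it toward 0.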