Validating Linear Model: Parameters, Hypotheses, and Residual Diagnosis - Prof. Emiliano V, Study notes of Mathematics

University of Connecticut (UConn) - Avery Point Mathematics

Prof. Emiliano Valdez

This document, from the University of Connecticut - Storrs, Fall 2009 semester, covers the validation of a linear model in the context of applied actuarial statistics. Topics include interpreting parameter estimates, hypothesis testing for the slope, testing variable significance, interval estimates, prediction and prediction intervals, analyzing residuals, checking normality, detecting constant variance, and dealing with unusual observations. The document also includes R source code for fitting the model and analyzing the residuals.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010

Outline:

Model interpretations
  Inference on the slope estimates
  Testing variable significance
  Interval estimates
  Prediction and prediction interval

Fitting regression line with R
  R source codes for fitting model
  Prediction intervals

Analyzing the residuals
  Checking Normality
  Detecting constant variance
  Unusual observations
  Some advice on unusual observations

Validating the Linear Model

Math 3621 Applied Actuarial Statistics

Fall 2009 semester

EA Valdez

University of Connecticut - Storrs

Lecture Week 5

Interpreting the parameter estimates

Recall the estimated regression equation:

$$\hat{Y} = b_0 + b_1 X,$$

where $b_0$ is the intercept and $b_1$ is the slope coefficient.

Interpreting these parameter estimates:

We expect $Y = b_0$ when $X = 0$, but only if this makes sense.

We expect $Y$ to change by an amount of $b_1$ whenever $X$ increases by one unit.
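
As a minimal sketch of this interpretation, the snippet below plugs in the rounded estimates that appear in the summary(lm1) output later in these notes (price regressed on income). The variable names are illustrative only, and because the coefficients are rounded, the predicted value differs slightly from the fit of 21715.63 reported by predict().

```r
# Rounded estimates copied from the summary(lm1) output in these notes
b0 <- 5866      # intercept
b1 <- 0.2113    # slope: change in expected price per one-unit increase in income

# Change in the expected price when income increases by 1,000 units
delta_price <- b1 * 1000

# Expected price at an income of 75,000
y_hat <- b0 + b1 * 75000

print(delta_price)  # 211.3
print(y_hat)        # 21713.5
```

The intercept of 5866 would be the expected price at an income of zero, which is the case where the "only if this makes sense" caveat applies.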

Is the independent variable important? The t-test

In assessing whether X is an important (or significant) predictor variable, we conduct the t-test for

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0.$$

The test statistic simplifies to

$$t\text{-ratio} = t(b_1) = \frac{b_1}{se(b_1)}.$$

Reject $H_0$ if $|t(b_1)| > t_{\alpha/2,\,n-2}$ and say that there is reason to believe the independent variable X is an important predictor. Otherwise, we fail to reject $H_0$, and there is not enough evidence to conclude that X is an important predictor.
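
The decision rule can be sketched in R using the rounded slope estimate and standard error from the summary(lm1) output in these notes. The 5% significance level is an assumption made for illustration, since the notes do not fix a value of α, and the variable names are hypothetical.

```r
# Rounded values copied from the summary(lm1) output in these notes
b1    <- 0.2113    # slope estimate
se_b1 <- 0.01508   # standard error of the slope
df    <- 60        # residual degrees of freedom, n - 2
alpha <- 0.05      # assumed significance level (not stated in the notes)

t_ratio <- b1 / se_b1                  # test statistic t(b1)
t_crit  <- qt(1 - alpha / 2, df)       # upper alpha/2 critical value of t with n - 2 df
p_value <- 2 * pt(-abs(t_ratio), df)   # two-sided p-value

reject_h0 <- abs(t_ratio) > t_crit

print(c(t_ratio = t_ratio, t_crit = t_crit))
print(reject_h0)
```

The computed ratio is close to the t value of 14.009 printed by R; the small gap comes from using rounded inputs.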

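Constructing a confidence interval for $\beta_1$

A $100(1-\alpha)\%$ confidence interval for the slope $\beta_1$ is given by

$$b_1 \pm t_{\alpha/2,\,n-2}\, se(b_1) = b_1 \pm t_{\alpha/2,\,n-2}\, \frac{s}{s_x \sqrt{n-1}},$$

where $t_{\alpha/2,\,n-2}$ refers to the $100(1-\alpha/2)$-th (upper) percentile of a t-distribution with $n-2$ degrees of freedom, $s$ is the residual standard deviation, and $s_x$ is the sample standard deviation of X.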
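
A short sketch of this interval, again using the rounded slope and standard error from the summary(lm1) output in these notes. The 95% level is an assumed choice for illustration and the variable names are hypothetical.

```r
# Rounded values copied from the summary(lm1) output in these notes
b1    <- 0.2113    # slope estimate
se_b1 <- 0.01508   # standard error of the slope
df    <- 60        # n - 2
alpha <- 0.05      # assumed: gives a 95% interval

t_crit <- qt(1 - alpha / 2, df)

# b1 -/+ t_crit * se(b1)
ci <- b1 + c(-1, 1) * t_crit * se_b1
names(ci) <- c("lower", "upper")

print(round(ci, 4))
```

When the fitted model object is available, `confint(lm1, "income", level = 0.95)` should return the same kind of interval directly, without rounding error in the inputs.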

R source codes for fitting model

# fitting the linear model with income as predictor to purchase price
> lm1 <- lm(price~income)
> summary(lm1)

Call:
lm(formula = price ~ income)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.866e+03  7.498e+02   7.824  9.8e-11 ***
income      2.113e-01  1.508e-02  14.009  < 2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2459 on 60 degrees of freedom
Multiple R-Squared: 0.7659, Adjusted R-squared: 0.
F-statistic: 196.3 on 1 and 60 DF, p-value: < 2.2e-

# ANOVA table
> anova(lm1)
Analysis of Variance Table

Response: price
          Df     Sum Sq    Mean Sq F value    Pr(>F)
income     1 1186892153 1186892153  196.26 < 2.2e-16 ***
Residuals 60  362851718    6047529
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> new.income <- data.frame(income=c(75000))
> predict(lm1,new.income,interval="prediction")
          fit      lwr    upr
[1,] 21715.63 16676.12 26755.

Prediction intervals

# the prediction interval
> new <- data.frame(income=seq(min(income),max(income),5))
> predinc.plim <- predict(lm1,new,interval="prediction")
> matplot(new$income,predinc.plim,col=c("black","red","red"),lty=c(1,2,3),type="l",xlab="income",
+         ylab="purchase price",main="The estimated regression line with prediction intervals")
> points(income,price,col="blue",cex=1.5)

[Figure: The estimated regression line with prediction intervals. Horizontal axis: income; vertical axis: purchase price.]
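
To see what interval="prediction" adds, the sketch below compares it with interval="confidence" at a single income value. The course data set is not reproduced in these notes, so the income and price values here are simulated stand-ins, loosely patterned on the reported fit with n = 62 to match the 60 residual degrees of freedom. They carry no empirical meaning, and all object names are hypothetical.

```r
# Simulated stand-in data; NOT the course data set
set.seed(123)
n      <- 62
income <- runif(n, 20000, 100000)
price  <- 5866 + 0.2113 * income + rnorm(n, sd = 2459)

sim_fit <- lm(price ~ income)
new_obs <- data.frame(income = 75000)

# Interval for the mean response at income = 75000
conf_int <- predict(sim_fit, new_obs, interval = "confidence")

# Interval for a single new observation at income = 75000
pred_int <- predict(sim_fit, new_obs, interval = "prediction")

print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

Both intervals share the same center, but the prediction interval is wider because it also accounts for the variability of an individual observation around the regression line.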

Detecting constant variance

To check for homoscedasticity (constant variance), plot the residuals against the fitted values.

# detect homoscedasticity
> plot(fitted(lm1),residuals(lm1),xlab="Fitted",ylab="Residuals",cex=1.2,
+      main="Fitted values vs residuals for the Car Price data")
> abline(h=0,col="blue")

[Figure: Fitted values vs residuals for the Car Price data. Horizontal axis: Fitted; vertical axis: Residuals.]

Checking heteroscedasticity - what to look for?

[Figure: four example residual plots, labeled "no problem", "mild heteroscedasticity", "strong heteroscedasticity", and "non-linear".]
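
The four patterns named in the figure can be imitated with simulated residuals. This is only a sketch under assumed generating rules (noise whose spread is constant, grows with the square root of the fitted value, grows linearly with it, or sits on top of a curved trend); it is not the code behind the original figure.

```r
set.seed(2009)
n <- 50
x <- 1:n   # stands in for the fitted values

# Simulated "residuals" for each pattern (assumed generating rules)
panels <- list(
  "no problem"                = rnorm(n),
  "mild heteroscedasticity"   = sqrt(x) * rnorm(n),
  "strong heteroscedasticity" = x * rnorm(n),
  "non-linear"                = cos(x * pi / 25) + rnorm(n)
)

# Draw the four panels in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
for (name in names(panels)) {
  plot(x, panels[[name]], xlab = "Fitted", ylab = "Residuals", main = name)
  abline(h = 0, col = "blue")
}
par(op)
```

Rerunning with different seeds gives a feel for how much apparent structure pure noise can show, which helps avoid over-reading a single residual plot.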

Outliers and high leverage points

An observation is considered “unusual” if it lies far from the majority of the data set.

It could be “unusual”:

in the vertical direction, in which case we call it an outlier; or

in the horizontal direction, in which case we call it a high leverage point.

It is possible that an observation is unusual in both directions, and hence both an outlier and a high leverage point.

In the next few slides, we illustrate the effect of unusual observations by considering a “fictitious” data set, as in the example on page 42 of Frees' book.
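
The two directions of unusualness can be measured in R with hatvalues() (leverage, horizontal) and rstandard() (standardized residuals, vertical). The sketch below uses simulated data, not the data from Frees' example: 19 hypothetical base points near a line, plus one added point placed far to the right and well below the trend.

```r
# Simulated stand-in data; NOT the data set from Frees' example
set.seed(42)
x_base <- seq(1.5, 6, length.out = 19)
y_base <- 1.9 + 0.6 * x_base + rnorm(19, sd = 0.3)

# One added point: far out horizontally and well below the trend (hypothetical)
x <- c(x_base, 9.5)
y <- c(y_base, 2.5)

fit  <- lm(y ~ x)
lev  <- hatvalues(fit)   # leverage: unusual in the horizontal direction
rstd <- rstandard(fit)   # standardized residuals: unusual in the vertical direction

# Index of the observation with the largest leverage
print(which.max(lev))

# Diagnostics for the last three observations
print(round(cbind(leverage = lev, std_resid = rstd)[18:20, ], 3))
```

Leverage depends only on the X values, so the added point has the largest leverage regardless of where it sits vertically.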

An example data set with unusual observations

The following plot shows three “unusual” observations, denoted by points A, B, and C.

[Figure: scatter plot of the 19 base points together with the unusual observations; horizontal axis from 0 to 10.]

Effects on the regression

The following table summarizes the results of running the various regression models to assess the impact of the unusual observations.

Data       b_0     b_1     s       R^2 (%)   t(b_1)
Base       1.869   0.611   0.288   89.0      11.
Base + A   1.750   0.693   0.846   53.7      4.
Base + B   1.775   0.640   0.285   94.7      18.
Base + C   3.356   0.155   0.865   10.3      1.
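
A table of this kind can be produced by refitting the model with and without the point in question. The sketch below does so on simulated data (19 hypothetical base points plus one added point); it does not reproduce the numbers above, and the helper summarize_fit() is a name introduced here for illustration.

```r
# Simulated stand-in data; NOT the data set behind the table above
set.seed(42)
x_base <- seq(1.5, 6, length.out = 19)
y_base <- 1.9 + 0.6 * x_base + rnorm(19, sd = 0.3)

# Hypothetical helper: fit y on x and collect the quantities shown in the table
summarize_fit <- function(x, y) {
  sm <- summary(lm(y ~ x))
  c(b0     = coef(sm)["(Intercept)", "Estimate"],
    b1     = coef(sm)["x", "Estimate"],
    s      = sm$sigma,
    R2_pct = 100 * sm$r.squared,
    t_b1   = coef(sm)["x", "t value"])
}

results <- rbind(
  "Base"               = summarize_fit(x_base, y_base),
  "Base + extra point" = summarize_fit(c(x_base, 9.5), c(y_base, 2.5))
)

print(round(results, 3))
```

Comparing the rows shows how a single point that is unusual in both directions can flatten the slope, inflate s, and reduce R^2.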

Some advice on unusual observations

What can be done about unusual observations?

Check for possible errors in data entry.

Investigate why they occurred.

It is possible to exclude them first, then include them again to evaluate their impact.

They may not be mistakes or aberrations; they may be naturally occurring.

It could be dangerous to immediately exclude them altogether.

Check out p. 68 of Faraway. We’ll deal with this issue again later in multiple regression.
