










This document, from the University of Connecticut, Storrs (Fall 2009 semester), covers the validation of a linear model in the context of applied actuarial statistics. Topics include interpreting parameter estimates, hypothesis testing for the slope, testing variable significance, interval estimates, prediction and prediction intervals, analyzing residuals, checking normality, detecting constant variance, and dealing with unusual observations. The document also includes R source code for fitting the model and analyzing the residuals.











Inference on the slope estimates Testing variable significance Interval estimates Prediction and prediction interval
R source codes for fitting model Prediction intervals
Checking Normality Detecting constant variance Unusual observations Some advice on unusual observations
Interpreting the parameter estimates
Recall the estimated regression equation:
$$\hat{Y} = b_0 + b_1 X,$$
where $b_0$ is the intercept and $b_1$ is the slope coefficient.
Interpreting these parameter estimates:
We expect $Y = b_0$ when $X = 0$, but only if a value of $X = 0$ makes sense for the data.
We expect $Y$ to change by an amount $b_1$ whenever $X$ increases by one unit.
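Both interpretations can be checked numerically. The following is a minimal sketch in R, assuming a small made-up data set (the vectors `x` and `y` are hypothetical and not taken from the course data):

```r
# Hypothetical data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.2)

fit <- lm(y ~ x)
b <- coef(fit)   # b[1] is the intercept b_0, b[2] is the slope b_1

# The fitted value at X = 0 is the intercept b_0
at_zero <- predict(fit, newdata = data.frame(x = 0))

# Raising X by one unit (here from 4 to 5) changes the fitted value by b_1
step <- predict(fit, newdata = data.frame(x = 5)) -
  predict(fit, newdata = data.frame(x = 4))

print(b)
print(c(at_zero = unname(at_zero), step = unname(step)))
```

Note that `x = 0` lies outside the range of the made-up data, which is exactly the situation where the intercept interpretation needs care.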
Is the independent variable important? The t-test
In assessing whether $X$ is an important (or significant) predictor variable, we conduct the t-test for
$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0.$$
The test statistic simplifies to
$$\text{t-ratio} = t(b_1) = \frac{b_1}{se(b_1)}.$$
Reject $H_0$ if $|t(b_1)| > t_{\alpha/2,\,n-2}$ and say that there is reason to believe the independent variable $X$ is an important predictor. Otherwise, we fail to reject $H_0$, and there is not enough evidence to conclude that $X$ is an important predictor.
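The decision rule above can be sketched in R as follows. The data are hypothetical, and the significance level `alpha = 0.05` is an assumption made for the example rather than a value given in the slides:

```r
# Hypothetical data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.2)
n <- length(x)

fit <- lm(y ~ x)
est <- summary(fit)$coefficients   # estimates, standard errors, t values, p-values

b1      <- est["x", "Estimate"]
se_b1   <- est["x", "Std. Error"]
t_ratio <- b1 / se_b1              # t(b_1) = b_1 / se(b_1)

alpha  <- 0.05                           # assumed significance level
t_crit <- qt(1 - alpha / 2, df = n - 2)  # critical value t_{alpha/2, n-2}
reject <- abs(t_ratio) > t_crit          # TRUE means reject H_0

# Equivalent two-sided p-value
p_value <- 2 * pt(-abs(t_ratio), df = n - 2)

print(c(t_ratio = t_ratio, t_crit = t_crit, p_value = p_value))
print(reject)
```

The hand-computed `t_ratio` should agree with the `t value` column that `summary(fit)` reports for `x`.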
Constructing a confidence interval for β 1
A $100(1-\alpha)\%$ confidence interval for the slope $\beta_1$ is given by
$$b_1 \pm t_{\alpha/2,\,n-2}\, se(b_1) = b_1 \pm t_{\alpha/2,\,n-2}\, \frac{s}{s_X \sqrt{n-1}},$$
where $t_{\alpha/2,\,n-2}$ refers to the $100(1-\alpha/2)$-th (upper) percentile of a t-distribution with $n-2$ degrees of freedom.
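As a check on the formula, the interval can be computed by hand and compared with R's built-in `confint()`. This is a sketch on hypothetical data. It takes $s$ to be the residual standard error reported by `summary()` and $s_X$ to be the sample standard deviation of $X$, and it assumes a 95% level:

```r
# Hypothetical data, made up for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.2)
n <- length(x)

fit <- lm(y ~ x)
b1  <- coef(fit)[["x"]]
s   <- summary(fit)$sigma            # residual standard error s
s_x <- sd(x)                         # sample standard deviation of X

se_b1  <- s / (s_x * sqrt(n - 1))    # se(b_1) from the formula above
alpha  <- 0.05                       # assumed level: 95% interval
t_crit <- qt(1 - alpha / 2, df = n - 2)

ci_manual  <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
ci_builtin <- confint(fit, "x", level = 1 - alpha)

print(ci_manual)
print(ci_builtin)
```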
R source codes for fitting model
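A minimal sketch of how such a fit might look in R. The data frame `cars`, its column names `income` and `price`, and all of its values are hypothetical, chosen only to resemble the axes of the Car Price plot. They are not the course data:

```r
# Hypothetical stand-in for the Car Price data (values made up)
cars <- data.frame(
  income = c(25000, 32000, 41000, 48000, 55000, 63000, 72000, 80000, 91000, 98000),
  price  = c(8200, 9900, 12500, 13100, 16400, 17000, 20500, 21800, 25900, 26300)
)

# Fit the simple linear regression of purchase price on income
fit <- lm(price ~ income, data = cars)

# Coefficient table with standard errors, t-ratios and p-values,
# plus the residual standard error s and R^2
print(summary(fit))

# Just the estimates b_0 and b_1
print(coef(fit))
```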
Prediction intervals
[Figure: scatter plot of purchase price (vertical axis) against income (horizontal axis)]
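Prediction intervals for new observations can be obtained with `predict()`. The sketch below again uses a hypothetical data frame `cars` with made-up `income` and `price` values. The new income levels and the 95% level are also assumptions for illustration:

```r
# Hypothetical stand-in for the Car Price data (values made up)
cars <- data.frame(
  income = c(25000, 32000, 41000, 48000, 55000, 63000, 72000, 80000, 91000, 98000),
  price  = c(8200, 9900, 12500, 13100, 16400, 17000, 20500, 21800, 25900, 26300)
)
fit <- lm(price ~ income, data = cars)

# Hypothetical new buyers for whom we want to predict the purchase price
new_buyers <- data.frame(income = c(30000, 60000, 90000))

# Interval for an individual new observation
pred <- predict(fit, newdata = new_buyers, interval = "prediction", level = 0.95)

# Interval for the mean response at the same incomes, for comparison
conf <- predict(fit, newdata = new_buyers, interval = "confidence", level = 0.95)

print(cbind(new_buyers, pred))
print(cbind(new_buyers, conf))
```

The prediction interval is always wider than the confidence interval at the same income, because it also accounts for the variability of an individual observation around the regression line.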
Detecting constant variance
To check for constant variance (homoscedasticity), plot the residuals against the fitted values.
[Figure: Fitted values vs residuals for the Car Price data (vertical axis: Residuals)]
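A plot of this kind can be produced along the following lines. This is a sketch on a hypothetical data frame `cars` with made-up values, not the actual Car Price data:

```r
# Hypothetical stand-in for the Car Price data (values made up)
cars <- data.frame(
  income = c(25000, 32000, 41000, 48000, 55000, 63000, 72000, 80000, 91000, 98000),
  price  = c(8200, 9900, 12500, 13100, 16400, 17000, 20500, 21800, 25900, 26300)
)
fit <- lm(price ~ income, data = cars)

res <- residuals(fit)   # observed minus fitted purchase price
fv  <- fitted(fit)      # fitted purchase price

# Residuals on the vertical axis, fitted values on the horizontal axis
plot(fv, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Fitted values vs residuals")
abline(h = 0, lty = 2)  # reference line at zero
```

A roughly even band of points around the zero line is consistent with constant variance. A spread that widens or narrows as the fitted values increase points to heteroscedasticity.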
Checking heteroscedasticity - what to look for?
[Figure: four example panels titled "no problem", "mild heteroscedasticity", "strong heteroscedasticity", and "non-linear"]
Outliers and high leverage points
An observation is considered “unusual” if it lies far from the majority of the data.
It could be “unusual”:
in the vertical direction, in which case we call it an outlier; or
in the horizontal direction, in which case we call it a high leverage point.
An observation can be unusual in both directions, and hence be both an outlier and a high leverage point.
In the next few slides, we illustrate the effect of unusual observations by considering a “fictitious” data set, as in the example on page 42 of Frees's book.
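Both kinds of unusualness can be quantified in R with standardized residuals (vertical direction) and leverages (horizontal direction). The sketch below uses made-up data whose last point is unusual in both directions. The cutoffs (absolute standardized residual above 2, leverage above three times the average) are common rules of thumb assumed for this example, not thresholds stated in the slides:

```r
# Hypothetical data, made up for illustration only; the last point is
# far from the rest in both x and y
x <- c(1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 9.5)
y <- c(2.8, 3.1, 3.4, 3.7, 4.0, 4.3, 4.5, 4.9, 5.2, 2.5)
n <- length(x)

fit <- lm(y ~ x)

std_res <- rstandard(fit)   # standardized residuals
lev     <- hatvalues(fit)   # leverages; they sum to the number of coefficients

p <- length(coef(fit))      # 2 coefficients: intercept and slope
is_outlier       <- abs(std_res) > 2      # assumed rule of thumb
is_high_leverage <- lev > 3 * p / n       # assumed rule of thumb

print(data.frame(x, y, std_res, lev, is_outlier, is_high_leverage))
```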
An example data set with unusual observations
The following plot shows three “unusual” observations -
denoted by points A, B, and C.
[Figure: scatter plot of y against x showing the 19 base points and the three unusual points A, B, and C]
Effects on the regression
The following table summarizes the results of running the
various regression models to assess the impact of the unusual
observations.
Data        b_0     b_1     s       R^2 (%)   t(b_1)
Base        1.869   0.611   0.288   89.0      11.
Base + A    1.750   0.693   0.846   53.7      4.
Base + B    1.775   0.640   0.285   94.7      18.
Base + C    3.356   0.155   0.865   10.3      1.
Some advice on unusual observations
What can be done about unusual observations?
Check for possible errors in data entry.
Investigate why the unusual observation occurred.
Fit the model with the observation excluded, then include it again to evaluate its impact.
Unusual observations may not be mistakes or aberrations; they may be naturally occurring.
It could be dangerous to exclude them immediately.
See p. 68 of Faraway; we will deal with this issue again later in multiple regression.
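The advice to exclude an observation and then include it again can be sketched as follows. The data are made up for illustration, with the last row playing the role of the unusual observation. They are not the fictitious data set from Frees:

```r
# Hypothetical data, made up for illustration only; row 10 is the
# unusual observation
d <- data.frame(
  x = c(1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 9.5),
  y = c(2.8, 3.1, 3.4, 3.7, 4.0, 4.3, 4.5, 4.9, 5.2, 2.5)
)

fit_with    <- lm(y ~ x, data = d)          # all observations
fit_without <- lm(y ~ x, data = d[-10, ])   # unusual observation excluded

# Side-by-side comparison of b_0, b_1, s and R^2
comparison <- rbind(
  with_point    = c(coef(fit_with),
                    s  = summary(fit_with)$sigma,
                    R2 = summary(fit_with)$r.squared),
  without_point = c(coef(fit_without),
                    s  = summary(fit_without)$sigma,
                    R2 = summary(fit_without)$r.squared)
)
print(round(comparison, 3))
```

A large shift in the slope or in $R^2$ between the two fits shows that the conclusions depend heavily on a single observation, which is a reason to investigate it rather than to delete it automatically.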