Multiple Regression: Model and Estimation - Prof. Emiliano Valdez, Study notes of Mathematics

This document, used in the University of Connecticut - Storrs course Math 3621 Applied Actuarial Statistics in the Fall 2009 semester, provides an in-depth analysis of multiple regression, including the model, estimation, least squares, the hat matrix, the Gauss-Markov theorem and goodness-of-fit measures. It also includes an example on catastrophic bonds and an additional case study on the demand for term life insurance.

Typology: Study notes

Pre 2010

Uploaded on 02/25/2010

Multiple Regression: Model and Estimation
EA Valdez

Introduction
  The regression model
  Least squares estimates
  The hat (or projection) matrix
  Properties
  Gauss-Markov Theorem
  Some goodness of fit measures
An example - catastrophic bonds
  Morton Lane's study
  Initial data analysis
  Preliminary visual analysis
  R source codes for fitting the linear models
  Interpreting the regression coefficients
Added variable plots
  What are they?
  How to do added variable plots?
Additional case study
  Demand for term life insurance
Multiple Regression: Model and
Estimation
Math 3621 Applied Actuarial Statistics
Fall 2009 semester
EA Valdez
University of Connecticut - Storrs
Lecture Weeks 6-7


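The observable data

Assume our observed data set consists of

$$(X_{i0}, X_{i1}, \ldots, X_{ik}, Y_i), \qquad i = 1, 2, \ldots, n,$$

where $n$ is the total number of observations, $X_{i0}$ is associated with the "intercept" term and is usually 1, and $k$ is the number of explanatory variables.

Define the vector of responses, $\mathbf{Y}$, and the matrix of explanatory variables, $\mathbf{X}$, as

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
\quad \text{and} \quad
\mathbf{X} = \begin{pmatrix}
X_{10} & X_{11} & X_{12} & \cdots & X_{1k} \\
X_{20} & X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_{n0} & X_{n1} & X_{n2} & \cdots & X_{nk}
\end{pmatrix}.$$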
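As a sketch of how $\mathbf{Y}$ and $\mathbf{X}$ look in R, the block below types in the EER and PFL columns from the catastrophic bond data listing later in these notes (CEL is left out because its printed values are truncated) and builds $\mathbf{X}$ with `model.matrix()`, which prepends the column of ones for $X_{i0}$. Using PFL as the only explanatory variable ($k = 1$) and the object name `bonds` are illustrative choices, not part of the original notes.

```r
# EER and PFL for the 16 CAT bond issues, typed in from the data listing
# (CEL omitted: its printed values are truncated)
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)

# Response vector Y (length n)
Y <- bonds$EER

# Design matrix X: model.matrix() adds the intercept column X_i0 = 1
X <- model.matrix(~ PFL, data = bonds)

n <- nrow(X)      # number of observations
k <- ncol(X) - 1  # number of explanatory variables (intercept excluded)
print(dim(X))     # n rows and k + 1 columns
head(X, 3)
```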


Specific individual observation

For a specific observation $i$, define the row vector of observed explanatory variables by

$$\mathbf{X}_i' = [X_{i0}, X_{i1}, \ldots, X_{ik}].$$

In terms of this row vector, the regression model for observation $i$ can be written as

$$Y_i = \mathbf{X}_i' \boldsymbol{\beta} + \varepsilon_i.$$
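To make the row-by-row form concrete, the sketch below simulates responses from the model and checks that the matrix form $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ agrees with $Y_i = \mathbf{X}_i'\boldsymbol{\beta} + \varepsilon_i$ for one observation. The coefficient vector `beta`, the error standard deviation `sigma` and the explanatory values are all hypothetical, chosen only for illustration.

```r
set.seed(1)  # reproducible hypothetical example

n <- 10
X <- cbind(1, runif(n), runif(n))  # intercept column plus k = 2 hypothetical regressors
beta  <- c(0.03, 0.40, 0.01)       # hypothetical true coefficients (k + 1 = 3 of them)
sigma <- 0.01                      # hypothetical error standard deviation
eps   <- rnorm(n, mean = 0, sd = sigma)

# Matrix form: Y = X beta + eps
Y <- drop(X %*% beta) + eps

# Row-wise form for a single observation i: Y_i = X_i' beta + eps_i
i   <- 4
x_i <- X[i, ]                      # row vector of explanatory variables for observation i
Y_i <- sum(x_i * beta) + eps[i]
all.equal(Y[i], Y_i)               # TRUE: both forms give the same response
```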


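Least squares estimates

The least squares estimates of $\boldsymbol{\beta}$, denoted $\mathbf{b}$, minimize the sum of squares

$$SS(\boldsymbol{\beta}) = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}).$$

Note that there are $k + 1$ parameters to estimate, including the intercept.

Differentiating gives $\partial SS(\boldsymbol{\beta})/\partial \boldsymbol{\beta} = -2\mathbf{X}'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$; setting this to zero at $\boldsymbol{\beta} = \mathbf{b}$ yields the normal equations

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y},$$

where $\mathbf{b}$ is the least squares vector. Provided $\mathbf{X}'\mathbf{X}$ is invertible (which holds when the columns of $\mathbf{X}$ are linearly independent), we have

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}.$$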
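A minimal sketch of solving the normal equations directly, assuming the EER and PFL values from the CAT bond data listing and a single explanatory variable (PFL) for illustration; it then compares the result with `lm()`. The names `bonds`, `XtX`, `XtY` and `b` are illustrative.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)
Y <- bonds$EER
X <- model.matrix(~ PFL, data = bonds)

# Solve X'X b = X'Y; solve(A, B) avoids forming (X'X)^{-1} explicitly
XtX <- crossprod(X)      # X'X
XtY <- crossprod(X, Y)   # X'Y
b   <- drop(solve(XtX, XtY))

# Compare with R's built-in least squares fit
fit <- lm(EER ~ PFL, data = bonds)
cbind(normal_eq = b, lm = coef(fit))
```

The two columns agree to rounding error. Solving the normal equations is the textbook route; `lm()` itself works from a QR decomposition of $\mathbf{X}$, which is numerically more stable when $\mathbf{X}'\mathbf{X}$ is close to singular.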


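Properties of the parameter estimates

Under the usual assumptions of the regression model ($E(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\mathrm{Var}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$), the least squares estimates have the following properties.

Unbiased estimates:

$$E(\mathbf{b}) = \boldsymbol{\beta}.$$

Variance-covariance matrix:

$$\mathrm{Var}(\mathbf{b}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}.$$

Estimate for $\sigma^2$:

$$s^2 = \text{Error MS} = \frac{\text{Error SS}}{n - (k + 1)}.$$

Standard error for a particular component of $\mathbf{b}$:

$$se(b_{i-1}) = s\sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{ii}}, \qquad i = 1, 2, \ldots, k + 1.$$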
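The formulas for $s^2$ and $se(b_{i-1})$ can be checked numerically. The sketch below, assuming the same illustrative EER-on-PFL fit built from the CAT bond data listing, computes them by hand and compares them with the standard errors reported by `summary()`.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)
Y <- bonds$EER
X <- model.matrix(~ PFL, data = bonds)
n <- nrow(X)
p <- ncol(X)                          # p = k + 1 parameters

XtX_inv <- solve(crossprod(X))        # (X'X)^{-1}
b       <- drop(XtX_inv %*% crossprod(X, Y))
e       <- Y - drop(X %*% b)          # residuals

s2     <- sum(e^2) / (n - p)          # s^2 = Error SS / (n - (k + 1))
vcov_b <- s2 * XtX_inv                # estimated Var(b)
se_b   <- sqrt(diag(vcov_b))          # s * sqrt([(X'X)^{-1}]_ii)

fit <- lm(EER ~ PFL, data = bonds)
cbind(by_hand = se_b, lm = summary(fit)$coefficients[, "Std. Error"])
```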


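Gauss-Markov Theorem

There are several reasons why the least squares estimates $\mathbf{b}$ are good estimates of $\boldsymbol{\beta}$:

Geometrically, they make sense because the fitted values $\mathbf{X}\mathbf{b}$ result from an orthogonal projection of $\mathbf{Y}$ onto the linear space spanned by the columns of $\mathbf{X}$.

The least squares estimates are equivalent to maximum likelihood estimates when the errors are i.i.d. normally distributed.

According to the Gauss-Markov theorem, the least squares estimates are Best Linear Unbiased Estimates (BLUE): among all linear unbiased estimators of $\boldsymbol{\beta}$, they have the smallest variance, provided the errors have mean zero and constant variance and are uncorrelated.

Details of the proof of the Gauss-Markov Theorem will be provided in lectures.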
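The geometric argument can be seen numerically through the hat (projection) matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. The sketch below, again assuming the illustrative EER-on-PFL fit from the CAT bond data, checks that $\mathbf{H}$ is symmetric and idempotent, that the residuals are orthogonal to every column of $\mathbf{X}$, and that the trace of $\mathbf{H}$ equals $k + 1$.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)
Y <- bonds$EER
X <- model.matrix(~ PFL, data = bonds)

# Hat (projection) matrix H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

Y_hat <- drop(H %*% Y)  # fitted values: projection of Y onto the column space of X
e     <- Y - Y_hat      # residuals

max(abs(H - t(H)))       # symmetric: essentially 0
max(abs(H %*% H - H))    # idempotent: essentially 0
drop(crossprod(X, e))    # X'e is essentially 0: residuals orthogonal to X
sum(diag(H))             # trace(H) = k + 1 = 2
```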


Some goodness of fit measures

Just as in the simple linear regression model, the proportion of variability explained by the regression model is

$$R^2 = \frac{\text{Regression SS}}{\text{Total SS}} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$

This is also called the coefficient of determination.

Unfortunately, this $R^2$ never decreases when an explanatory variable is added to the regression model, so on its own it cannot tell us whether the added variable is worthwhile.

The adjusted $R^2$, defined by

$$R_a^2 = 1 - \frac{\text{Error SS}/(n - (k + 1))}{\text{Total SS}/(n - 1)} = 1 - \frac{s^2}{s_Y^2},$$

where $s_Y^2$ is the sample variance of the responses, measures the proportion of the variation explained by the regression, adjusted for the number of predictor variables (or degrees of freedom).
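A short sketch of both measures, assuming the illustrative EER-on-PFL fit from the CAT bond data: it computes $R^2$ and $R_a^2$ from their definitions, compares them with `summary()`, and then adds a hypothetical pure-noise regressor (`noise`, simulated here, not part of Lane's data) to show that $R^2$ does not decrease.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)
fit <- lm(EER ~ PFL, data = bonds)
Y   <- bonds$EER
n   <- length(Y)
k   <- 1                                   # one explanatory variable (PFL)

total_ss <- sum((Y - mean(Y))^2)
reg_ss   <- sum((fitted(fit) - mean(Y))^2)
error_ss <- sum(residuals(fit)^2)

R2     <- reg_ss / total_ss
R2_adj <- 1 - (error_ss / (n - (k + 1))) / (total_ss / (n - 1))
rbind(by_hand = c(R2, R2_adj),
      summary = c(summary(fit)$r.squared, summary(fit)$adj.r.squared))

# Adding a pure-noise regressor (hypothetical) cannot lower R^2
set.seed(2)
bonds$noise <- rnorm(n)
fit_noise <- lm(EER ~ PFL + noise, data = bonds)
c(R2 = R2, R2_with_noise = summary(fit_noise)$r.squared)
```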


Morton Lane's study of catastrophic bonds

Published in ASTIN Bulletin (Vol. 30, 2000, pp. …).

Lane fitted regression models to help explain the pricing of risk transfer in the catastrophic bond market.

CAT bonds are securities whose coupon and principal payments depend on the aggregate losses of a portfolio of insurance contracts.

CAT bonds give insurance companies a way to manage catastrophic insurance risks while offering investors the opportunity to profit from the transfer of insurance risks.

Lane considered 16 catastrophic bond issues made in 1999.


Preliminary investigation of the data

```r
# read the data file
> cat.bond <- read.csv("C:/Documents and Settings/.../Math238-Fall2007/Data/CATBond-data.csv")
> attach(cat.bond)
> cat.bond
        Transaction    EER    PFL CEL
1         Mosaic 2A 0.0364 0.0115 0.
2         Mosaic 2B 0.0552 0.0525 0.
3        Halyard Re 0.0393 0.0084 0.
4       Domestic Re 0.0324 0.0058 0.
5     Concentric Re 0.0272 0.0064 0.
6           Juno Re 0.0381 0.0060 0.
7    Residential Re 0.0327 0.0076 0.
8  Kelvin 1st Event 0.0652 0.1210 0.
9  Kelvin 2nd Event 0.0452 0.0156 0.
10     Gold Eagle A 0.0282 0.0017 1.
11     Gold Eagle B 0.0485 0.0078 0.
12        Namazu Re 0.0381 0.0100 0.
13       Atlas Re A 0.0263 0.0019 0.
14       Atlas Re B 0.0352 0.0029 0.
15       Atlas Re C 0.1095 0.0547 0.
16      Seismic Ltd 0.0383 0.0113 0.
> summary(cat.bond)
        Transaction      EER               PFL               CEL
 Atlas Re A   : 1   Min.   :0.02630   Min.   :0.00170   Min.   :0.
 Atlas Re B   : 1   1st Qu.:0.03263   1st Qu.:0.00595   1st Qu.:0.
 Atlas Re C   : 1   Median :0.03810   Median :0.00810   Median :0.
 Concentric Re: 1   Mean   :0.04349   Mean   :0.02032   Mean   :0.
 Domestic Re  : 1   3rd Qu.:0.04603   3rd Qu.:0.01253   3rd Qu.:0.
 Gold Eagle A : 1   Max.   :0.10950   Max.   :0.12100   Max.   :1.
 (Other)      :
```

You can also compute means, standard deviations, quantiles, etc.; these are not shown here.
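The slides skip these summaries. A minimal sketch, assuming the EER and PFL values from the listing are typed in by hand (CEL is omitted because its printed values are truncated); `quantile()` uses R's default type 7 definition, the same one `summary()` relies on, so the quartiles should match the printed 1st Qu., Median and 3rd Qu. values.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)

# Mean, standard deviation and quartiles for each column
stats_tab <- sapply(bonds, function(v)
  c(mean = mean(v), sd = sd(v), quantile(v, probs = c(0.25, 0.5, 0.75))))
round(stats_tab, 5)
```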


Histogram of variables

```r
# histograms and sorted plots
> par(mfrow = c(3, 2))
> hist(EER, breaks = 10)
> plot(sort(EER), pch = 3)
> hist(PFL, breaks = 10)
> plot(sort(PFL), pch = 3)
> hist(CEL, breaks = 10)
> plot(sort(CEL), pch = 3)
```

[Figure: histograms of EER, PFL and CEL (Frequency against value, left column) and plots of sort(EER), sort(PFL) and sort(CEL) against Index (right column).]


Scatter plot matrix

```r
# scatter plot matrix
> pairs(data.frame(EER, PFL, CEL), cex = 1.5, pch = 19)
```

[Figure: scatter plot matrix of EER, PFL and CEL.]
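`pairs()` gives a visual impression only; the correlation matrix is a numerical companion. The sketch below covers EER and PFL on the original and on the log scale (CEL is again omitted because its printed values are truncated). It also checks a standard fact: the squared EER-PFL correlation equals the $R^2$ of a regression of EER on PFL alone. The name `bonds` is illustrative.

```r
bonds <- data.frame(
  EER = c(0.0364, 0.0552, 0.0393, 0.0324, 0.0272, 0.0381, 0.0327, 0.0652,
          0.0452, 0.0282, 0.0485, 0.0381, 0.0263, 0.0352, 0.1095, 0.0383),
  PFL = c(0.0115, 0.0525, 0.0084, 0.0058, 0.0064, 0.0060, 0.0076, 0.1210,
          0.0156, 0.0017, 0.0078, 0.0100, 0.0019, 0.0029, 0.0547, 0.0113)
)

r     <- cor(bonds)        # Pearson correlations, original scale
r_log <- cor(log(bonds))   # Pearson correlations, log scale
round(r, 3)
round(r_log, 3)

# Squared correlation equals R^2 of the simple regression of EER on PFL
c(r_squared = r["EER", "PFL"]^2,
  R2 = summary(lm(EER ~ PFL, data = bonds))$r.squared)
```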


Scatter plot matrix of the logged variables

```r
# scatter plot matrix on the log scale
> pairs(data.frame(log(EER), log(PFL), log(CEL)), cex = 1.5, pch = 19,
+       labels = c("log(EER)", "log(PFL)", "log(CEL)"))
```

[Figure: scatter plot matrix of log(EER), log(PFL) and log(CEL).]


R source codes for fitting the linear models

```r
# fit the linear model with EER as response and PFL and CEL as predictors
> lm1 <- lm(EER ~ PFL + CEL)
> summary(lm1)

Call:
lm(formula = EER ~ PFL + CEL)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0217089 -0.0061226 -0.0016851  0.0005938  0.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.032502   0.017062   1.905   0.     .
PFL         0.439915   0.153191   2.872   0.0131 *
CEL         0.003201   0.023279   0.138   0.
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01646 on 13 degrees of freedom
Multiple R-Squared: 0.4361,     Adjusted R-squared: 0.
F-statistic: 5.027 on 2 and 13 DF,  p-value: 0.

# ANOVA table
> anova(lm1)
Analysis of Variance Table

Response: EER
          Df    Sum Sq   Mean Sq F value   Pr(>F)
PFL        1 0.0027204 0.0027204 10.0357 0.007412 **
CEL        1 0.0000051 0.0000051  0.0189 0.
Residuals 13 0.0035239 0.
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
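The $R^2$ and overall F-statistic reported by `summary(lm1)` can be recovered from the sums of squares in `anova(lm1)`, which is a useful check on the definitions of Regression SS, Error SS and Total SS. The sketch below types in the sums of squares exactly as printed (so results match only to the printed precision); the variable names are illustrative.

```r
# Sequential sums of squares as printed by anova(lm1)
ss_pfl   <- 0.0027204
ss_cel   <- 0.0000051
ss_error <- 0.0035239
n <- 16   # number of CAT bond issues
k <- 2    # explanatory variables: PFL and CEL

reg_ss   <- ss_pfl + ss_cel       # Regression SS
total_ss <- reg_ss + ss_error     # Total SS = Regression SS + Error SS
R2       <- reg_ss / total_ss

# Overall F-statistic: Regression MS / Error MS
F_stat  <- (reg_ss / k) / (ss_error / (n - (k + 1)))
p_value <- pf(F_stat, k, n - (k + 1), lower.tail = FALSE)

round(c(R2 = R2, F = F_stat), 4)
p_value
```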


R source codes for fitting the linear models

```r
# fit the linear model with log(EER) as response and log(PFL) and log(CEL) as predictors
> lm2 <- lm(log(EER) ~ log(PFL) + log(CEL))
> summary(lm2)

Call:
lm(formula = log(EER) ~ log(PFL) + log(CEL))

Residuals:
     Min       1Q   Median       3Q      Max
-0.28900 -0.12959 -0.04742  0.08484  0.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.80250    0.29559  -6.098 3.79e-05 ***
log(PFL)     0.28668    0.05283   5.427 0.000116 ***
log(CEL)     0.15409    0.15057   1.023 0.
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.206 on 13 degrees of freedom
Multiple R-Squared: 0.72,     Adjusted R-squared: 0.
F-statistic: 16.71 on 2 and 13 DF,  p-value: 0.

> anova(lm2)
Analysis of Variance Table

Response: log(EER)
          Df  Sum Sq Mean Sq F value   Pr(>F)
log(PFL)   1 1.37427 1.37427 32.3816 7.41e-05 ***
log(CEL)   1 0.04445 0.04445  1.0474 0.
Residuals 13 0.55172 0.
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
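In the log-log model each slope is an elasticity. A sketch for the log(PFL) coefficient, using only the estimate, standard error and residual degrees of freedom printed by `summary(lm2)`: it reproduces the t value, forms a 95% confidence interval with the t distribution on 13 degrees of freedom, and shows that, holding log(CEL) fixed, doubling PFL multiplies the fitted EER, exp(fitted log(EER)), by $2^{0.28668}$. The variable names are illustrative.

```r
b_pfl  <- 0.28668   # estimate for log(PFL), as printed by summary(lm2)
se_pfl <- 0.05283   # its standard error
df_res <- 13        # residual degrees of freedom

t_val <- b_pfl / se_pfl                                  # t statistic
ci95  <- b_pfl + c(-1, 1) * qt(0.975, df_res) * se_pfl   # 95% confidence interval

# Elasticity: multiplying PFL by a factor m multiplies exp(fitted log(EER)) by m^b_pfl
effect_double <- 2^b_pfl

round(c(t = t_val, lower = ci95[1], upper = ci95[2], double_PFL = effect_double), 3)
```

Because the interval excludes zero, the data support a positive relationship between log(PFL) and log(EER), consistent with the very small p-value printed for that coefficient.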