Linear Regression: Measuring Model Fit and Correlation - Prof. Brian C. Dennis, Exams of Statistics

Covers the coefficient of determination (r-squared) as a measure of the quality of fit of a linear regression model, and correlation as a measure of linear association between two random variables. Includes formulas, examples, and hypothesis tests for both concepts.

Typology: Exams

Pre 2010

Uploaded on 08/19/2009



Linear regression, continued

The quality of fit of a linear regression model can be measured by the coefficient of determination. Recall that

$$\mathrm{SS(total)} = \mathrm{SS(regression)} + \mathrm{SS(residual)}$$

The coefficient of determination, universally called "$r^2$", is

$$r^2 = \frac{\mathrm{SS(regression)}}{\mathrm{SS(total)}} = 1 - \frac{\mathrm{SS(residual)}}{\mathrm{SS(total)}}$$

It is the proportion of total variability in the $y_i$'s that is described or accounted for by the regression model. The value of $r^2$ is between 0 and 1; if $r^2 = 1$, all the data points lie on a line.
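The decomposition and both forms of $r^2$ can be checked numerically. The sketch below assumes a simple linear regression fitted by ordinary least squares; the data values and the helper names `fit_line` and `sums_of_squares` are made up for illustration and are not part of the notes.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return (SS total, SS regression, SS residual) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_residual


# Hypothetical data, chosen only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_total, ss_regression, ss_residual = sums_of_squares(x, y)

# Two equivalent ways of writing the coefficient of determination.
r_squared = ss_regression / ss_total
r_squared_alt = 1.0 - ss_residual / ss_total
print(r_squared, r_squared_alt)
```

The two printed values agree up to floating-point rounding because SS(total) splits exactly into the regression and residual parts for a least-squares fit with an intercept.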

Interestingly, the likelihood ratio statistic for testing $H_0\colon \beta_1 = 0$ vs $H_a\colon \beta_1 \neq 0$ can be written in terms of the ratio of variances, or the $t$ statistic, or $r^2$:

$$\Lambda = \frac{\hat{L}_0}{\hat{L}_a} = \left[\frac{\hat{\sigma}_0^2}{\hat{\sigma}_a^2}\right]^{-n/2} = \left[\frac{\mathrm{SS(total)}}{\mathrm{SS(residual)}}\right]^{-n/2} = \left[1 + \frac{t^2}{n-2}\right]^{-n/2} = \left(1 - r^2\right)^{n/2}$$
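A numerical check can make the equivalence of the three forms concrete. The sketch below assumes the standard simple linear regression results: maximum likelihood variance estimates SS(total)/n under the null and SS(residual)/n under the alternative, and a slope $t$ statistic with $n - 2$ degrees of freedom. The data and variable names are hypothetical.

```python
import math

# Hypothetical data, chosen only to illustrate the identity.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.9, 4.8, 4.1, 6.5, 6.0, 8.4, 8.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Form 1: ratio of maximum likelihood variance estimates (divisor n).
sigma2_null = ss_total / n
sigma2_alt = ss_residual / n
lambda_variances = (sigma2_null / sigma2_alt) ** (-n / 2)

# Form 2: t statistic for the slope (residual variance uses divisor n - 2).
s2 = ss_residual / (n - 2)
t_stat = b1 / math.sqrt(s2 / sxx)
lambda_t = (1.0 + t_stat ** 2 / (n - 2)) ** (-n / 2)

# Form 3: coefficient of determination.
r_squared = 1.0 - ss_residual / ss_total
lambda_r2 = (1.0 - r_squared) ** (n / 2)

print(lambda_variances, lambda_t, lambda_r2)
```

All three quantities are monotone functions of one another, so a test based on any one of them rejects for exactly the same data sets.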

Correlation

Correlation is a measure of linear association between two random variables. If $X$ and $Y$ are random variables, then the correlation between them is a constant defined by

$$\rho_{XY} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$

The value of the correlation is bounded between $-1$ and $+1$. The expectation in the numerator is called the covariance of $X$ and $Y$; it is real-valued and unbounded (negative or positive).
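As a worked instance of the definition, the sketch below computes the covariance and correlation exactly for a small discrete joint distribution. The joint probabilities and the helper name `expect` are invented for illustration; the notes themselves do not give this example.

```python
import math

# Hypothetical joint probability mass function of (X, Y) on {0, 1} x {0, 1}.
pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}


def expect(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

# Covariance: the expectation in the numerator of the correlation.
covariance = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Standard deviations for the denominator.
sigma_x = math.sqrt(expect(lambda x, y: (x - mu_x) ** 2))
sigma_y = math.sqrt(expect(lambda x, y: (y - mu_y) ** 2))

rho = covariance / (sigma_x * sigma_y)
print(covariance, rho)
```

Rescaling either variable (say, multiplying X by 100) changes the covariance but leaves the correlation unchanged, which is why the correlation is the bounded, unit-free measure.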

A correlation (or covariance) of zero does not imply that the random variables are independent. Exception: if $X$ and $Y$ have a bivariate normal distribution, then $\rho_{XY} = 0$ implies independence.
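A standard counterexample (not taken from the notes) shows zero covariance without independence: let X be uniform on {-1, 0, 1} and set Y = X². The sketch below verifies both facts with exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Joint pmf of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2.
pmf = {(-1, 1): third, (0, 0): third, (1, 1): third}


def expect(g):
    """Exact expectation of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
covariance = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = pmf[(0, 0)]
p_x0 = sum(p for (x, _), p in pmf.items() if x == 0)
p_y0 = sum(p for (_, y), p in pmf.items() if y == 0)

print(covariance, p_joint, p_x0 * p_y0)
```

Y is completely determined by X here, yet the covariance is exactly zero, because the dependence is symmetric rather than linear.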

Model: $X$ and $Y$ have a bivariate normal distribution with means $\mu_X$, $\mu_Y$, variances $\sigma_X^2$, $\sigma_Y^2$, and correlation $\rho_{XY}$. The joint pdf is a bell-shaped, elongated dome.

ex:

• height and weight

• mother's height and daughter's height

• SAT/ACT score and college GPA

Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
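To see what such data look like under the bivariate normal model, one can simulate pairs and compare the sample correlation with the true value. This is a sketch under assumptions: the parameter values are invented, the construction from two independent standard normals is a standard technique rather than something stated in the notes, and the Pearson sample correlation is used as the estimator.

```python
import math
import random


def simulate_bivariate_normal(n, mu_x, mu_y, sd_x, sd_y, rho, rng):
    """Draw n pairs (x, y) from a bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        # Mixing z1 and z2 this way gives Y unit-scaled variance and corr(X, Y) = rho.
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Pearson sample correlation of a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical parameters, loosely in the spirit of a height/weight example.
rng = random.Random(0)
pairs = simulate_bivariate_normal(5000, 170.0, 70.0, 10.0, 12.0, 0.7, rng)
r = sample_correlation(pairs)
print(r)
```

With a few thousand pairs the sample correlation should land close to the value of `rho` passed to the simulator.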

Multiple regression

Situation: more than one independent variable; want to predict $Y$ from $x_1, x_2, \ldots, x_p$.

ex:

• IRS predicts the amount of money to be recovered in an audit using (among other variables) amt. of deduction for charitable gifts, amt. of real estate losses, etc.

• House appraiser predicts sale price of a house based on sq. ft., # bedrooms, ave. sale price in neighborhood, etc.

Idea: mean of ] is taken to be a linear function of the predictor variables:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

With just two predictor variables (not functionally dependent), this equation describes a plane.

Model:

$$Y \sim \text{normal}\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p,\ \sigma^2\right)$$
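Under this normal model, the maximum likelihood estimates of the coefficients coincide with the least-squares estimates. The sketch below fits them by solving the normal equations in plain Python; the helper names `solve` and `least_squares` and the noise-free data are hypothetical, and this is one simple way to do the computation rather than the method used in the notes.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def least_squares(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp]; an intercept column is added."""
    design = [[1.0] + list(row) for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical predictors (x1, x2) and a noise-free response
# generated from E(Y) = 1 + 2*x1 - 3*x2, so the fit should recover it.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]

coefficients = least_squares(rows, y)
print(coefficients)
```

With real, noisy data the recovered coefficients would only approximate the true values; the noise-free response is used here so the result is easy to check by eye.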

Different types of predictor variables:

• ordinary quantitative variables

• indicator variables (AOV is a regression!)

3 treatments; means $\mu_1, \mu_2, \mu_3$

$$x_1 = \begin{cases} 1 & \text{if observation is from trt 1} \\ 0 & \text{otherwise} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{if observation is from trt 2} \\ 0 & \text{otherwise} \end{cases}$$

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

$$\mu_1 = \beta_0 + \beta_1, \qquad \mu_2 = \beta_0 + \beta_2, \qquad \mu_3 = \beta_0$$

• nonlinear terms, e.g.

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$$

• interactions, e.g.

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$
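The predictor types listed above all fit the same linear form once the columns are built appropriately. The sketch below constructs the predictor rows for indicator, quadratic, and interaction terms and evaluates the mean response; the coefficient values and the function names are hypothetical, chosen only to make the arithmetic visible.

```python
def indicator_row(treatment):
    """Code a 3-level treatment as (x1, x2); treatment 3 is the baseline."""
    return [1.0 if treatment == 1 else 0.0, 1.0 if treatment == 2 else 0.0]


def quadratic_row(x):
    """Predictors for a model with a linear and a squared term."""
    return [x, x ** 2]


def interaction_row(x1, x2):
    """Predictors for a two-variable model with an x1*x2 interaction."""
    return [x1, x2, x1 * x2]


def mean_response(beta, row):
    """E(Y) = beta[0] + beta[1]*row[0] + beta[2]*row[1] + ..."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


# Indicator coding: hypothetical coefficients b0 = 10, b1 = 2, b2 = -3.
beta_trt = [10.0, 2.0, -3.0]
treatment_means = {
    trt: mean_response(beta_trt, indicator_row(trt)) for trt in (1, 2, 3)
}
print(treatment_means)  # mu_1 = b0 + b1, mu_2 = b0 + b2, mu_3 = b0

# Quadratic term: the model is still linear in the coefficients.
beta_quad = [1.0, 0.5, 2.0]
quad_mean = mean_response(beta_quad, quadratic_row(3.0))

# Interaction: the effect of a unit change in x1 depends on x2.
beta_int = [1.0, 2.0, 3.0, 0.5]
low = mean_response(beta_int, interaction_row(2.0, 4.0))
high = mean_response(beta_int, interaction_row(3.0, 4.0))
slope_in_x1 = high - low  # equals b1 + b3 * x2
print(quad_mean, slope_in_x1)
```

In each case the model stays linear in the coefficients, which is why analysis of variance, polynomial terms, and interactions can all be handled by the same regression machinery.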