



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The concept of the coefficient of determination (r-squared) as a measure of the quality of fit for a linear regression model. It also covers correlation as a measure of linear association between two random variables. Formulas, examples, and hypotheses tests for both concepts.
Typology: Exams
1 / 6
This page cannot be seen from the preview
Don't miss anything!




The quality of fit of a linear regression model can be measured by the coefficient of determination. Recall that
SS totala b œ SS regressiona b "SS residuala b
The coefficient of determination, universally called ì < #î, is
< #^ œ œ %
SS regression SS residual SS total SS total
a b a b a b a b
It is the proportion of total variability in the C 3 's that is described or accounted for by the regression model. The value of < #^ is between 0 and 1; if < # œ 1, all the data points lie on a line.
Interestingly, the likelihood ratio statistic for testing H :! " (^) " œ 0 vs H :a "" Á0 can be written in terms of the ratio of variances, or the t statistic, or < #:
Ps Î 8 % " Ps^
œ œ
s s Î 8 % #
! (^)!
%8Î# (^) %8Î#
a (^) a
a b a b a b a b
SS total SS residual
œ " œ % <
” (^) a8 % " b •
Correlation
Correlation is a measure of linear association between two random variables. If \ and ] are random variables, then the correlation between them is a constant defined by
\ ] \ ]
œ
Ec a\ % ba] % bd
The value of the correlation is bounded between %1 and " 1. The expectation in the numerator is called the covariance of \ and ]; it is real-valued & unbounded (negative or positive).
A correlation (or covariance) of zero does not imply that the random variables are independent. Exception: if and ] have a bivariate normal distribution, then (^3) ] œ 0 implies independence.
Model: \ and ] have a bivariate normal distribution with means. (^) \ ,. (^) ] , variances (^5) # , (^5) ]#, and correlation (^3) ]. Pdf (joint) is a bell-shaped, elongated dome.
ex: ï height and weight ï mother's height and daughter's height ï SAT/ACT score and college GPA
Data: a B (^) " , C (^) " b a, B (^) # , C (^) # b, ..., a B 8 ,C 8 b
Situation: more than one independent variable; want to predict ] from B (^) " , B (^) # , ..., B:.
ex: ï IRS predicts the amount of money to be recovered in an audit using (among other variables) amt. of deduction for charitable gifts, amt. of real estate losses, etc.
ï House appraiser predicts sale price of a house based on sq. ft., # bedrooms, ave. sale price in neighborhood, etc.
Idea: mean of ] is taken to be a linear function of the predictor variables:
Ea ] bœ " (^)! " " (^) " B (^) " " " (^) # B â "# ": B:
With just two predictor variables (not functionally dependent), this equation is a plane.
Model:
] μ normal ˆ^ " (^)! " " (^) " B (^) " " " (^) # B â "# " (^) : B: , 5 #‰
Different types of predictor variables:
ï ordinary quantitative variables
ï indicator variables (AOV is a regression!)
3 treatments; means. (^) " ,. (^) # ,.$
1 if observation is from trt 1 0 otherwise
B (^) " œœ
1 if observation is from trt 2 0 otherwise
B (^) # œœ
Ea ] bœ " (^)! " " (^) " B (^) " " "# B#
. (^) " œ " (^)! """ . (^) # œ " (^)! ""# . (^) $ œ"!
ï nonlinear terms, e.g.
Ea ] bœ " (^)! " " (^) " B " "# B#
ï interactions, e.g.
Ea ] bœ " (^)! " " (^) " B (^) " " " (^) # B (^) # " "$ B B" #