




These notes cover multiple regression analysis, a statistical method for predicting a dependent variable from its relationship with several independent variables. Using examples of IRS audits and house appraisals, they introduce the idea of taking the mean of the dependent variable to be a linear function of the predictor variables. They also cover different types of predictor variables, including indicator variables, nonlinear terms, and interactions, and conclude with matrix representations and inferences for multiple regression.
Typology: Study notes





Situation: more than one independent variable; want to predict $Y$ from $x_1, x_2, \ldots, x_k$.
ex:
• IRS predicts the amount of money to be recovered in an audit using (among other variables) amt. of deduction for charitable gifts, amt. of real estate losses, etc.
• House appraiser predicts sale price of a house based on sq. ft., # bedrooms, ave. sale price in neighborhood, etc.
Idea: mean of ] is taken to be a linear function of the predictor variables:
$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
With just two predictor variables (not functionally dependent), this equation is a plane.
Model:
$$Y \sim \text{normal}\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2\right)$$
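The model can be illustrated numerically. The following is a minimal sketch, assuming made-up coefficients, error standard deviation, and predictor values (none of these numbers come from the notes): it computes the mean of $Y$ for each observation and then draws one $Y$ per observation from the normal model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen only for illustration (not from the notes).
beta = np.array([2.0, 0.5, -1.0])   # beta_0, beta_1, beta_2
sigma = 0.3                          # error standard deviation
x = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [4.0, 1.5]])           # three observations of (x_1, x_2)

# Mean of Y for each observation: beta_0 + beta_1*x_1 + beta_2*x_2
mean_y = beta[0] + x @ beta[1:]

# One draw of Y from normal(mean, sigma^2) for each observation
y = rng.normal(loc=mean_y, scale=sigma)
```

Only the mean depends on the predictors; the variance $\sigma^2$ is the same for every observation.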
Different types of predictor variables:
• ordinary quantitative variables
• indicator variables (AOV is a regression!)
3 treatments; means $\mu_1, \mu_2, \mu_3$
$$x_1 = \begin{cases} 1 & \text{if observation is from trt 1} \\ 0 & \text{otherwise} \end{cases} \qquad x_2 = \begin{cases} 1 & \text{if observation is from trt 2} \\ 0 & \text{otherwise} \end{cases}$$
$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
$$\mu_1 = \beta_0 + \beta_1, \qquad \mu_2 = \beta_0 + \beta_2, \qquad \mu_3 = \beta_0$$
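To see why AOV is a regression, here is a small sketch assuming invented responses for three treatments (the data are hypothetical). It builds the two indicator variables, fits the regression by least squares, and recovers the three treatment means from the fitted coefficients.

```python
import numpy as np

# Hypothetical responses for three treatments (illustrative data only).
trt = np.array([1, 1, 2, 2, 3, 3])
y = np.array([10.0, 12.0, 20.0, 22.0, 5.0, 7.0])

# Indicator variables: x1 flags treatment 1, x2 flags treatment 2.
x1 = (trt == 1).astype(float)
x2 = (trt == 2).astype(float)
X = np.column_stack([np.ones_like(y), x1, x2])

# Least-squares estimates of (beta_0, beta_1, beta_2)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta_hat

# Treatment means implied by the regression coefficients
mu1, mu2, mu3 = b0 + b1, b0 + b2, b0
```

The implied means agree with the sample mean of each treatment group, with treatment 3 acting as the baseline.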
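Matrices:
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}_{n \times (k+1)} \quad \text{(design matrix)}$$
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{n \times 1} \qquad \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{pmatrix}_{(k+1) \times 1}$$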
Matrix representations:
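$$\hat{y} = X\hat{\beta} \quad \text{(predicted values)}$$
$$e = y - X\hat{\beta} \quad \text{(residuals)}$$
$$SS(\text{residual}) = \left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right)$$
sum of squared errors (minimized at $\hat{\beta}$)

For example, with $n = 4$ and $k = 2$:
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{pmatrix} \qquad X\beta = \begin{pmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} \\ \beta_0 + \beta_1 x_{41} + \beta_2 x_{42} \end{pmatrix}$$
$$X'X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ x_{11} & x_{21} & x_{31} & x_{41} \\ x_{12} & x_{22} & x_{32} & x_{42} \end{pmatrix} \begin{pmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{pmatrix} = \begin{pmatrix} 4 & \sum x_{i1} & \sum x_{i2} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} \\ \sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 \end{pmatrix}$$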
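The matrix representations above can be sketched in a few lines of NumPy. The data below are hypothetical, and the estimate is obtained from the normal equations $(X'X)\hat{\beta} = X'y$, the standard least-squares result, which is an assumption here since the notes only state that the sum of squared errors is minimized at $\hat{\beta}$.

```python
import numpy as np

# Hypothetical data: n = 4 observations, k = 2 predictors (illustration only).
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([3.0, 4.0, 8.0, 10.0])

# Design matrix: a leading column of 1's for the intercept, so X is n x (k+1).
X = np.column_stack([np.ones(len(y)), x])

# Least-squares estimate from the normal equations (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat     # predicted values
e = y - y_hat            # residuals
ss_residual = e @ e      # (y - X beta_hat)'(y - X beta_hat)

# Any other coefficient vector gives a sum of squared errors at least as large.
other = beta_hat + np.array([0.1, -0.05, 0.02])
ss_other = (y - X @ other) @ (y - X @ other)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $X'X$ explicitly, which is a common numerical choice rather than anything prescribed by the notes.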
The "AOV table"
source of variation    SS                        df             MS
regression             $SS(\text{regression})$   $k$            $SS(\text{regression})/k$
error                  $SS(\text{residual})$     $n - k - 1$    $SS(\text{residual})/(n - k - 1)$
total                  $SS(\text{total})$        $n - 1$
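$$SS(\text{regression}) = \left(X\hat{\beta} - \tfrac{1}{n} J y\right)'\left(X\hat{\beta} - \tfrac{1}{n} J y\right)$$
$$SS(\text{residual}) = \left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right)$$
$$SS(\text{total}) = \left(y - \tfrac{1}{n} J y\right)'\left(y - \tfrac{1}{n} J y\right)$$
where $J$ is an $n \times n$ matrix of 1's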
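The three sums of squares can be computed directly from these matrix formulas. This is a sketch on hypothetical data (six observations, two predictors), with the coefficients estimated by ordinary least squares via the normal equations, which is an assumption of the example.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors (illustration only).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 10.0, 12.0, 15.0])
n = len(y)

X = np.column_stack([np.ones(n), x])           # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
fitted = X @ beta_hat

J = np.ones((n, n))                            # n x n matrix of 1's
mean_vec = J @ y / n                           # (1/n) J y: every entry is the mean of y

ss_regression = (fitted - mean_vec) @ (fitted - mean_vec)
ss_residual = (y - fitted) @ (y - fitted)
ss_total = (y - mean_vec) @ (y - mean_vec)
```

Because the design matrix includes the column of 1's, the regression and residual sums of squares add up to the total sum of squares, matching the rows of the AOV table.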
Test of the model vs the mean
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (just the overall mean is used for predicting $Y$)
$H_a$: not all of $\beta_1, \ldots, \beta_k$ equal 0 (all predictor variables are included in the model)
Test statistic (from LR):
$$f = \frac{SS(\text{regression})/k}{SS(\text{residual})/(n - k - 1)}$$
Rejection region: reject $H_0$ if $f > f_\alpha$, where the F distribution has $k$ and $n - k - 1$ df.
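As a rough sketch of the test statistic, the hypothetical helper `overall_f` below (the name and the data are assumptions for illustration) computes $f$ and its two degrees of freedom. The critical value is not computed here; it would be looked up in an F table or obtained from statistical software.

```python
import numpy as np

def overall_f(X, y):
    """Return (f, df1, df2) for H0: beta_1 = ... = beta_k = 0.

    X is assumed to have a leading column of 1's, so k = X.shape[1] - 1.
    """
    n, p = X.shape
    k = p - 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
    fitted = X @ beta_hat
    ss_regression = np.sum((fitted - y.mean()) ** 2)
    ss_residual = np.sum((y - fitted) ** 2)
    f = (ss_regression / k) / (ss_residual / (n - k - 1))
    return f, k, n - k - 1

# Hypothetical data: n = 6 observations, k = 2 predictors (illustration only).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 10.0, 12.0, 15.0])
X = np.column_stack([np.ones(len(y)), x])

f, df1, df2 = overall_f(X, y)
```

Large values of `f` favor the full model over the overall mean; the decision compares `f` with the critical value for `df1` and `df2` degrees of freedom.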
Coefficient of determination
$$r^2 = \frac{SS(\text{regression})}{SS(\text{total})} = 1 - \frac{SS(\text{residual})}{SS(\text{total})}$$
(proportional reduction in the prediction error attained by using the multiple regression model instead of the overall mean)
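The two expressions for the coefficient of determination can be checked numerically. This is a minimal sketch on hypothetical data, assuming a least-squares fit with an intercept column in the design matrix.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictors (illustration only).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 10.0, 12.0, 15.0])

X = np.column_stack([np.ones(len(y)), x])      # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate
fitted = X @ beta_hat

ss_regression = np.sum((fitted - y.mean()) ** 2)
ss_residual = np.sum((y - fitted) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

# Both forms of the coefficient of determination
r2 = ss_regression / ss_total
r2_alt = 1 - ss_residual / ss_total
```

The two values agree up to floating-point rounding, and the result lies between 0 and 1.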