Multiple Regression: Predicting Dependent Variables with Multiple Independent Variables - Study notes of Statistics

These notes introduce multiple regression analysis, a statistical method for predicting a dependent variable from its relationship with multiple independent variables. Using examples of IRS audits and house appraisals, they present the idea of taking the mean of the dependent variable to be a linear function of the predictor variables. The notes also cover different types of predictor variables, including indicator variables, nonlinear terms, and interactions, and conclude with matrix representations and inferences for multiple regression.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-sx7

Multiple regression

Situation: more than one independent variable; want to predict $Y$ from $x_1, x_2, \ldots, x_k$.

ex: • IRS predicts the amount of money to be recovered in an audit using (among other variables) amt. of deduction for charitable gifts, amt. of real estate losses, etc.

• House appraiser predicts sale price of a house based on sq. ft., # bedrooms, ave. sale price in neighborhood, etc.

Idea: mean of $Y$ is taken to be a linear function of the predictor variables:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$

With just two predictor variables (not functionally dependent), this equation is a plane.
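As a minimal sketch of the linear mean function, the snippet below evaluates the mean of the response for one set of predictor values. The function name `mean_response` and the house-price coefficients are hypothetical, chosen only for illustration; they do not come from the notes.

```python
def mean_response(beta, x):
    """Return beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1], the mean of Y."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one intercept plus one slope per predictor")
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


# Hypothetical coefficients: intercept, price per sq. ft., price per bedroom
beta = [50000.0, 100.0, 10000.0]

# Mean sale price for a 1500 sq. ft. house with 3 bedrooms
print(mean_response(beta, [1500, 3]))  # 230000.0
```

With two predictors, sweeping both arguments of `x` over a grid traces out the plane described above.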

Model:

$$Y \sim \text{normal}\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,\ \sigma^2\right)$$
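One way to get a feel for this model is to simulate from it. The sketch below assumes NumPy is available; the function name `simulate_response`, the predictor values, the coefficients, and the value of sigma are all hypothetical.

```python
import numpy as np


def simulate_response(X, beta, sigma, rng):
    """Draw one Y per row of X from normal(beta_0 + sum_j beta_j x_j, sigma^2).

    X holds the predictor values only (no column of 1's);
    beta[0] is the intercept and beta[1:] are the slopes.
    """
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    mean = beta[0] + X @ beta[1:]
    return rng.normal(loc=mean, scale=sigma)


# Hypothetical predictor values for 5 observations and k = 2 predictors
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
rng = np.random.default_rng(0)  # fixed seed so the draw is reproducible
y = simulate_response(X, beta=[1.0, 2.0, 3.0], sigma=0.5, rng=rng)
print(y.shape)  # (5,)
```

Each simulated value scatters around its own mean with the same standard deviation, which is the constant-variance assumption built into the model.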

Different types of predictor variables:

• ordinary quantitative variables

• indicator variables (AOV is a regression!)

3 treatments; means $\mu_1, \mu_2, \mu_3$

$$x_1 = \begin{cases} 1 & \text{if observation is from trt 1} \\ 0 & \text{otherwise} \end{cases}$$

$$x_2 = \begin{cases} 1 & \text{if observation is from trt 2} \\ 0 & \text{otherwise} \end{cases}$$

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

$$\mu_1 = \beta_0 + \beta_1, \qquad \mu_2 = \beta_0 + \beta_2, \qquad \mu_3 = \beta_0$$
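The indicator coding can be checked numerically: fitting the regression by least squares should return the three treatment sample means. This is a sketch under assumptions; the data are made up, and `np.linalg.lstsq` is used here as one convenient least-squares solver, not necessarily the method intended in the notes.

```python
import numpy as np

# Hypothetical responses: 2 observations from each of 3 treatments
y = np.array([10.0, 12.0, 20.0, 22.0, 30.0, 34.0])
trt = np.array([1, 1, 2, 2, 3, 3])

# Indicator variables: x1 flags treatment 1, x2 flags treatment 2
x1 = (trt == 1).astype(float)
x2 = (trt == 2).astype(float)
X = np.column_stack([np.ones_like(y), x1, x2])

# Least-squares fit of E(Y) = b0 + b1*x1 + b2*x2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta_hat

# Treatment means recovered from the coefficients
mu = {1: b0 + b1, 2: b0 + b2, 3: b0}
print(mu)
```

Treatment 3 acts as the baseline: its mean is the intercept, and the other two coefficients are differences from that baseline.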

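Matrices:

$$X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \qquad (n \times (k+1))$$

(design matrix)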
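The stated dimension, n by (k + 1), is consistent with a leading column of 1's for the intercept followed by one column per predictor. A minimal sketch of building such a matrix with NumPy follows; the helper name `design_matrix` and the house data are hypothetical.

```python
import numpy as np


def design_matrix(predictors):
    """Prepend a column of 1's to an n x k array of predictor values."""
    predictors = np.asarray(predictors, dtype=float)
    if predictors.ndim == 1:
        # A single predictor given as a flat list becomes an n x 1 column
        predictors = predictors.reshape(-1, 1)
    ones = np.ones((predictors.shape[0], 1))
    return np.hstack([ones, predictors])


# Hypothetical data: sq. ft. and # bedrooms for n = 4 houses (k = 2)
X = design_matrix([[1500, 3], [2000, 4], [1200, 2], [1800, 3]])
print(X.shape)  # (4, 3)
```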

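$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \quad (n \times 1) \qquad\qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} \quad ((k+1) \times 1)$$

$\hat{\beta}$: ML, LS estimate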

Matrix representations:

$$\hat{y} = X\hat{\beta} \qquad \text{predicted values}$$

$$e = y - X\hat{\beta} \qquad \text{residuals}$$

$$SS(\text{residual}) = \left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right)$$

sum of squared errors (minimized at $\hat{\beta}$)
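These three quantities can be computed directly. The sketch below uses made-up data and `np.linalg.lstsq` as the least-squares solver (an assumption, since the notes do not say how the estimate is computed), then checks that moving away from the fitted coefficients does not lower the sum of squared errors.

```python
import numpy as np

# Hypothetical design matrix (column of 1's, then k = 2 predictors) and response
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])


def ss_residual(beta, X, y):
    """Sum of squared errors (y - X beta)'(y - X beta)."""
    e = y - X @ beta
    return float(e @ e)


beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate
y_hat = X @ beta_hat                              # predicted values
e = y - y_hat                                     # residuals

# Perturbing the estimate should not reduce the sum of squared errors
print(ss_residual(beta_hat, X, y) <= ss_residual(beta_hat + 0.1, X, y))  # True
```

The residuals from a least-squares fit are orthogonal to every column of the design matrix, which is another quick way to confirm that the minimum was reached.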

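Example ($n = 4$, $k = 2$):

$$X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix} \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

$$X\beta = \begin{bmatrix} \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} \\ \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} \\ \beta_0 + \beta_1 x_{31} + \beta_2 x_{32} \\ \beta_0 + \beta_1 x_{41} + \beta_2 x_{42} \end{bmatrix}$$

$$X'X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ x_{11} & x_{21} & x_{31} & x_{41} \\ x_{12} & x_{22} & x_{32} & x_{42} \end{bmatrix} \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \end{bmatrix} = \begin{bmatrix} n & \sum x_{i1} & \sum x_{i2} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} \\ \sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 \end{bmatrix}$$

$$\left(X'X\right)^{-1}\left(X'X\right) = I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$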
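The entries of X'X for four observations and two predictors can be verified numerically: the top-left entry is n, and the rest are sums, sums of squares, and sums of cross-products of the predictor columns. The predictor values below are hypothetical, and the design matrix is assumed to carry a leading column of 1's.

```python
import numpy as np

# Hypothetical predictor values for n = 4 observations
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0])
X = np.column_stack([np.ones(4), x1, x2])

XtX = X.T @ X

# The same matrix written out entry by entry from sums and cross-products
expected = np.array([
    [len(x1),  x1.sum(),        x2.sum()],
    [x1.sum(), (x1 ** 2).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 ** 2).sum()],
])
print(np.allclose(XtX, expected))  # True

# The inverse times the matrix gives the 3 x 3 identity (up to rounding)
identity = np.linalg.inv(XtX) @ XtX
print(np.allclose(identity, np.eye(3)))  # True
```

The inverse exists here because the columns of X are linearly independent; with functionally dependent predictors X'X would be singular.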

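Inferences for multiple regression

The "AOV table"

| source of variation | SS | df | MS |
|---|---|---|---|
| regression | $SS(\text{regression})$ | $k$ | $SS(\text{regression})/k$ |
| error | $SS(\text{residual})$ | $n - k - 1$ | $SS(\text{residual})/(n - k - 1)$ |
| total | $SS(\text{total})$ | $n - 1$ | |

$$SS(\text{regression}) = \left(X\hat{\beta} - \tfrac{1}{n} J y\right)'\left(X\hat{\beta} - \tfrac{1}{n} J y\right)$$

$$SS(\text{residual}) = \left(y - X\hat{\beta}\right)'\left(y - X\hat{\beta}\right)$$

$$SS(\text{total}) = \left(y - \tfrac{1}{n} J y\right)'\left(y - \tfrac{1}{n} J y\right)$$

where $J$ is an $n \times n$ matrix of 1's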
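The three sums of squares can be computed straight from their matrix definitions, with the n by n matrix of 1's producing the vector of overall means. This is a sketch with hypothetical data; the function name `anova_sums` is an assumption, and `np.linalg.lstsq` stands in for whatever fitting method is used in practice.

```python
import numpy as np


def anova_sums(X, y):
    """Return (SS regression, SS residual, SS total) for design matrix X."""
    n = len(y)
    J = np.ones((n, n))                      # n x n matrix of 1's
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat                    # X beta_hat
    mean_vec = (J @ y) / n                   # every entry is the overall mean
    ss_reg = float((fitted - mean_vec) @ (fitted - mean_vec))
    ss_res = float((y - fitted) @ (y - fitted))
    ss_tot = float((y - mean_vec) @ (y - mean_vec))
    return ss_reg, ss_res, ss_tot


# Hypothetical data: n = 5 observations, k = 2 predictors
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])

ss_reg, ss_res, ss_tot = anova_sums(X, y)
n, k = X.shape[0], X.shape[1] - 1
print("regression", ss_reg, k, ss_reg / k)
print("error     ", ss_res, n - k - 1, ss_res / (n - k - 1))
print("total     ", ss_tot, n - 1)
```

When the design matrix includes the column of 1's, the regression and residual sums of squares add up to the total, just as the degrees of freedom do.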

Test of the model vs the mean

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (just the overall mean is used for predicting $Y$)

$H_a$: $\beta_j \neq 0$ for at least one $j$ (all predictor variables are included in the model)

Test statistic (from LR):

$$f = \frac{SS(\text{regression})/k}{SS(\text{residual})/(n - k - 1)}$$

Rejection region: reject $H_0$ if $f > f_\alpha$, where the F distribution has $k$ and $n - k - 1$ df.
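The test statistic is the ratio of the two mean squares from the AOV table. A minimal sketch follows; the function name `f_statistic` and the sums of squares in the example are hypothetical.

```python
def f_statistic(ss_regression, ss_residual, n, k):
    """Return (SS regression / k) / (SS residual / (n - k - 1))."""
    df_error = n - k - 1
    if k <= 0 or df_error <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    ms_regression = ss_regression / k
    ms_error = ss_residual / df_error
    return ms_regression / ms_error


# Hypothetical sums of squares with n = 13 observations and k = 2 predictors
f = f_statistic(ss_regression=90.0, ss_residual=10.0, n=13, k=2)
print(f)  # 45.0
```

To finish the test, the value is compared with the upper critical point of the F distribution with k and n - k - 1 df, taken from an F table or, if SciPy is available, from `scipy.stats.f.ppf(1 - alpha, k, n - k - 1)`.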

Coefficient of determination

$$r^2 = \frac{SS(\text{regression})}{SS(\text{total})} = 1 - \frac{SS(\text{residual})}{SS(\text{total})}$$

(proportional reduction in the prediction error attained by using the multiple regression model instead of the overall mean)
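Both forms of the coefficient of determination can be computed and compared. The sketch below uses hypothetical sums of squares that satisfy SS(total) = SS(regression) + SS(residual), which is what makes the two forms agree; the function name `r_squared` is an assumption.

```python
import math


def r_squared(ss_regression, ss_residual, ss_total):
    """Return r^2 computed both ways: SSreg/SStot and 1 - SSres/SStot."""
    if ss_total <= 0:
        raise ValueError("SS(total) must be positive")
    from_regression = ss_regression / ss_total
    from_residual = 1.0 - ss_residual / ss_total
    return from_regression, from_residual


# Hypothetical sums of squares: 90 + 10 = 100
r2_a, r2_b = r_squared(90.0, 10.0, 100.0)
print(math.isclose(r2_a, r2_b))  # True
```

In this made-up case the model removes 90% of the squared prediction error that remains when only the overall mean is used.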