Linear Regression: Modeling Linear Dependence and Predicting Quantitative Variables - Prof, Exams of Statistics

This document introduces linear regression, a statistical method used to model the linear relationship between two quantitative variables and to predict one variable from the value of the other. It covers the basic idea, the model equation, statistical inferences, and common applications, and also discusses confidence intervals, hypothesis tests, and model evaluation.

Typology: Exams

Pre 2010

Uploaded on 08/19/2009

koofers-user-m0t


Linear regression

Purpose: model linear dependence of two quantitative variables & predict one variable from the value of the other.

ex. Old Faithful, MLB, Hanford/Columbia River cancer study, etc.

Basic idea: start with a normal distribution for $Y$; allow the mean of $Y$ to depend linearly on the value of a fixed, known covariate, $x$.

Model:

$$Y \sim \text{normal}\left(\beta_0 + \beta_1 x,\ \sigma^2\right)$$

In this model, $E(Y) = \mu = \beta_0 + \beta_1 x$, and $V(Y) = \sigma^2$. Another (equivalent) way of writing the model:

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

where $\varepsilon \sim \text{normal}(0, \sigma^2)$.
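To make the second form of the model concrete, here is a minimal Python sketch that draws one response for each fixed covariate value by adding normal noise to the line. The function name `simulate_responses`, the seed, and the parameter values are illustrative assumptions, not part of the notes.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, seed=None):
    """Draw one Y per fixed x from normal(beta0 + beta1 * x, sigma^2)."""
    rng = random.Random(seed)
    # rng.gauss takes the standard deviation sigma, not the variance sigma^2.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical covariate values and parameters, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_responses(xs, beta0=0.0, beta1=2.0, sigma=0.5, seed=1)
```

Setting `sigma=0.0` removes the noise term, so every simulated response falls exactly on the line $\beta_0 + \beta_1 x$.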

Data: each observation is an ordered pair, $n$ observations in all, written as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Commonly depicted in a scatterplot.

Statistical inferences:

• Point estimates of unknown parameters $\beta_0$, $\beta_1$, and $\sigma^2$
• Confidence intervals
• Hypothesis tests
• Estimate & CI for $\mu = \beta_0 + \beta_1 x$ at a particular value of $x$
• Prediction of a new value of $Y$ at $x$
• Model evaluation
• Matrix representation

Pdf for $Y$ is that of a normal distribution:

$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{\left[y - (\beta_0 + \beta_1 x)\right]^2}{2\sigma^2}\right\}$$
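The density can be evaluated directly, and summing its logarithm over the data gives a log-likelihood. The sketch below assumes the observations are independent (the usual assumption, not stated explicitly here); the function names and the small data set are made up for illustration.

```python
import math


def normal_pdf(y, x, beta0, beta1, sigma2):
    """Density of Y at y when Y ~ normal(beta0 + beta1 * x, sigma2)."""
    mu = beta0 + beta1 * x
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def log_likelihood(xs, ys, beta0, beta1, sigma2):
    """Sum of log densities over the pairs (x_i, y_i), assuming independence."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        # Work on the log scale directly to avoid underflow for poor fits.
        total += -0.5 * math.log(2.0 * math.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)
    return total


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
good = log_likelihood(xs, ys, beta0=0.05, beta1=1.99, sigma2=0.04)
poor = log_likelihood(xs, ys, beta0=0.0, beta1=1.0, sigma2=0.04)
```

For a fixed $\sigma^2$, parameter values that leave smaller squared departures give a larger log-likelihood, so `good` exceeds `poor` here.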

The ML estimate of $\sigma^2$ is the average squared departure of the $y_i$'s from their estimated means:

$$\hat{\sigma}^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$

Usually the unbiased estimate of $\sigma^2$ is used:

$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$
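The two variance estimates differ only in their divisor, which a short computation makes visible. This sketch assumes the usual least-squares formulas for the fitted coefficients, $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ with $S_{xx} = \sum (x_i - \bar{x})^2$; these do not appear in this excerpt, and the data are made up.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1), the least-squares intercept and slope (assumed formulas)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def variance_estimates(xs, ys):
    """Return (ML estimate, unbiased estimate) of sigma^2."""
    b0, b1 = least_squares_fit(xs, ys)
    ss_residual = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    # ML divides by n; the unbiased version divides by n - 2.
    return ss_residual / n, ss_residual / (n - 2)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma2_ml, sigma2_unbiased = variance_estimates(xs, ys)
```

Because $n - 2 < n$, the unbiased estimate is always the larger of the two; the gap matters most when $n$ is small.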

Note: the term

$$SS(\text{residual}) = \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$

is the "residual sum of squares" (squared errors left over after model is fitted). The "regression sum of squares" is

$$SS(\text{regression}) = \sum_{i=1}^{n}\left(\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y}\right)^2$$

or the squared departures of the values predicted by the regression model from the grand mean of the $y_i$'s. As in AOV, these sum to the "total sum of squares" in the $y_i$'s:

$$SS(\text{total}) = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

$$SS(\text{total}) = SS(\text{regression}) + SS(\text{residual})$$
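The decomposition can be checked numerically. The sketch below assumes the coefficients are the least-squares fit (the identity relies on that) and uses a small made-up data set; the function name is illustrative.

```python
def sums_of_squares(xs, ys):
    """Return (SS total, SS regression, SS residual) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx           # least-squares slope (assumed formula)
    b0 = y_bar - b1 * x_bar    # least-squares intercept (assumed formula)
    fitted = [b0 + b1 * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_regression = sum((f - y_bar) ** 2 for f in fitted)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_total, ss_regression, ss_residual


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_total, ss_regression, ss_residual = sums_of_squares(xs, ys)
```

Up to floating-point rounding, `ss_total` equals `ss_regression + ss_residual`. The identity generally fails if the line is not the least-squares fit.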

Confidence intervals

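Under repeated sampling (at the same values of the $x_i$'s), the parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ have normal distributions, with the following means and variances:

$$E\left(\hat{\beta}_0\right) = \beta_0 \quad \text{(unbiased)}$$

$$V\left(\hat{\beta}_0\right) = \sigma^2_{\hat{\beta}_0} = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) \quad \text{(estimate with } \hat{\sigma}^2\text{)}$$

$$E\left(\hat{\beta}_1\right) = \beta_1$$

$$V\left(\hat{\beta}_1\right) = \sigma^2_{\hat{\beta}_1} = \frac{\sigma^2}{S_{xx}} \quad \text{(estimate with } \hat{\sigma}^2\text{)}$$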

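$100(1 - \alpha)\%$ CIs based on Student's t distribution with $n - 2$ df:

$$\beta_0:\quad \hat{\beta}_0 \pm t_{\alpha/2}\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$$

$$\beta_1:\quad \hat{\beta}_1 \pm t_{\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$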
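As a sketch of how these intervals might be computed, the function below takes the critical value $t_{\alpha/2}$ as an argument, since the Python standard library has no t quantile function. The value 3.182 is the tabulated $t_{0.025}$ for 3 df, which matches the made-up five-point data set ($n - 2 = 3$). The least-squares formulas and $S_{xx} = \sum (x_i - \bar{x})^2$ are assumed, and the function name is illustrative.

```python
import math


def coefficient_intervals(xs, ys, t_crit):
    """Return ((lo0, hi0), (lo1, hi1)): CIs for beta0 and beta1.

    t_crit is the t_{alpha/2} critical value for n - 2 df, supplied by the
    caller (for example, from a t table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx           # least-squares slope (assumed formula)
    b0 = y_bar - b1 * x_bar    # least-squares intercept (assumed formula)
    # Unbiased estimate of sigma^2, with divisor n - 2.
    sigma2_hat = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))


# Hypothetical data, for illustration only; n = 5, so df = 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ci_beta0, ci_beta1 = coefficient_intervals(xs, ys, t_crit=3.182)
```

Each interval is centered on the point estimate, with a half-width equal to the critical value times the estimated standard error.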

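Hypothesis test for $\beta_1$:

$$H_0\colon \beta_1 = c \quad \text{(known constant)}$$

$$H_a\colon \beta_1 \left\{\begin{array}{c} < \\ > \\ \neq \end{array}\right\} c$$

$$t = \frac{\hat{\beta}_1 - c}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$

Reject $H_0$ if

$$\left\{\begin{array}{c} t < -t_{\alpha} \\ t > t_{\alpha} \\ |t| > t_{\alpha/2} \end{array}\right\} \quad \text{where df} = n - 2$$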

Note: for testing $H_0\colon \beta_1 = 0$ (model has no predictive value) vs. $H_a\colon \beta_1 \neq 0$ (model has some predictive value), one can show that

$$t^2 = f = \frac{SS(\text{regression})/1}{SS(\text{residual})/(n-2)}$$

and that this test statistic has an $F(1, n-2)$ distribution under the null hypothesis. One would reject $H_0$ if $f > f_{\alpha}$. This test statistic is usually printed by computer regression packages.
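The identity between the squared t statistic and the F statistic can be checked on a small example. This sketch again assumes the least-squares formulas for the fitted coefficients and uses made-up data; the function name `slope_t_and_f` is illustrative.

```python
import math


def slope_t_and_f(xs, ys):
    """Return (t, f) for testing H0: beta1 = 0 against Ha: beta1 != 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx           # least-squares slope (assumed formula)
    b0 = y_bar - b1 * x_bar    # least-squares intercept (assumed formula)
    fitted = [b0 + b1 * x for x in xs]
    ss_regression = sum((f - y_bar) ** 2 for f in fitted)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sigma2_hat = ss_residual / (n - 2)
    t_stat = (b1 - 0.0) / math.sqrt(sigma2_hat / s_xx)
    f_stat = (ss_regression / 1.0) / (ss_residual / (n - 2))
    return t_stat, f_stat


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, f_stat = slope_t_and_f(xs, ys)
```

Up to rounding, `t_stat ** 2` equals `f_stat`, so the two-sided t test and the F test reach the same decision.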