



Linear regression is a statistical method used to model the linear relationship between two quantitative variables and to predict one variable from the value of the other. These notes cover the basic idea, the model equation, statistical inferences, and common applications, including confidence intervals, hypothesis tests, and model evaluation.




Purpose: model the linear dependence of two quantitative variables and predict one variable from the value of the other.
Examples: Old Faithful, MLB, Hanford/Columbia River cancer study, etc.
Basic idea: start with a normal distribution for $Y$; allow the mean of $Y$ to depend linearly on the value of a fixed, known covariate, $x$.
Model:
$$Y \sim \text{normal}\left(\beta_0 + \beta_1 x,\ \sigma^2\right)$$
In this model, $E(Y) = \mu = \beta_0 + \beta_1 x$, and $V(Y) = \sigma^2$. Another (equivalent) way of writing the model:
$$Y = \beta_0 + \beta_1 x + \varepsilon$$
where $\varepsilon \sim \text{normal}(0, \sigma^2)$.
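The second form of the model suggests a direct way to generate data from it. The following is a minimal sketch, assuming illustrative parameter values (2.0, 0.5, and 1.0 are made up for the example, as is the function name `simulate_regression`); it draws one $Y$ for each fixed $x$.

```python
import random

def simulate_regression(x_values, beta0, beta1, sigma, seed=0):
    """Draw one Y for each fixed x from normal(beta0 + beta1*x, sigma^2)."""
    rng = random.Random(seed)            # seeded so the draw is reproducible
    ys = []
    for x in x_values:
        mean = beta0 + beta1 * x         # E(Y) at this x
        eps = rng.gauss(0.0, sigma)      # error term, normal(0, sigma^2)
        ys.append(mean + eps)            # Y = beta0 + beta1*x + eps
    return ys

# Hypothetical covariate values and parameters, chosen only for illustration.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_regression(x_values, beta0=2.0, beta1=0.5, sigma=1.0)
```

Setting `sigma=0.0` returns the means $\beta_0 + \beta_1 x$ exactly, which is a quick way to check that the covariate only shifts the center of the distribution.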
Data: each observation is an ordered pair, $n$ observations in all, written as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Commonly depicted in a scatterplot.
Statistical inferences:
• Point estimates of unknown parameters $\beta_0$, $\beta_1$, and $\sigma^2$
• Confidence intervals
• Hypothesis tests
• Estimate & CI for $\mu = \beta_0 + \beta_1 x$ at a particular value of $x$
• Prediction of a new value of $Y$ at $x$
• Model evaluation
• Matrix representation
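Pdf for $Y$ is that of a normal distribution with mean $\beta_0 + \beta_1 x$ and variance $\sigma^2$:
$$f(y) = \frac{e^{-(y - \beta_0 - \beta_1 x)^2 / (2\sigma^2)}}{\sqrt{2\pi\sigma^2}}$$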
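The ML estimate of $\sigma^2$ is the average squared departure of the $y_i$'s from their estimated means:
$$\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2$$
Usually the unbiased estimate of $\sigma^2$ is used:
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2$$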
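The two variance estimates differ only in the divisor. The sketch below computes both on a small made-up data set. The formulas for $\hat{\beta}_0$ and $\hat{\beta}_1$ are not shown in this excerpt, so the code assumes the standard least squares estimates $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$; the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares estimates (assumed here; not given in the excerpt)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sigma2_estimates(xs, ys):
    """Return (ML estimate, unbiased estimate) of sigma^2."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    # Sum of squared departures of each y_i from its estimated mean.
    ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return ss_res / n, ss_res / (n - 2)

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma2_ml, sigma2_unbiased = sigma2_estimates(xs, ys)
```

The ML estimate is always the smaller of the two, by the factor $(n-2)/n$.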
Note: the term
$$\text{SS(residual)} = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2$$
is the "residual sum of squares" (squared errors left over after model is fitted). The "regression sum of squares" is
$$\text{SS(regression)} = \sum_{i=1}^{n} \left( \hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y} \right)^2$$
or the squared departures of the values predicted by the regression model from the grand mean of the $y_i$'s. As in AOV, these sum to the "total sum of squares" in the $y_i$'s:
$$\text{SS(total)} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$$
$$\text{SS(total)} = \text{SS(regression)} + \text{SS(residual)}$$
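The decomposition can be checked numerically. This is a small sketch on made-up data, assuming the standard least squares estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$ (the identity relies on the line being fitted that way); the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SS_total, SS_regression, SS_residual) for a fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                 # assumed least squares slope
    b0 = y_bar - b1 * x_bar          # assumed least squares intercept
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_reg = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
    ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return ss_total, ss_reg, ss_res

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
ss_total, ss_reg, ss_res = sums_of_squares(xs, ys)
# Up to floating-point rounding, ss_total equals ss_reg + ss_res.
```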
Confidence intervals
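Under repeated sampling (at the same values of the $x_i$'s), the parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ have normal distributions, with the following means and variances (here $\bar{x}$ is the mean of the $x_i$'s and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$):
$$E\left(\hat{\beta}_0\right) = \beta_0 \quad \text{(unbiased)}$$
$$V\left(\hat{\beta}_0\right) = \sigma^2_{\hat{\beta}_0} = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \quad \text{(estimate with } \hat{\sigma}^2 \text{)}$$
$$E\left(\hat{\beta}_1\right) = \beta_1$$
$$V\left(\hat{\beta}_1\right) = \sigma^2_{\hat{\beta}_1} = \frac{\sigma^2}{S_{xx}} \quad \text{(estimate with } \hat{\sigma}^2 \text{)}$$
$100(1 - \alpha)\%$ CIs based on Student's t distribution with $n - 2$ df:
$$\beta_0: \quad \hat{\beta}_0 \pm t_{\alpha/2} \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)}$$
$$\beta_1: \quad \hat{\beta}_1 \pm t_{\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$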
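As a rough illustration of the two interval formulas, the sketch below computes 95% CIs on made-up data. It assumes the standard least squares estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$, and it takes the critical value $t_{\alpha/2}$ as an input: 3.182 is the tabulated $t_{0.025}$ for 3 df, which matches $n - 2$ for these five points. The function name is hypothetical.

```python
import math

def slope_intercept_cis(xs, ys, t_crit):
    """Return ((lo, hi) for beta0, (lo, hi) for beta1) using t_crit = t_{alpha/2}."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                     # assumed least squares slope
    b0 = y_bar - b1 * x_bar              # assumed least squares intercept
    ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = ss_res / (n - 2)        # unbiased estimate of sigma^2
    se_b0 = math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx))
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return ci_b0, ci_b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182   # t_{0.025} with n - 2 = 3 df, read from a t table
ci_b0, ci_b1 = slope_intercept_cis(xs, ys, T_CRIT)
```

Passing the critical value in keeps the sketch within the standard library; a statistics package would normally supply the t quantile.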
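Hypothesis test for $\beta_1$:
$H_0$: $\beta_1 = c$ (known constant)
$H_a$: $\beta_1 \neq c$, $\beta_1 > c$, or $\beta_1 < c$
Test statistic:
$$t = \frac{\hat{\beta}_1 - c}{\sqrt{\hat{\sigma}^2 / S_{xx}}}$$
Reject $H_0$ if
$$\begin{cases} |t| > t_{\alpha/2} & \text{for } H_a: \beta_1 \neq c \\ t > t_{\alpha} & \text{for } H_a: \beta_1 > c \\ t < -t_{\alpha} & \text{for } H_a: \beta_1 < c \end{cases}$$
where df $= n - 2$.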
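The test can be sketched in a few lines. The example below assumes the standard least squares estimates, made-up data, and a critical value supplied by the caller ($t_{\alpha/2}$ for the two-sided alternative, $t_{\alpha}$ for a one-sided one); the function name and the `alternative` labels are hypothetical.

```python
import math

def slope_t_test(xs, ys, c, t_crit, alternative="two-sided"):
    """Return (t statistic, reject H0?) for H0: beta1 = c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                     # assumed least squares slope
    b0 = y_bar - b1 * x_bar              # assumed least squares intercept
    ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = ss_res / (n - 2)        # unbiased estimate of sigma^2
    t = (b1 - c) / math.sqrt(sigma2_hat / s_xx)
    if alternative == "two-sided":       # Ha: beta1 != c
        reject = abs(t) > t_crit
    elif alternative == "greater":       # Ha: beta1 > c
        reject = t > t_crit
    elif alternative == "less":          # Ha: beta1 < c
        reject = t < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, reject

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
# 3.182 is the tabulated t_{0.025} for n - 2 = 3 df.
t_zero, reject_zero = slope_t_test(xs, ys, c=0.0, t_crit=3.182)
t_two, reject_two = slope_t_test(xs, ys, c=2.0, t_crit=3.182)
```

With these numbers the fitted slope is far from 0 but close to 2, so the first null hypothesis is rejected and the second is not.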
Note: for testing $H_0$: $\beta_1 = 0$ (model has no predictive value) vs. $H_a$: $\beta_1 \neq 0$ (model has some predictive value), one can show that
$$t^2 = f = \frac{\text{SS(regression)} / 1}{\text{SS(residual)} / (n - 2)}$$
and that this test statistic has an $F(1, n - 2)$ distribution under the null hypothesis. One would reject $H_0$ if $f > f_{\alpha}$. This test statistic is usually printed by computer regression packages.
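The identity $t^2 = f$ is easy to confirm numerically. The following sketch computes both statistics from the same made-up data, assuming the standard least squares estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$; the function name `t_and_f` is hypothetical.

```python
def t_and_f(xs, ys):
    """Return (t, f) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                     # assumed least squares slope
    b0 = y_bar - b1 * x_bar              # assumed least squares intercept
    ss_reg = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
    ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = ss_res / (n - 2)        # unbiased estimate of sigma^2
    t = b1 / (sigma2_hat / s_xx) ** 0.5  # t statistic with c = 0
    f = (ss_reg / 1) / (ss_res / (n - 2))
    return t, f

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat, f_stat = t_and_f(xs, ys)
# t_stat ** 2 and f_stat agree up to floating-point rounding.
```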