Download Lecture Notes on Linear Models - Linear Statistics Model | STAT 551 and more Study notes Statistics in PDF only on Docsity!
Examples of Linear Models
1. Ordinary Linear Regression [ aka “Simple” Liner Regression.]
Study of # of car trips to office buidings as a function of office space of the building. (Suburban
and semi-urban mid-Atlantic areas.) [Goal: Learn how to predict Y for given values of x .]
Statistical Model includes :
(a) E Y ( (^) i (^) ) = β 0 + xi β 1 , or Y = X β, where β =( β 0 ,β 1 )
(b) Yi independent
(c) Var( Yi ) constant.
(a) - (c) are often summarized as
(*) ( )
2 Yi = β 0 + β 1 xi + σ ε i (^) : ε i ∼ independent , with Var ε i = 1.
The Model often also includes more precisely specified assumptions about the distribution of Yi ,
most usually
(d) (*) and (^) i ( 0,1) ind
ε ∼ N.
[The vector-matrix form for part (a) of this model is
E (^) ( Y (^) )= X β.
In writing this general vector-matrix form for linear models we customarily write β as a vector.
Thus the usual way of writing the coordinates of β is
1
2
β β β
. There is one embarrassing
feature to this representation. In the model (a) we have
0
1
β β β
. Thus we have created a
notational monster in which
1 0
2 1
β β
β β
. This doesn’t seem to bother the authors of our text
- R & D - nor most other authors. It won’t bother you either if you don’t let it do so.
P.S. A better way to proceed would have been to write (*) with a different letter – e.g.,
(**) ( )
2 Yi = α 0 + α 1 xi + σ ε (^) i : ε i ∼ independent , with Var ε i = 1. Then it would have been true
that
1 0
2 1
β α
β α
, which is OK.]
Here are Y and X for the usual vector and matrix form:
Y (no. of AM car trips/day) x
- 99.00 1 60. i1 xi2=occup. sq ft
- 142.20 1 60.
- 176.80 1 65.
- 151.98 1 77.
- 148.13 1 78.
- 152.06 1 80.
- 145.89 1 97.
- 172.05 1 93.
- 225.35 1 102.
- 159.76 1 105.
- 114.49 1 107.
- 143.88 1 59.
- 166.06 1 59.
- 112.80 1 112.
- 159.14 1 109.
- 161.34 1 109.
- 150.36 1 109.
- 348.00 1 120.
- 172.92 1 124.
- 253.84 1 128.
- 211.09 1 130.
- 105.67 1 103.
- 171.00 1 150.
- 355.20 1 160.
- 248.78 1 162.
- 252.03 1 162.
- 224.40 1 165.
- 227.70 1 165.
- 200.10 1 174.
- 331.66 1 161.
- 133.46 1 175.
- 164.00 1 200.
- 362.45 1 198.
- 235.69 1 102.
- 352.24 1 200.
- 387.90 1 255.
- 320.00 1 256.
- 400.86 1 262.
- 435.10 1 263.
- 401.10 1 219.
- 449.55 1 333.
- 243.76 1 136.
- 532.00 1 350.
- 318.78 1 414.
- 606.51 1 427.
- 460.01 1 479.
- 618.24 1 471.
- 951.83 1 509.
- 1119.96 1 549.
- 1131.29 1 440.
- 1419.48 1 586.
Here’s an alternate (different) analysis. We’ll later discuss this one, and others.
Bivariate Fit of AM Trips By Occup. Sq. Ft. (1000)
Weight: "weight"
0
250
500
750
1000
1250
1500
AM Trips
0 100 200 300 400 500 600
Occup. Sq. Ft. (1000)
Linear Fit
AM Trips = 25.948 + 1.4789 Occup. Sq. Ft. (1000)
Summary of Fit
RSquare 0. RSquare Adj 0. Root Mean Square Error 0. Mean of Response 68. Observations (or Sum Wgts) 0.
Analysis of Variance
Source DF Sum of Squares Mean Square F Ratio Model 1 84.85679 84.8568 268. Error 59 18.61377 0.3155 Prob > F C. Total 60 103.47056 <.
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t| Intercept 25.9483 4.259315 6.09 <. Occup. Sq. Ft. (1000) 1.4789008 0.090175 16.40 <.
2. Multiple Linear Regression :
The data involves scores on a statewide [Texas] student proficiency exam of math and English.
The ‘independent’ variables in the data set are the % by school passing the math test, the
English test, and both tests. (Not all students take both tests.) There possible predictor variables
[co-variates] measure either the demographic character of the school district [These are marked
with a (*)] or features of the school organization and financing.
avgteacher_salary, avgclass_size, avgteacher_experience, pctltdenglish (*)
pctecondisadv(*), totalenrollment [in the school], grade3enrollment [in the school],
pct_special ed(), pct_gifted(), pct_black(), pct_hispanic(),
perpupil expend.
After preliminary analysis that we’ll examine later I decided that most of the differences in the
math score could be “explained” by 4 of the independent variables. This yields a model of the
type
E Y ( (^) i (^) ) = β 0 + β 1 x 1 (^) i + β 2 x 2 (^) i + β 3 x 3 (^) i +β 4 x 4 i
with the x j as below:
Summary of Fit
RSquare 0. Root Mean Square Error 12. Mean of Response 82. Observations (or Sum Wgts) 3421
Analysis of Variance
Source DF Sum of Squares Mean Square F Ratio Model 4 146753 36688.5 250. Error 3416 500621 146.6 Prob > F C. Total 3420 647375 <.
Parameter Estimates
Term Estimate Std Error t Ratio Prob>|t| Intercept 81.395 2.4681 32.98 <. avgteacher_salary 0.000253 0.000074 3.41 0. avgteacher_experience 0.3555 0.08062 4.41 <. pct_econdisadv -0.2014 0.00729 -27.63 <. pct_black -0.1301 0.01417 -9.18 <.
NOTE that two of these “explanatory” variables are demographic and two relate
to the character of the school.
Analysis of Variance
Source DF Sum of Squares Mean Square F Ratio Prob > F Day of Week 4 1.42813 0.357033 2.3259 0. Error 1266 194.33589 0. C. Total 1270 195.
Means for Oneway Anova
Level Number Mean Std Error Lower 95% Upper 95% FRI 246 2.08801 0.02498 2.0390 2. MON 278 2.10743 0.02350 2.0613 2. THU 268 2.03210 0.02393 1.9852 2. TUE 253 2.03307 0.02463 1.9847 2. WED 226 2.10219 0.02606 2.0511 2. Std Error uses a pooled estimate of error variance
Two (and higher) way analyses :
For examining the joint effect of day and server one could use an additive model of the
form
E Y ( ijk ) = μ +τ i + β j , i = 1,... , I j = 1,..., J , k =1,..., Kij
(In principle, some values of Kij could be 0, with the obvious interpretation.). OR one could use
an additive model with interactions of the form
E Y ( ijk ) = μ + τ i + β j + γ ij , i = 1,... , I j = 1,..., J , k = 1,..., Kij.
Later we’ll discuss some alternative ways of modeling data such as this, involving “random-
effects” models.
Here is some output from the model with interaction:
Response Log(SerTime)
Summary of Fit
RSquare 0. Root Mean Square Error 0. Mean of Response 2. Observations (or Sum Wgts) 1271
Analysis of Variance
Source DF Sum of Squares Mean Square F Ratio Model 79 17.12885 0.216821 1. Error 1191 178.63517 0.149988 Prob > F C. Total 1270 195.76402 0.
Effect Tests
Source Nparm DF Sum of Squares F Ratio Prob > F Day of Week 4 4 1.035081 1.7253 0. Server ID 15 15 3.585552 1.5937 0. Server ID*Day of Week 60 60 12.443047 1.3827 0.