Lecture Notes on Linear Models - Linear Statistics Model | STAT 551, Study notes of Statistics

Material Type: Notes; Professor: Brown; Class: INTRO TO LINEAR STAT MOD; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1989;


Uploaded on 03/28/2010


Examples of Linear Models

1. Ordinary Linear Regression [aka “Simple” Linear Regression]

Study of the number of car trips to office buildings as a function of the office space of the building. (Suburban and semi-urban mid-Atlantic areas.) [Goal: Learn how to predict Y for given values of x.]

Statistical Model includes:

(a) $E(Y_i) = \beta_0 + x_i \beta_1$, or $E(Y) = X\beta$, where $\beta = (\beta_0, \beta_1)$

(b) $Y_i$ independent

(c) $\mathrm{Var}(Y_i)$ constant.

(a) - (c) are often summarized as

(*) $Y_i = \beta_0 + \beta_1 x_i + \sigma \varepsilon_i$, with the $\varepsilon_i$ independent and $\mathrm{Var}(\varepsilon_i) = 1$.

The Model often also includes more precisely specified assumptions about the distribution of $Y_i$, most usually

(d) (*) and $\varepsilon_i \overset{\text{ind}}{\sim} N(0, 1)$.

[The vector-matrix form for part (a) of this model is

$E(Y) = X\beta$.

In writing this general vector-matrix form for linear models we customarily write $\beta$ as a vector. Thus the usual way of writing the coordinates of $\beta$ is $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$. There is one embarrassing feature to this representation. In the model (a) we have $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$. Thus we have created a notational monster in which "$\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$". This doesn’t seem to bother the authors of our text (R&D), nor most other authors. It won’t bother you either if you don’t let it do so.

P.S. A better way to proceed would have been to write (*) with a different letter, e.g.,

(**) $Y_i = \alpha_0 + \alpha_1 x_i + \sigma \varepsilon_i$, with the $\varepsilon_i$ independent and $\mathrm{Var}(\varepsilon_i) = 1$.

Then it would have been true that $\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}$, which is OK.]

Here are Y and X for the usual vector and matrix form:

Y = no. of AM car trips/day; x_i1 = 1; x_i2 = occup. sq ft

       Y    x_i1    x_i2
   99.00       1     60.
  142.20       1     60.
  176.80       1     65.
  151.98       1     77.
  148.13       1     78.
   67.03       1     79.
  152.06       1     80.
  145.89       1     97.
  172.05       1     93.
  225.35       1    102.
  159.76       1    105.
  114.49       1    107.
  143.88       1     59.
  166.06       1     59.
  112.80       1    112.
  159.14       1    109.
  161.34       1    109.
  150.36       1    109.
  348.00       1    120.
  172.92       1    124.
  253.84       1    128.
  211.09       1    130.
  105.67       1    103.
  171.00       1    150.
  355.20       1    160.
  248.78       1    162.
  252.03       1    162.
  224.40       1    165.
  227.70       1    165.
  200.10       1    174.
  331.66       1    161.
  133.46       1    175.
  164.00       1    200.
  362.45       1    198.
  235.69       1    102.
  352.24       1    200.
  387.90       1    255.
  320.00       1    256.
  400.86       1    262.
  435.10       1    263.
  401.10       1    219.
  449.55       1    333.
  243.76       1    136.
  532.00       1    350.
  318.78       1    414.
  606.51       1    427.
  460.01       1    479.
  618.24       1    471.
  951.83       1    509.
 1119.96       1    549.
 1131.29       1    440.
 1419.48       1    586.
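As a rough illustration of the vector-matrix form, the sketch below simulates data from model (*) with assumption (d), builds X with a leading column of ones, and computes the least-squares estimate of β. The sample size, the values of β and σ, and the range of x are hypothetical choices made only for this example; they are not the trip data above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen only for illustration (not the trip data).
n = 200
beta = np.array([25.0, 1.5])            # (beta_0, beta_1)
sigma = 2.0

x = rng.uniform(50.0, 600.0, size=n)
X = np.column_stack([np.ones(n), x])    # first column is x_i1 = 1, second is x_i2 = x_i
eps = rng.standard_normal(n)            # independent N(0, 1) errors, as in (d)
Y = X @ beta + sigma * eps              # model (*) in vector-matrix form

# Least-squares estimate of beta: minimizes ||Y - X b||^2 over b.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Estimate of sigma from the residuals, using n - 2 degrees of freedom.
resid = Y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - X.shape[1]))

print("beta_hat:", beta_hat)
print("sigma_hat:", sigma_hat)
```

With a sample of this size the estimates should land close to the values used in the simulation, though the exact numbers depend on the random seed.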

Here’s an alternate (different) analysis. We’ll later discuss this one, and others.

[Figure: Bivariate Fit of AM Trips By Occup. Sq. Ft. (1000), with Weight: "weight". Vertical axis: AM Trips (0 to 1500). Horizontal axis: Occup. Sq. Ft. (1000) (0 to 600).]

Linear Fit

AM Trips = 25.948 + 1.4789 Occup. Sq. Ft. (1000)

Summary of Fit

RSquare                       0.
RSquare Adj                   0.
Root Mean Square Error        0.
Mean of Response              68.
Observations (or Sum Wgts)    0.

Analysis of Variance

Source      DF    Sum of Squares    Mean Square    F Ratio
Model        1          84.85679        84.8568    268.
Error       59          18.61377         0.3155    Prob > F
C. Total    60         103.47056                   <.

Parameter Estimates

Term                     Estimate     Std Error    t Ratio    Prob>|t|
Intercept                25.9483      4.259315        6.09    <.
Occup. Sq. Ft. (1000)    1.4789008    0.090175       16.40    <.
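The entries in these tables are tied together by simple arithmetic, and the sketch below checks a few of the relations using only the numbers printed above: Mean Square = Sum of Squares / DF, F Ratio = Mean Square (Model) / Mean Square (Error), t Ratio = Estimate / Std Error, and RSquare = SS(Model) / SS(C. Total). In a regression with a single predictor, the F ratio also equals the square of the slope's t ratio. The variable names are illustrative only.

```python
import math

# Values copied from the Analysis of Variance and Parameter Estimates tables.
ss_model, df_model = 84.85679, 1
ss_error, df_error = 18.61377, 59
ss_total = 103.47056

ms_model = ss_model / df_model
ms_error = ss_error / df_error           # about 0.3155, as printed
f_ratio = ms_model / ms_error            # about 269, matching the "268." stem
r_square = ss_model / ss_total           # share of variation explained
rmse = math.sqrt(ms_error)               # Root Mean Square Error

# t Ratio = Estimate / Std Error for each term.
t_intercept = 25.9483 / 4.259315
t_slope = 1.4789008 / 0.090175

print(f"MS error  = {ms_error:.4f}")
print(f"F ratio   = {f_ratio:.2f}, t_slope^2 = {t_slope ** 2:.2f}")
print(f"t ratios  = {t_intercept:.2f}, {t_slope:.2f}")
print(f"R^2       = {r_square:.4f}, RMSE = {rmse:.4f}")
```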

2. Multiple Linear Regression:

The data involves scores on a statewide [Texas] student proficiency exam of math and English. The ‘dependent’ variables in the data set are the % by school passing the math test, the English test, and both tests. (Not all students take both tests.) The possible predictor variables [co-variates] measure either the demographic character of the school district [these are marked with a (*)] or features of the school organization and financing:

avgteacher_salary, avgclass_size, avgteacher_experience, pctltdenglish (*), pct_econdisadv (*), totalenrollment [in the school], grade3enrollment [in the school], pct_special ed (*), pct_gifted (*), pct_black (*), pct_hispanic (*), perpupil expend.

After preliminary analysis that we’ll examine later I decided that most of the differences in the

math score could be “explained” by 4 of the independent variables. This yields a model of the

type

$E(Y_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \beta_4 x_{4i}$

with the $x_j$ as below:

Summary of Fit

RSquare                       0.
Root Mean Square Error        12.
Mean of Response              82.
Observations (or Sum Wgts)    3421

Analysis of Variance

Source        DF    Sum of Squares    Mean Square    F Ratio
Model          4            146753        36688.5    250.
Error       3416            500621          146.6    Prob > F
C. Total    3420            647375                   <.

Parameter Estimates

Term                     Estimate    Std Error    t Ratio    Prob>|t|
Intercept                81.395      2.4681         32.98    <.
avgteacher_salary        0.000253    0.000074        3.41    0.
avgteacher_experience    0.3555      0.08062         4.41    <.
pct_econdisadv           -0.2014     0.00729       -27.63    <.
pct_black                -0.1301     0.01417        -9.18    <.

NOTE that two of these “explanatory” variables are demographic and two relate

to the character of the school.
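To make the fitted model concrete, the sketch below evaluates the estimated mean response for one school. The coefficients are copied from the Parameter Estimates table. The school's covariate values are hypothetical, the function name `predict_math_pass_rate` is made up for this illustration, and the salary is assumed to be recorded in dollars (a guess suggested by the size of its coefficient).

```python
import numpy as np

# Estimates copied from the Parameter Estimates table, in the order
# (Intercept, avgteacher_salary, avgteacher_experience, pct_econdisadv, pct_black).
beta_hat = np.array([81.395, 0.000253, 0.3555, -0.2014, -0.1301])


def predict_math_pass_rate(salary, experience, pct_econdisadv, pct_black):
    """Estimated E(Y) for one school: the row (1, x1, x2, x3, x4) times beta_hat."""
    row = np.array([1.0, salary, experience, pct_econdisadv, pct_black])
    return float(row @ beta_hat)


# Hypothetical school (values invented for illustration only).
pred = predict_math_pass_rate(
    salary=30000.0, experience=10.0, pct_econdisadv=50.0, pct_black=20.0
)
print(f"predicted % passing math: {pred:.3f}")

# Holding the other covariates fixed, 10 more percentage points of
# pct_econdisadv changes the prediction by 10 * (-0.2014).
pred_shifted = predict_math_pass_rate(30000.0, 10.0, 60.0, 20.0)
print(f"change: {pred_shifted - pred:.3f}")
```

The second call illustrates how each coefficient is read: it is the change in the estimated mean response per unit change in that covariate, with the others held fixed.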

Analysis of Variance

Source         DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Day of Week     4           1.42813       0.357033     2.3259    0.
Error        1266         194.33589       0.
C. Total     1270         195.

Means for Oneway Anova

Level    Number    Mean       Std Error    Lower 95%    Upper 95%
FRI         246    2.08801    0.02498      2.0390       2.
MON         278    2.10743    0.02350      2.0613       2.
THU         268    2.03210    0.02393      1.9852       2.
TUE         253    2.03307    0.02463      1.9847       2.
WED         226    2.10219    0.02606      2.0511       2.

Std Error uses a pooled estimate of error variance
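The remark about the pooled estimate can be checked directly from the printed numbers. In the sketch below, the error mean square is recovered as SS(Error) / DF(Error), and each level's standard error is computed as the square root of that mean square divided by the level's count. The F ratio is the Day of Week mean square divided by the error mean square. Variable names are illustrative only.

```python
import math

# Values copied from the one-way Analysis of Variance table.
ss_error, df_error = 194.33589, 1266
ms_day = 0.357033

ms_error = ss_error / df_error           # pooled estimate of the error variance
f_ratio = ms_day / ms_error              # should reproduce the printed 2.3259

# Group sizes copied from the Means for Oneway Anova table.
counts = {"FRI": 246, "MON": 278, "THU": 268, "TUE": 253, "WED": 226}

# Pooled standard error of each level mean: sqrt(MS_error / n_level).
std_err = {day: math.sqrt(ms_error / n) for day, n in counts.items()}

print(f"MS error = {ms_error:.6f}, F = {f_ratio:.4f}")
for day, se in std_err.items():
    print(f"{day}: {se:.5f}")
```

The counts sum to 1271, so the C. Total degrees of freedom are 1271 - 1 = 1270 and the Error degrees of freedom are 1271 - 5 = 1266, in agreement with the table.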

Two (and higher) way analyses:

For examining the joint effect of day and server one could use an additive model of the form

$E(Y_{ijk}) = \mu + \tau_i + \beta_j, \quad i = 1, \dots, I, \ j = 1, \dots, J, \ k = 1, \dots, K_{ij}$.

(In principle, some values of $K_{ij}$ could be 0, with the obvious interpretation.) Or one could use an additive model with interactions of the form

$E(Y_{ijk}) = \mu + \tau_i + \beta_j + \gamma_{ij}, \quad i = 1, \dots, I, \ j = 1, \dots, J, \ k = 1, \dots, K_{ij}$.

Later we’ll discuss some alternative ways of modeling data such as this, involving “random-effects” models.
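Both models are linear models E(Y) = Xβ whose X consists of 0/1 indicator columns. The sketch below builds those columns and counts how many free parameters each model really has by computing the rank of X. It assumes I = 5 days and J = 16 servers, and that every (day, server) cell contains at least one observation; one row per cell is then enough to determine the rank. These sizes and the variable names are assumptions made for illustration.

```python
import numpy as np

I, J = 5, 16   # assumed: 5 days of the week, 16 servers

# One row per (i, j) cell; repeated observations in a cell repeat rows
# and do not change the rank.
i_idx, j_idx = np.meshgrid(np.arange(I), np.arange(J), indexing="ij")
i_idx, j_idx = i_idx.ravel(), j_idx.ravel()
n_cells = I * J

ones = np.ones((n_cells, 1))              # column for mu
T = np.eye(I)[i_idx]                      # indicator columns for tau_i
B = np.eye(J)[j_idx]                      # indicator columns for beta_j
G = np.eye(I * J)[i_idx * J + j_idx]      # indicator columns for gamma_ij

X_add = np.hstack([ones, T, B])           # additive model
X_int = np.hstack([ones, T, B, G])        # model with interactions

rank_add = np.linalg.matrix_rank(X_add)   # 1 + (I - 1) + (J - 1)
rank_int = np.linalg.matrix_rank(X_int)   # I * J, one mean per cell

print("columns:", X_add.shape[1], X_int.shape[1])
print("ranks:  ", rank_add, rank_int)
print("interaction df:", rank_int - rank_add)   # (I - 1) * (J - 1)
```

The ranks are smaller than the number of columns because the parameterization is redundant: for example, the τ columns sum to the μ column. Software typically removes the redundancy with constraints, which is why the degrees of freedom, not the raw parameter counts, are what get reported.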

Here is some output from the model with interaction:

Response Log(SerTime)

Summary of Fit

RSquare                       0.
Root Mean Square Error        0.
Mean of Response              2.
Observations (or Sum Wgts)    1271

Analysis of Variance

Source        DF    Sum of Squares    Mean Square    F Ratio
Model         79          17.12885       0.216821    1.
Error       1191         178.63517       0.149988    Prob > F
C. Total    1270         195.76402                   0.

Effect Tests

Source                   Nparm    DF    Sum of Squares    F Ratio    Prob > F
Day of Week                  4     4          1.035081     1.7253    0.
Server ID                   15    15          3.585552     1.5937    0.
Server ID*Day of Week       60    60         12.443047     1.3827    0.