Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Lecture Notes on Linear Models - Linear Statistics Model | STAT 551, Study notes of Statistics

University of Pennsylvania (UPenn)Statistics

Prof. L. Brown

Material Type: Notes; Professor: Brown; Class: INTRO TO LINEAR STAT MOD; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-kqf 🇺🇸

10 documents

1 / 8

This page cannot be seen from the preview

Don't miss anything!

1

Examples of Linear Models

1. Ordinary Linear Regression [aka “Simple” Liner Regression.]

Study of # of car trips to office buidings as a function of office space of the building. (Suburban

and semi-urban mid-Atlantic areas.) [Goal: Learn how to predict Y for given values of x.]

Statistical Model includes:

(a)

()

01ii

EY x

β

=+ , or

(

)

01

, where ,YX

β

βββ

==

(b) Yi independent

(c) Var(Yi) constant.

(a) - (c) are often summarized as

(*)

(

)

2

01 : , with 1

iiii i

Y x independent Var

ββ σεε ε

=+ + =∼.

The Model often also includes more precisely specified assumptions about the distribution of Yi,

most usually

(d) (*) and

()

0,1

iind N

ε

∼.

[The vector-matrix form for part (a) of this model is

()

E

YX

β

=.

In writing this general vector-matrix form for linear models we customarily write

β

as a vector.

Thus the usual way of writing the coordinates of

β

is 1

2

β

⎛⎞

=⎜⎟

⎝⎠

. There is one embarrassing

feature to this representation. In the model (a) we have 0

1

β

⎛⎞

=⎜⎟

⎝⎠

. Thus we have created a

notational monster in which 10

21

""

β

⎛⎞ ⎛⎞

=

⎜⎟ ⎜⎟

⎝⎠ ⎝⎠

. This doesn’t seem to bother the authors of our text

- R&D - nor most other authors. It won’t bother you either if you don’t let it do so.

P.S. A better way to proceed would have been to write (*) with a different letter – e.g.,

(**)

(

)

2

01 : , with 1

iiii i

Y x independent Var

αα σεε ε

=+ + =∼. Then it would have been true

that 10

21

β

α

β

α

⎞⎞

⎛⎛

=

⎟⎟

⎜⎜

⎝⎝

⎠⎠

, which is OK.]

Discover Study notes of Statistics University of Pennsylvania (UPenn)

Partial preview of the text

Download Lecture Notes on Linear Models - Linear Statistics Model | STAT 551 and more Study notes Statistics in PDF only on Docsity!

Examples of Linear Models

1. Ordinary Linear Regression [ aka “Simple” Liner Regression.]

Study of # of car trips to office buidings as a function of office space of the building. (Suburban

and semi-urban mid-Atlantic areas.) [Goal: Learn how to predict Y for given values of x .]

Statistical Model includes :

(a) E Y ( (^) i (^) ) = β 0 + xi β 1 , or Y = X β, where β =( β 0 ,β 1 )

(b) Yi independent

(c) Var( Yi ) constant.

(a) - (c) are often summarized as

(*) ( )

2 Yi = β 0 + β 1 xi + σ ε i (^) : ε i ∼ independent , with Var ε i = 1.

The Model often also includes more precisely specified assumptions about the distribution of Yi ,

most usually

(d) (*) and (^) i ( 0,1) ind

ε ∼ N.

[The vector-matrix form for part (a) of this model is

E (^) ( Y (^) )= X β.

In writing this general vector-matrix form for linear models we customarily write β as a vector.

Thus the usual way of writing the coordinates of β is

1

2

β β β

. There is one embarrassing

feature to this representation. In the model (a) we have

0

1

β β β

. Thus we have created a

notational monster in which

1 0

2 1

β β

. This doesn’t seem to bother the authors of our text

R & D - nor most other authors. It won’t bother you either if you don’t let it do so.

P.S. A better way to proceed would have been to write (*) with a different letter – e.g.,

(**) ( )

2 Yi = α 0 + α 1 xi + σ ε (^) i : ε i ∼ independent , with Var ε i = 1. Then it would have been true

that

1 0

2 1

β α

, which is OK.]

Here are Y and X for the usual vector and matrix form:

Y (no. of AM car trips/day) x

 - 99.00 1 60. i1 xi2=occup. sq ft

142.20 1 60.
176.80 1 65.
151.98 1 77.
148.13 1 78.
- 67.03 1 79.
152.06 1 80.
145.89 1 97.
172.05 1 93.
225.35 1 102.
159.76 1 105.
114.49 1 107.
143.88 1 59.
166.06 1 59.
112.80 1 112.
159.14 1 109.
161.34 1 109.
150.36 1 109.
348.00 1 120.
172.92 1 124.
253.84 1 128.
211.09 1 130.
105.67 1 103.
171.00 1 150.
355.20 1 160.
248.78 1 162.
252.03 1 162.
224.40 1 165.
227.70 1 165.
200.10 1 174.
331.66 1 161.
133.46 1 175.
164.00 1 200.
362.45 1 198.
235.69 1 102.
352.24 1 200.
387.90 1 255.
320.00 1 256.
400.86 1 262.
435.10 1 263.
401.10 1 219.
449.55 1 333.
243.76 1 136.
532.00 1 350.
318.78 1 414.
606.51 1 427.
460.01 1 479.
618.24 1 471.
951.83 1 509.
1119.96 1 549.
1131.29 1 440.
1419.48 1 586.

Here’s an alternate (different) analysis. We’ll later discuss this one, and others.

Bivariate Fit of AM Trips By Occup. Sq. Ft. (1000)

Weight: "weight"

0

250

500

750

1000

1250

1500

AM Trips

0 100 200 300 400 500 600

Occup. Sq. Ft. (1000)

Linear Fit

AM Trips = 25.948 + 1.4789 Occup. Sq. Ft. (1000)

Summary of Fit

RSquare 0. RSquare Adj 0. Root Mean Square Error 0. Mean of Response 68. Observations (or Sum Wgts) 0.

Analysis of Variance

Source DF Sum of Squares Mean Square F Ratio Model 1 84.85679 84.8568 268. Error 59 18.61377 0.3155 Prob > F C. Total 60 103.47056 <.

Parameter Estimates

Term Estimate Std Error t Ratio Prob>|t| Intercept 25.9483 4.259315 6.09 <. Occup. Sq. Ft. (1000) 1.4789008 0.090175 16.40 <.

2. Multiple Linear Regression :

The data involves scores on a statewide [Texas] student proficiency exam of math and English.

The ‘independent’ variables in the data set are the % by school passing the math test, the

English test, and both tests. (Not all students take both tests.) There possible predictor variables

[co-variates] measure either the demographic character of the school district [These are marked

with a (*)] or features of the school organization and financing.

avgteacher_salary, avgclass_size, avgteacher_experience, pctltdenglish (*)

pctecondisadv(*), totalenrollment [in the school], grade3enrollment [in the school],

pct_special ed(), pct_gifted(), pct_black(), pct_hispanic(),

perpupil expend.

After preliminary analysis that we’ll examine later I decided that most of the differences in the

math score could be “explained” by 4 of the independent variables. This yields a model of the

type

E Y ( (^) i (^) ) = β 0 + β 1 x 1 (^) i + β 2 x 2 (^) i + β 3 x 3 (^) i +β 4 x 4 i

with the x j as below:

Summary of Fit

RSquare 0. Root Mean Square Error 12. Mean of Response 82. Observations (or Sum Wgts) 3421

Analysis of Variance

Source DF Sum of Squares Mean Square F Ratio Model 4 146753 36688.5 250. Error 3416 500621 146.6 Prob > F C. Total 3420 647375 <.

Parameter Estimates

Term Estimate Std Error t Ratio Prob>|t| Intercept 81.395 2.4681 32.98 <. avgteacher_salary 0.000253 0.000074 3.41 0. avgteacher_experience 0.3555 0.08062 4.41 <. pct_econdisadv -0.2014 0.00729 -27.63 <. pct_black -0.1301 0.01417 -9.18 <.

NOTE that two of these “explanatory” variables are demographic and two relate

to the character of the school.

Analysis of Variance

Source DF Sum of Squares Mean Square F Ratio Prob > F Day of Week 4 1.42813 0.357033 2.3259 0. Error 1266 194.33589 0. C. Total 1270 195.

Means for Oneway Anova

Level Number Mean Std Error Lower 95% Upper 95% FRI 246 2.08801 0.02498 2.0390 2. MON 278 2.10743 0.02350 2.0613 2. THU 268 2.03210 0.02393 1.9852 2. TUE 253 2.03307 0.02463 1.9847 2. WED 226 2.10219 0.02606 2.0511 2. Std Error uses a pooled estimate of error variance

Two (and higher) way analyses :

For examining the joint effect of day and server one could use an additive model of the

form

E Y ( ijk ) = μ +τ i + β j , i = 1,... , I j = 1,..., J , k =1,..., Kij

(In principle, some values of Kij could be 0, with the obvious interpretation.). OR one could use

an additive model with interactions of the form

E Y ( ijk ) = μ + τ i + β j + γ ij , i = 1,... , I j = 1,..., J , k = 1,..., Kij.

Later we’ll discuss some alternative ways of modeling data such as this, involving “random-

effects” models.

Here is some output from the model with interaction:

Response Log(SerTime)

Summary of Fit

RSquare 0. Root Mean Square Error 0. Mean of Response 2. Observations (or Sum Wgts) 1271

Analysis of Variance

Source DF Sum of Squares Mean Square F Ratio Model 79 17.12885 0.216821 1. Error 1191 178.63517 0.149988 Prob > F C. Total 1270 195.76402 0.

Effect Tests

Source Nparm DF Sum of Squares F Ratio Prob > F Day of Week 4 4 1.035081 1.7253 0. Server ID 15 15 3.585552 1.5937 0. Server ID*Day of Week 60 60 12.443047 1.3827 0.

Lecture Notes on Linear Models - Linear Statistics Model | STAT 551, Study notes of Statistics

Related documents

Partial preview of the text

Download Lecture Notes on Linear Models - Linear Statistics Model | STAT 551 and more Study notes Statistics in PDF only on Docsity!

Examples of Linear Models

1. Ordinary Linear Regression [ aka “Simple” Liner Regression.]

Study of # of car trips to office buidings as a function of office space of the building. (Suburban

and semi-urban mid-Atlantic areas.) [Goal: Learn how to predict Y for given values of x .]

Statistical Model includes :

(b) Yi independent

(c) Var( Yi ) constant.

(a) - (c) are often summarized as

The Model often also includes more precisely specified assumptions about the distribution of Yi ,

most usually

[The vector-matrix form for part (a) of this model is

In writing this general vector-matrix form for linear models we customarily write β as a vector.

Thus the usual way of writing the coordinates of β is

. There is one embarrassing

feature to this representation. In the model (a) we have

. Thus we have created a

notational monster in which

. This doesn’t seem to bother the authors of our text

P.S. A better way to proceed would have been to write (*) with a different letter – e.g.,

that

, which is OK.]

Here are Y and X for the usual vector and matrix form:

Here’s an alternate (different) analysis. We’ll later discuss this one, and others.

Bivariate Fit of AM Trips By Occup. Sq. Ft. (1000)

Linear Fit

Summary of Fit

Analysis of Variance

Parameter Estimates

2. Multiple Linear Regression :

The data involves scores on a statewide [Texas] student proficiency exam of math and English.

The ‘independent’ variables in the data set are the % by school passing the math test, the

English test, and both tests. (Not all students take both tests.) There possible predictor variables

[co-variates] measure either the demographic character of the school district [These are marked

with a (*)] or features of the school organization and financing.

avgteacher_salary, avgclass_size, avgteacher_experience, pctltdenglish (*)

pctecondisadv(*), totalenrollment [in the school], grade3enrollment [in the school],

pct_special ed(), pct_gifted(), pct_black(), pct_hispanic(),

perpupil expend.

After preliminary analysis that we’ll examine later I decided that most of the differences in the

math score could be “explained” by 4 of the independent variables. This yields a model of the

type

with the x j as below:

Summary of Fit

Analysis of Variance

Parameter Estimates

NOTE that two of these “explanatory” variables are demographic and two relate

to the character of the school.

Analysis of Variance

Means for Oneway Anova

Two (and higher) way analyses :

For examining the joint effect of day and server one could use an additive model of the

form

E Y ( ijk ) = μ +τ i + β j , i = 1,... , I j = 1,..., J , k =1,..., Kij

(In principle, some values of Kij could be 0, with the obvious interpretation.). OR one could use

an additive model with interactions of the form

E Y ( ijk ) = μ + τ i + β j + γ ij , i = 1,... , I j = 1,..., J , k = 1,..., Kij.

Later we’ll discuss some alternative ways of modeling data such as this, involving “random-

effects” models.

Here is some output from the model with interaction:

Response Log(SerTime)

Summary of Fit

Analysis of Variance

Effect Tests