An Introduction to Simple Linear Regression - Computer Processing Data | STAT 479, Exams of Statistics

Material Type: Exam; Class: CMPTR PROCESSG DATA; Subject: STATISTICS; University: Iowa State University; Term: Unknown 1989;

Uploaded on 09/02/2009

An Introduction to Simple Linear Regression

• The model:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n \qquad (1)$$

$\beta_0$ is the intercept parameter; $\beta_1$ is the slope parameter.

• $\epsilon_i$ is a random variable (random error) with

$E(\epsilon_i) = 0$ and $Var(\epsilon_i) = \sigma^2$.

• This implies $y$ is a random variable with

mean $E(y) = \beta_0 + \beta_1 x$ and variance $\sigma^2$.

• For statistical inference, it is also assumed that the $\epsilon_i$'s are a random sample from a normal distribution.

• This implies that the y-values observed at each x-value are a random sample from a normal distribution with mean $E(y) = \mu(x) = \beta_0 + \beta_1 x$ and variance $\sigma^2$.

• Estimation: Minimize

$$\sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

to obtain the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$.
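Setting the two partial derivatives of this sum of squares to zero gives the standard closed-form solution $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The following is a minimal Python sketch of that calculation; the function name `least_squares` and the five-point data set are hypothetical and are not taken from the notes.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical illustration data (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For these made-up points the fitted line is roughly $\hat{y} = 0.05 + 1.99x$.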

• Regression Anova table:

Source       df      Sum of Squares   Mean Square         F
Regression   1       SSReg            MSReg = SSReg/1     MSReg/MSE
Error        n − 2   SSE              MSE = SSE/(n − 2)
Total        n − 1   SSTot
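As a rough illustration of how the entries of this table are obtained from data, the sketch below computes the three sums of squares and the F-ratio in Python. The function name `regression_anova` and the data are hypothetical; the decomposition SSTot = SSReg + SSE is the standard one for a least squares fit with an intercept.

```python
def regression_anova(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)         # df = 1
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # df = n - 2
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)              # df = n - 1

    ms_reg = ss_reg / 1
    mse = sse / (n - 2)
    return {"SSReg": ss_reg, "SSE": sse, "SSTot": ss_tot,
            "MSReg": ms_reg, "MSE": mse, "F": ms_reg / mse}


# Hypothetical illustration data (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

table = regression_anova(x, y)
for name, value in table.items():
    print(f"{name:6s} {value:12.4f}")
```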

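• Statistical Inference:

Test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.

• Use the F-ratio, $F_c = MSReg/MSE$.

Reject $H_0$ if $F_c > F_{\alpha, 1, n-2}$.

• Use

$$t_c = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$$

where $s_{\hat{\beta}_1}$ is the standard error of $\hat{\beta}_1$.

Reject $H_0$ if $|t_c| > t_{\alpha/2, n-2}$.

• A $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is:

$$\hat{\beta}_1 \pm t_{\alpha/2, n-2} \times s_{\hat{\beta}_1}$$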
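The t-test and the F-test of $H_0: \beta_1 = 0$ are equivalent in simple linear regression, because the square of the t statistic equals $F_c$. The Python sketch below illustrates this, together with the confidence interval, on hypothetical data. It assumes the usual formula $s_{\hat{\beta}_1} = \sqrt{MSE/S_{xx}}$, which these notes do not state. The critical value 3.182 is the tabled $t_{0.025, 3}$, hard-coded here because the Python standard library has no t quantile function.

```python
import math


def slope_inference(x, y, t_crit):
    """t statistic, F-ratio and confidence interval for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)

    se_b1 = math.sqrt(mse / sxx)   # assumed standard-error formula
    t_c = b1 / se_b1
    f_c = (b1 * sxy) / mse         # SSReg = b1 * Sxy, on 1 df
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return {"b1": b1, "se_b1": se_b1, "t_c": t_c, "f_c": f_c, "ci": ci}


# Hypothetical illustration data (not from the notes); n = 5, so n - 2 = 3 df.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT_3DF = 3.182  # tabled t_{0.025, 3}

result = slope_inference(x, y, T_CRIT_3DF)
print(f"t_c = {result['t_c']:.3f}, t_c^2 = {result['t_c'] ** 2:.3f}, F_c = {result['f_c']:.3f}")
print(f"95% CI for slope: ({result['ci'][0]:.4f}, {result['ci'][1]:.4f})")
```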

• The Coefficient of Determination:

$$R^2 = SSReg/SSTot$$

Measures the proportion of variation in $y$ explained by using $\hat{y}$ to predict $y$.
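In simple linear regression, $R^2$ also equals the square of the sample correlation between $x$ and $y$. The short Python check below demonstrates that identity on hypothetical data; the helper name `r_squared_and_correlation` is illustrative only.

```python
import math


def r_squared_and_correlation(x, y):
    """Return (R^2, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SSTot
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_reg = sxy ** 2 / sxx                    # SSReg = b1 * Sxy with b1 = Sxy / Sxx
    r = sxy / math.sqrt(sxx * syy)             # sample correlation
    return ss_reg / syy, r


# Hypothetical illustration data (not from the notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2, r = r_squared_and_correlation(x, y)
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")
```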

• Example: Lead content in trees near highways:

Traffic Flow, x   8.3   8.3   12.1   12.1   17.0   17.0   17.0   24.3   24.3   24.3   33.
Lead Content, y   227   312   362    521    640    539    728    945    738    759    1263

• Model used is:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, 11 \qquad (2)$$

where $x$ = Traffic Flow and $y$ = Lead Content.

See SAS Example D

• SSEPure = 60,103.67 with 6 degrees of freedom is computed as shown in the text. The lack of fit anova table:

Source        df   Sum of Squares   Mean Square   F
Lack of Fit   3    16,389.33        5,463.11      0.
Pure Error    6    60,103.67        10,017.
Total Error   9    76,
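The arithmetic behind this table can be retraced from the two sums of squares and their degrees of freedom. The Python sketch below does so using only the values printed above; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as printed in the lack of fit table.
ss_lack, df_lack = 16389.33, 3
ss_pure, df_pure = 60103.67, 6

# The total error line is the sum of the two components.
ss_error = ss_lack + ss_pure
df_error = df_lack + df_pure

# Mean squares and the lack of fit F-ratio.
ms_lack = ss_lack / df_lack
ms_pure = ss_pure / df_pure
f_lack = ms_lack / ms_pure

print(f"Total Error: df = {df_error}, SS = {ss_error:.2f}")
print(f"MS Lack of Fit = {ms_lack:.2f}, MS Pure Error = {ms_pure:.2f}")
print(f"F (lack of fit) = {f_lack:.3f}")
```

The ratio comes out below 1, whereas the tabled 5% critical value of F with 3 and 6 degrees of freedom is about 4.76, so this calculation gives no indication of lack of fit for the straight-line model.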

• SAS Example D2 demonstrates how SSEPure is

computed using proc anova by considering the x

variable as a classification variable. Use the SSE from

this analysis:

Error 6 60103.6667 10017.
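Treating x as a classification variable amounts to pooling the within-group sums of squares of y over the distinct x-values. The Python sketch below reproduces that calculation for the lead content data. The group labels are used only as keys, so the last traffic-flow value is left as it is printed in the data listing ("33."); the function name `pure_error` is hypothetical.

```python
def pure_error(groups):
    """Pool within-group sums of squares; return (SSEPure, degrees of freedom)."""
    ss = 0.0
    df = 0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss += sum((v - mean) ** 2 for v in ys)
        df += len(ys) - 1   # a group with a single value contributes 0 df
    return ss, df


# Lead content (y) grouped by traffic flow (x), from the example data.
lead_by_flow = {
    "8.3": [227, 312],
    "12.1": [362, 521],
    "17.0": [640, 539, 728],
    "24.3": [945, 738, 759],
    "33.": [1263],   # label as printed in the data listing; value not needed here
}

sse_pure, df_pure = pure_error(lead_by_flow)
print(f"SSEPure = {sse_pure:.4f} with {df_pure} degrees of freedom")
```

This agrees with the Error line of the proc anova output: 6 degrees of freedom and a sum of squares of 60103.6667.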

• SAS Example D3 illustrates the use of the case statistics:

– Cook’s D: measures the influence a data point has on the estimated parameters and/or overall fit statistics.

– RStudent: externally studentized residuals; used to decide whether a case should be declared a y-outlier.

– Hat Diag: measures leverage; high-leverage points identify x-outliers.
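These three statistics are linked by simple formulas. The Python sketch below uses the usual textbook definitions, which are assumed (not confirmed by these notes) to be what SAS reports: with $r_i$ the internally studentized residual, $h_i$ the hat diagonal and $p = 2$ parameters, Cook's $D_i = r_i^2 h_i / (p(1 - h_i))$ and RStudent $= r_i \sqrt{(n - p - 1)/(n - p - r_i^2)}$. The inputs are the residual, its standard error and the hat diagonal printed for observation 2 of the Artificial Data Set output; the function name is hypothetical.

```python
import math


def case_statistics(residual, se_residual, hat, n, p=2):
    """Return (student residual, Cook's D, RStudent) for one observation."""
    student = residual / se_residual
    cooks_d = student ** 2 * hat / (p * (1 - hat))
    rstudent = student * math.sqrt((n - p - 1) / (n - p - student ** 2))
    return student, cooks_d, rstudent


# Observation 2 of the Artificial Data Set output (n = 11 cases).
student, cooks_d, rstudent = case_statistics(
    residual=-1.3164, se_residual=0.770, hat=0.2364, n=11
)
print(f"Student residual = {student:.4f}")
print(f"Cook's D = {cooks_d:.3f}, RStudent = {rstudent:.4f}")
```

Up to rounding of the printed inputs, the results match the 0.452 and -1.9616 shown for that observation.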

• The graph of the data and fitted line and the case statistics are reproduced in the following pages. See the text for a discussion on how these statistics can be used.

Artificial Data Set

The REG Procedure, Model: MODEL, Dependent Variable: y

Output Statistics

Obs   Dependent   Predicted   Std Error      Residual    Std Error   Student
      Variable    Value       Mean Predict               Residual    Residual
1     7.0000      7.0982      0.2786         -0.0982     0.836       -0.
2     8.2000      9.5164      0.4284         -1.3164     0.770       -1.
3     8.0000      8.0655      0.2786         -0.0655     0.836       -0.
4     8.3000      8.5491      0.3143         -0.2491     0.823       -0.
5     10.0000     10.0000     0.4970         -2.05E-15   0.728       -28E-
6     7.2000      6.1309      0.3662         1.0691      0.801       1.
7     4.3000      5.1636      0.4970         -0.8636     0.728       -1.
8     8.8000      7.5818      0.2657         1.2182      0.840       1.
9     5.8000      6.6145      0.3143         -0.8145     0.823       -0.
10    5.7000      5.6473      0.4284         0.0527      0.770       0.
11    10.1000     9.0327      0.3662         1.0673      0.801       1.

Output Statistics

Obs   -2-1 0 1 2   Cook’s D   RStudent    Hat Diag H   Cov Ratio   DFFITS
1     | | |        0.001      -0.1108     0.1000       1.4019      -0.
2     | ***| |     0.452      -1.9616     0.2364       0.7556      -1.
3     | | |        0.000      -0.0739     0.1000       1.4043      -0.
4     | | |        0.007      -0.2868     0.1273       1.4208      -0.
5     | | |        0.000      -2.66E-15   0.3182       1.8563      -0.
6     | |** |      0.186      1.4042      0.1727       0.9847      0.
7     | **| |      0.329      -1.2186     0.3182       1.3205      -0.
8     | |** |      0.105      1.5617      0.0909       0.8177      0.
9     | *| |       0.071      -0.9883     0.1273       1.1518      -0.
10    | | |        0.001      0.0646      0.2364       1.6556      0.
11    | |** |      0.185      1.4012      0.1727       0.9863      0.

Prediction Intervals: Lead Content Data

The REG Procedure, Model: MODEL, Dependent Variable: y Lead Content

Output Statistics

Obs   Dependent   Predicted   Std Error      95% CL Mean
      Variable    Value       Mean Predict
1     227.0000    287.4844    45.4206        184.7359    390.
2     312.0000    287.4844    45.4206        184.7359    390.
3     362.0000    424.9830    35.3804        344.9470    505.
4     521.0000    424.9830    35.3804        344.9470    505.
5     640.0000    602.2839    28.0543        538.8206    665.
6     539.0000    602.2839    28.0543        538.8206    665.
7     728.0000    602.2839    28.0543        538.8206    665.
8     945.0000    866.4260    36.1835        784.5731    948.
9     738.0000    866.4260    36.1835        784.5731    948.
10    759.0000    866.4260    36.1835        784.5731    948.
11    1263        1203        63.8739
12                348.9969    40.6376        257.0684    440.
13                529.9162    29.9605        462.1408    597.

Output Statistics

Obs   95% CL Predict          Residual
1     54.9967     519.9721    -60.
2     54.9967     519.9721    24.
3     201.6021    648.3640    -62.
4     201.6021    648.3640    96.
5     384.2911    820.2767    37.
6     384.2911    820.2767    -63.
7     384.2911    820.2767    125.
8     642.3876    1090        78.
9     642.3876    1090        -128.
10    642.3876    1090        -107.
11    949.2204    1457        60.
12    121.0843    576.
13    310.6292    749.
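The two kinds of limits in this output differ only in the standard error used. The Python sketch below retraces observation 1 under the usual formulas, which are assumed here rather than stated in the notes: the limits for the mean are $\hat{y} \pm t \cdot s_{\hat{y}}$, and the limits for a new observation use $\sqrt{MSE + s_{\hat{y}}^2}$ in place of $s_{\hat{y}}$. MSE is taken as the total error sum of squares from the lack of fit table divided by its 9 degrees of freedom, and 2.262 is the tabled $t_{0.025, 9}$, hard-coded because the Python standard library has no t quantile function.

```python
import math

T_CRIT_9DF = 2.262                 # tabled t_{0.025, 9} (assumed table value)
MSE = (16389.33 + 60103.67) / 9    # total error SS / error df, from the lack of fit table


def intervals(predicted, se_mean, mse, t_crit):
    """Return (confidence limits for the mean, prediction limits)."""
    half_mean = t_crit * se_mean
    se_predict = math.sqrt(mse + se_mean ** 2)   # adds the variance of a new observation
    half_predict = t_crit * se_predict
    cl_mean = (predicted - half_mean, predicted + half_mean)
    cl_predict = (predicted - half_predict, predicted + half_predict)
    return cl_mean, cl_predict


# Observation 1 of the lead content output: predicted value and its standard error.
cl_mean, cl_predict = intervals(287.4844, 45.4206, MSE, T_CRIT_9DF)
print(f"95% CL Mean:    {cl_mean[0]:.2f} to {cl_mean[1]:.2f}")
print(f"95% CL Predict: {cl_predict[0]:.2f} to {cl_predict[1]:.2f}")
```

With the three-decimal critical value, the results agree with the printed limits for observation 1 (184.7359 for the mean; 54.9967 and 519.9721 for prediction) to within about 0.02.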