



Material Type: Exam; Class: Business & Econ Statistics II; Subject: Statistics; University: George Washington University; Term: Spring 2004;




Statistics 112 — Simple Linear Regression Fuel Consumption Example March 1, 2004 E. Bura
Fuel Consumption Case: reducing natural gas transmission fines.
In 1993, the natural gas industry was deregulated. As a consequence, natural gas companies became responsible for acquiring the natural gas needed to heat the homes and businesses they serve. Natural gas companies place orders for natural gas to be transmitted to their cities by pipeline transmission systems. To place an order, a company must predict the city's natural gas need for that period. To encourage natural gas companies to make accurate predictions and to help control costs, pipeline transmission systems charge, in addition to their usual fees, transmission fines if the order is below or above need. There is of course some leeway; i.e., small prediction errors go unfined. Suppose a management consulting firm is responsible for predicting the gas needs of a natural gas company serving a small city. The problem is to predict weekly fuel consumption (y) on the basis of average hourly temperature (x). For this we observed y and x for eight weeks:
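| Week | x | y | x^2 | xy |
|------|------|------|---------|--------|
| 1 | 28.0 | 12.4 | 784 | 347.2 |
| 2 | 28.0 | 11.7 | 784 | 327.6 |
| 3 | 32.5 | 12.4 | 1056.25 | 403 |
| 4 | 39.0 | 10.8 | 1521 | 421.2 |
| 5 | 45.9 | 9.4 | 2106.81 | 431.46 |
| 6 | 57.8 | 9.5 | 3340.84 | 549.1 |
| 7 | 58.1 | 8.0 | 3375.61 | 464.8 |
| 8 | 62.5 | 7.5 | 3906.25 | 468.75 |

$$\sum_{i=1}^{n} x_i = 351.8, \qquad \sum_{i=1}^{n} y_i = 81.7, \qquad \sum_{i=1}^{n} x_i^2 = 16874.76, \qquad \sum_{i=1}^{n} x_i y_i = 3413.11$$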
The plot of y versus x suggests that the simple linear regression model may provide a good fit to the data. Hence, we hypothesize that
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

with errors $\varepsilon_i \sim N(0, \sigma^2)$.
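The least squares estimates of the parameters of the model, $\beta_0$ and $\beta_1$, are

$$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

where

$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$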
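Also, $\bar y = 10.2125$ and $\bar x = 43.98$. These yield

$$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-179.6475}{1404.355} = -0.1279$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 10.2125 + 0.1279 \times 43.98 = 15.84$$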
The fitted line is given by
$$\hat y_i = 15.84 - 0.1279\, x_i$$
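The least squares arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names (`ss_xy`, `ss_xx`, `b1`, `b0`) are illustrative and not part of the handout, and small rounding differences from the printed figures are possible.

```python
# Weekly data from the table: average hourly temperature (x), fuel consumption (y).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)

# Raw sums used by the shortcut formulas.
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# SSxy and SSxx as defined in the text.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_xx - sum_x ** 2 / n

# Least squares slope and intercept.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.4f}")
print(f"slope = {b1:.4f}, intercept = {b0:.2f}")
```

The negative slope says that predicted weekly fuel consumption falls as the average hourly temperature rises.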
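The standard deviation $\sigma$ of the errors is estimated by

$$\hat\sigma = s = \sqrt{\frac{SSE}{n - 2}}, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2$$

Since $y \sim N(\beta_0 + \beta_1 x, \sigma^2)$, we expect most of the observed responses (roughly 95%) to fall within 2s of the fitted line.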
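As a rough check of the "within 2s" statement, the sketch below refits the line, computes the residuals and s, and counts how many of the eight observations lie within 2s of the fitted line. It uses only the standard library; the names (`residuals`, `within_2s`) are illustrative assumptions, and the computed SSE may differ slightly from a value obtained with rounded coefficients.

```python
import math

# Weekly data from the table: temperature (x) and fuel consumption (y).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)

# Least squares fit via the shortcut formulas.
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

# Residuals, SSE, and the estimate s of sigma (n - 2 degrees of freedom).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e * e for e in residuals)
s = math.sqrt(sse / (n - 2))

# Count observations whose residual is at most 2s in absolute value.
within_2s = sum(1 for e in residuals if abs(e) <= 2 * s)

print(f"SSE = {sse:.4f}, s = {s:.4f}")
print(f"{within_2s} of {n} observations lie within 2s of the fitted line")
```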
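In general, to test

$$H_0: \beta_1 = \beta \quad \text{versus} \quad H_1: \beta_1 \neq \beta, \quad \beta_1 > \beta, \quad \text{or} \quad \beta_1 < \beta$$

use the test statistic

$$t = \frac{\hat\beta_1 - \beta}{s / \sqrt{SS_{xx}}} \sim t_{n-2}$$

Reject the null at level $\alpha$ if

$$|t| > t_{\alpha/2}(n - 2), \qquad t > t_{\alpha}(n - 2), \qquad t < -t_{\alpha}(n - 2)$$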
with respect to the analogous alternative. Also, a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is given by

$$\hat\beta_1 \pm t_{\alpha/2}(n - 2)\, \frac{s}{\sqrt{SS_{xx}}}$$

Observe that if $\beta_1 = 0$ then the population correlation coefficient, $\rho$, is also equal to zero. Therefore, the t-test for the slope of the model can also be used to test whether $\rho = 0$.
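To make the confidence interval formula concrete, here is a hedged sketch that evaluates a 95% interval for the slope on the fuel consumption data. The 95% level is chosen for illustration, and the critical value t_{.025}(6) = 2.447 is an assumption taken from a standard t table rather than from the handout; the variable names are hypothetical.

```python
import math

# Weekly data from the table: temperature (x) and fuel consumption (y).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)

# Least squares fit.
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n

# Estimate of sigma and the standard error of the slope, s / sqrt(SSxx).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)

# Assumption: t_{.025}(6) = 2.447, read from a standard t table.
T_CRIT_025_DF6 = 2.447

lower = b1 - T_CRIT_025_DF6 * se_b1
upper = b1 + T_CRIT_025_DF6 * se_b1
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

Because the whole interval lies below zero, it agrees with a two-sided test at the 5% level rejecting a zero slope.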
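For the fuel consumption data, $SSE = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = 2.5680112$, and

$$s^2 = \frac{SSE}{n - 2} = \frac{2.5680112}{6} = 0.4280$$

So, $s = \sqrt{0.4280} = 0.6542$.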
To test whether $\beta_1 = 0$ versus $\beta_1 \neq 0$, we compute the test statistic

$$t = \frac{\hat\beta_1}{s / \sqrt{SS_{xx}}} = \frac{-0.1279}{0.6542 / \sqrt{1404.355}} = -7.33$$
Since |t| = 7.33 exceeds 5.959, the p-value of the test is smaller than twice the area to the right of 5.959 under the t distribution with n − 2 = 6 degrees of freedom. That is, p-value < 2 × .0005 = .001. This is a highly significant result, so we reject the null in favor of the alternative. The linear model is useful for modelling the mean of fuel consumption, y.
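The test statistic and the comparison with the table value can be reproduced as follows. This is a sketch under the assumption that 5.959 is the upper .0005 point of the t distribution with 6 degrees of freedom, as used in the text; the constant name is hypothetical and only the standard library is used.

```python
import math

# Weekly data from the table: temperature (x) and fuel consumption (y).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)

# Least squares fit and estimate of sigma.
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(y) / n - b1 * sum(x) / n
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# t statistic for H0: beta_1 = 0.
t_stat = b1 / (s / math.sqrt(ss_xx))

# Table value quoted in the text: area .0005 to the right, 6 degrees of freedom.
T_0005_DF6 = 5.959

# If |t| exceeds the table value, the two-sided p-value is below 2 * .0005.
p_value_below_001 = abs(t_stat) > T_0005_DF6
print(f"t = {t_stat:.2f}, p-value < .001: {p_value_below_001}")
```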
The coefficient of determination, R^2 , is a measure of the contribution of x or the model in predicting y.
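$$R^2 = \frac{\text{explained variability}}{\text{total variability in } y \text{ about its mean } \bar y} = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$$

where $SS_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2$.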
In other words, $R^2$ represents the proportion of variability in y explained by the fitted model. In the case of simple linear regression, that is, when the hypothesized model is of the form $y = \beta_0 + \beta_1 x + \varepsilon$, $R^2 = r^2$, where r is the correlation coefficient of y and x.
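For the fuel consumption data,

$$R^2 = 1 - \frac{SSE}{SS_{yy}} = 1 - \frac{2.568}{25.549} = 0.8995$$

Hence, 89.95% of the variability in the y values about their mean $\bar y$ is explained by the fitted simple linear regression model.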
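The coefficient of determination, and its equality with the squared correlation coefficient in simple linear regression, can be verified numerically. The sketch below uses the centered sums of squares, which are algebraically equivalent to the shortcut formulas; the names are illustrative and not from the handout.

```python
import math

# Weekly data from the table: temperature (x) and fuel consumption (y).
x = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross products.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Fit the line and compute the unexplained variability SSE.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# R^2 from its definition, and the sample correlation coefficient r.
r_squared = 1 - sse / ss_yy
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSyy = {ss_yy:.4f}, R^2 = {r_squared:.4f}")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The sign of r matches the sign of the slope, while R^2 itself carries no information about direction.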
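The regression model has two main uses: estimating the mean value of y for a given $x = x_p$, and predicting an individual y-value for a given $x = x_p$.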
The standard error of $\hat y$ as an estimator of the mean y-value when $x = x_p$ is

$$\sigma_{\hat y} = \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \bar x)^2}{SS_{xx}}}$$

The standard error of $\hat y$ as a predictor of the individual y-value when $x = x_p$ is