






These notes examine the relationship between a response variable and an explanatory variable using linear regression analysis. Topics include the history of regression, the simple linear regression model, estimating regression parameters, confidence intervals for the slope and intercept, significance tests for the intercept and slope, and prediction intervals. The notes also cover the analysis of variance for regression.







Chapter 10 Inference for Regression
This chapter studies the relationship between a response variable and an explanatory variable using linear regression analysis. It focuses on confidence intervals for the intercept, the slope, and the mean response, and on significance tests for the intercept and slope.
10.1 Simple Linear Regression
The term “regression” and the general methods for studying relationships now included under this term were introduced in the 1880s by Francis Galton, the renowned British biologist. Galton was engaged in the study of heredity. One of his observations was that the children of tall parents tended to be taller than average but not as tall as their parents. This “regression toward mediocrity” gave these statistical methods their name.
Parents’ height    Children’s height
64.5               65.
65.5               66.
66.5               67.
67.5               67.
68.5               68.
69.5               68.
70.5               69.
71.5               69.
72.5               72.
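Simple linear regression model:
Given n observations on the explanatory variable x and the response variable y,
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$
the model states that
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \ldots, n,$$
where the deviations $\varepsilon_i$ are independent and normally distributed with mean 0 and standard deviation $\sigma$. The parameters of the model are $\beta_0$, $\beta_1$ and $\sigma$.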
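Estimating the regression parameters:
Recall the least-squares regression line in Chapter 2:
$$\hat{y} = b_0 + b_1 x,$$
where
$$b_1 = r\,\frac{s_y}{s_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x},$$
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum (x_i - \bar{x})^2.$$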
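The formulas for b_0 and b_1 translate directly into code. The following is a minimal sketch in plain Python; the function name `least_squares` and the five data points are hypothetical, chosen only so that the arithmetic is easy to check by hand (here Sxy = 6 and Sxx = 10).

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx as defined in the text
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


# Hypothetical toy data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

With these points the slope is 6/10 = 0.6 and the intercept is 4 − 0.6 · 3 = 2.2.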
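The estimators $b_1$ and $b_0$ have variances
$$\operatorname{Var}(b_1) = \frac{\sigma^2}{S_{xx}}, \qquad \operatorname{Var}(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right),$$
and, under the normal model (the second argument of $N$ is the standard deviation),
$$b_1 \sim N\!\left(\beta_1,\ \frac{\sigma}{\sqrt{S_{xx}}}\right), \qquad b_0 \sim N\!\left(\beta_0,\ \sigma\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}\right).$$
The standard deviation $\sigma$ is estimated by $s$, where
$$s^2 = \frac{1}{n-2}\sum e_i^2 = \frac{1}{n-2}\sum (y_i - \hat{y}_i)^2.$$
$n - 2$ is called the degrees of freedom for $s^2$.
Replacing $\sigma$ by $s$ gives the standard errors
$$s_{b_1} = \frac{s}{\sqrt{S_{xx}}}, \qquad s_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}.$$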
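As a rough illustration of how s and the two standard errors are obtained from raw data, here is a short sketch in plain Python. The function name `fit_with_standard_errors`, the dictionary keys and the toy data are all hypothetical, not part of the notes.

```python
import math


def fit_with_standard_errors(xs, ys):
    """Fit y-hat = b0 + b1*x and return estimates with their standard errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residuals e_i = y_i - y-hat_i; s^2 divides their sum of squares by n - 2.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    se_b1 = s / math.sqrt(s_xx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    return {"b0": b0, "b1": b1, "s": s, "se_b0": se_b0, "se_b1": se_b1}


# Hypothetical toy data, small enough to verify by hand.
fit = fit_with_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

For this toy data the residual sum of squares is 2.4 on 3 degrees of freedom, so s² = 0.8.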
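Example 10.1 Consider the following data on parents’ and children’s heights.

Parents’ height    Children’s height
64.5               65.
65.5               66.
66.5               67.
67.5               67.
68.5               68.
69.5               68.
70.5               69.
71.5               69.
72.5               72.

The following statistics can be derived from the above data:
$$\bar{x} = 68.5, \quad \bar{y} = 68.4444, \quad S_{xy} = 41.1, \quad S_{xx} = 60, \quad s = 0.4998$$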
(a) Find the equation of the least-squares regression line.
(b) Give a 90% confidence interval for the slope and the intercept.
(c) Test H 0 : β 1 = 0 against Ha : β 1 > 0 at the 0.05 significance level.
Solution:
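One way to check the three parts numerically is to work from the quoted summary statistics alone. The sketch below assumes the usual intervals of the form estimate ± t* · (standard error) and the test statistic t = b_1 / s_{b_1}, both with n − 2 = 7 degrees of freedom. The critical value 1.895 is read from a t table and hard-coded, since the Python standard library has no t distribution. All variable names are illustrative.

```python
import math

# Summary statistics quoted in Example 10.1; n = 9 rows in the height table.
n = 9
x_bar, y_bar = 68.5, 68.4444
s_xy, s_xx, s = 41.1, 60.0, 0.4998

# (a) least-squares line
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard errors of the slope and the intercept
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# (b) 90% confidence intervals.
# Assumed table value: upper 0.05 critical value of t(7).
T_STAR = 1.895
ci_slope = (b1 - T_STAR * se_b1, b1 + T_STAR * se_b1)
ci_intercept = (b0 - T_STAR * se_b0, b0 + T_STAR * se_b0)

# (c) one-sided test of H0: beta1 = 0 against Ha: beta1 > 0 at level 0.05.
# The one-sided 0.05 test uses the same upper 0.05 critical value.
t_stat = b1 / se_b1
reject_h0 = t_stat >= T_STAR

print(f"(a) y-hat = {b0:.4f} + {b1:.4f} x")
print(f"(b) slope: ({ci_slope[0]:.4f}, {ci_slope[1]:.4f})")
print(f"    intercept: ({ci_intercept[0]:.4f}, {ci_intercept[1]:.4f})")
print(f"(c) t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The slope is 41.1/60 = 0.685 and the t statistic is roughly 10.6, far above 1.895, so H_0 is rejected.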
Example 10.2 The relationship between school funding and student performance continues to be a hotly debated political and philosophical issue. Typical of the data available are the following figures, showing the 1991 per-pupil expenditures and average SAT scores for 13 randomly chosen districts in Virginia.
Spending per pupil    Average SAT score
3877                  886
3947                  817
3754                  904
3864                  754
5770                  975
3736                  861
4377                  887
5107                  922
4002                  905
4078                  890
4259                  852
3591                  869
4613                  909

The following statistics can be derived from the above data:
$$\bar{x} = 4228.85, \quad \bar{y} = 879.308, \quad S_{xy} = 254046, \quad S_{xx} = 4602526, \quad s = 42.74$$
(a) Find the equation of the least-squares regression line.
(b) Give a 95% confidence interval for the slope and the intercept.
(c) Test H 0 : β 1 = 0 against Ha : β 1 > 0 at the 0.05 significance level.
Solution:
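Because the full data table is available here, the quoted summary statistics can be recomputed from the raw data before answering the three parts. The following is a sketch under the same assumptions as the formulas above: intervals of the form estimate ± t* · (standard error) and t = b_1 / s_{b_1} with n − 2 = 11 degrees of freedom. The critical values 2.201 and 1.796 are taken from a t table and hard-coded, and the variable names are illustrative.

```python
import math

# Data from the table in Example 10.2
spending = [3877, 3947, 3754, 3864, 5770, 3736, 4377,
            5107, 4002, 4078, 4259, 3591, 4613]
sat = [886, 817, 904, 754, 975, 861, 887,
       922, 905, 890, 852, 869, 909]

n = len(spending)
x_bar = sum(spending) / n
y_bar = sum(sat) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(spending, sat))
s_xx = sum((x - x_bar) ** 2 for x in spending)

# (a) least-squares line
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Estimate of sigma from the residuals, on n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(spending, sat))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(s_xx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# (b) 95% intervals. Assumed table value: upper 0.025 critical value of t(11).
T_STAR_95 = 2.201
ci_slope = (b1 - T_STAR_95 * se_b1, b1 + T_STAR_95 * se_b1)
ci_intercept = (b0 - T_STAR_95 * se_b0, b0 + T_STAR_95 * se_b0)

# (c) one-sided test. Assumed table value: upper 0.05 critical value of t(11).
T_STAR_05 = 1.796
t_stat = b1 / se_b1
reject_h0 = t_stat >= T_STAR_05

print(f"Sxy = {s_xy:.0f}, Sxx = {s_xx:.0f}, s = {s:.2f}")
print(f"(a) y-hat = {b0:.2f} + {b1:.5f} x")
print(f"(b) slope: ({ci_slope[0]:.5f}, {ci_slope[1]:.5f})")
print(f"    intercept: ({ci_intercept[0]:.1f}, {ci_intercept[1]:.1f})")
print(f"(c) t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The recomputed Sxy, Sxx and s agree with the values quoted in the example up to rounding.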
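Confidence intervals for mean response:
For any specific value of x, say $x^*$, the mean of the response is given by
$$\mu_y = \beta_0 + \beta_1 x^*.$$
It is estimated by $\hat{\mu}_y = b_0 + b_1 x^*$, with standard error
$$s_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(\bar{x} - x^*)^2}{S_{xx}}},$$
so a level C confidence interval for $\mu_y$ is $\hat{\mu}_y \pm t^* s_{\hat{\mu}}$.
To predict an individual response y at $x = x^*$, use $\hat{y} = b_0 + b_1 x^*$. Then
$$\frac{\hat{y} - y}{s_{\hat{y}}} \sim T(n-2), \quad \text{where} \quad s_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(\bar{x} - x^*)^2}{S_{xx}}}.$$
A level C prediction interval for y is $\hat{y} \pm t^* s_{\hat{y}}$,
where $t^*$ is the upper $(1-C)/2$ critical value of $T_{n-2}$.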
Example 10.4 Refer to Example 10.1. If Mark’s parents are 70 inches tall, find a 95% prediction interval for Mark’s height.
Solution:
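A numerical check of this example can be sketched from the summary statistics of Example 10.1. The helper `interval_at` is a hypothetical name. It applies the standard error s·sqrt(1 + 1/n + (x̄ − x*)²/Sxx) for an individual response, and drops the leading 1 for the mean response so the two intervals can be compared. The critical value 2.365 (upper 0.025 point of t with 7 degrees of freedom) is read from a t table.

```python
import math

# Summary statistics quoted in Example 10.1; n = 9 rows in the height table.
N, X_BAR, Y_BAR = 9, 68.5, 68.4444
S_XY, S_XX, S = 41.1, 60.0, 0.4998

b1 = S_XY / S_XX
b0 = Y_BAR - b1 * X_BAR


def interval_at(x_star, t_star, individual):
    """Interval for the response at x_star.

    individual=True  -> prediction interval for a single new y
    individual=False -> confidence interval for the mean response
    """
    y_hat = b0 + b1 * x_star
    extra = 1.0 if individual else 0.0
    se = S * math.sqrt(extra + 1 / N + (X_BAR - x_star) ** 2 / S_XX)
    return y_hat - t_star * se, y_hat + t_star * se


T_STAR_95 = 2.365  # assumed table value: upper 0.025 critical value of t(7)
prediction = interval_at(70, T_STAR_95, individual=True)
mean_ci = interval_at(70, T_STAR_95, individual=False)

print(f"95% prediction interval: ({prediction[0]:.2f}, {prediction[1]:.2f})")
print(f"95% CI for mean response: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
```

The prediction interval comes out at roughly 68.2 to 70.7 inches. It is wider than the interval for the mean response because it also accounts for the variation of a single observation around the line.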
Example 10.5 Can the next day’s highest price of a stock be predicted from today’s closing price? The table below gives the closing prices and the next day’s highest prices of a stock on NASDAQ.

Closing price (x)    Highest price, next day (y)
27.94                27.
26.75                27.
26.19                26.
27.19                27.
26.69                28.
27.87                28.
37.06                39.
36.81                37.
36.38                36.
33.50                34.
31.44                33.
33.25                33.
34.56                36.
34.25                35.
33.19                34.
32.00                31.
31.25                31.
30.00                30.
28.31                31.
28.56                29.

It is known that
$$\bar{x} = 31.16, \quad \bar{y} = 32.01, \quad S_{xy} = 247.93, \quad S_{xx} = 247.157, \quad s = 0.9825$$
(a) Find the equation of the least-squares regression line.
(b) Give a 95% confidence interval for the slope and the intercept.
(c) Test H 0 : β 1 = 0 against Ha : β 1 > 0 at the 0.05 significance level.
(d) Suppose that today’s closing price is $25. Find an 80% prediction interval for tomorrow’s highest price.
10.2 Analysis of Variance for Regression
Analysis of variance (ANOVA) summarizes information about the sources of variation in the data. It is based on the decomposition
Data = Fit + Residual.
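The decomposition can be seen concretely in a small computation. In the sketch below each deviation y_i − ȳ splits into a fitted part ŷ_i − ȳ and a residual y_i − ŷ_i, and for a least-squares fit the corresponding sums of squares add up in the same way. The names `sst`, `ssm` and `sse` (total, model and error sums of squares) and the toy data are assumptions made for illustration, not notation taken from the notes.

```python
# Hypothetical toy data (not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

# Data = Fit + Residual, observation by observation
for y, y_hat in zip(ys, fitted):
    print(f"{y - y_bar:+.2f} = {y_hat - y_bar:+.2f} + {y - y_hat:+.2f}")

# The same split holds for the sums of squares of a least-squares fit
sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation
ssm = sum((y_hat - y_bar) ** 2 for y_hat in fitted)      # explained by the fit
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted))  # residual
print(f"SST = {sst:.2f}, SSM = {ssm:.2f}, SSE = {sse:.2f}")
```

For this toy data the total variation 6 splits into 3.6 explained by the line and 2.4 left in the residuals.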
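Inference for correlation: Let $\rho$ be the population correlation between the variables x and y, and let r be the sample correlation, where
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}.$$
The t statistic for the slope can be written in terms of r:
$$\frac{b_1}{s_{b_1}} = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}.$$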
Test for a zero population correlation:
Null hypothesis: $H_0 : \rho = 0$
t statistic:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$$

Alternative $H_a$    P-value                      Reject $H_0$ at level $\alpha$
$\rho > 0$           $P(T_{n-2} \ge t)$           if $t \ge t^*_{\alpha}$
$\rho < 0$           $P(T_{n-2} \le t)$           if $t \le -t^*_{\alpha}$
$\rho \ne 0$         $2P(T_{n-2} \ge |t|)$        if $|t| \ge t^*_{\alpha/2}$
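To illustrate the test, the sketch below computes r and the t statistic for the school-funding data of Example 10.2, and checks that this t agrees with the slope statistic b_1 / s_{b_1}. The P-value is not computed because the Python standard library has no t distribution. Instead the decision uses the critical value 1.796 (upper 0.05 point of t with 11 degrees of freedom), read from a t table. Variable names are illustrative.

```python
import math

# Data from the table in Example 10.2
spending = [3877, 3947, 3754, 3864, 5770, 3736, 4377,
            5107, 4002, 4078, 4259, 3591, 4613]
sat = [886, 817, 904, 754, 975, 861, 887,
       922, 905, 890, 852, 869, 909]

n = len(spending)
x_bar = sum(spending) / n
y_bar = sum(sat) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(spending, sat))
s_xx = sum((x - x_bar) ** 2 for x in spending)
s_yy = sum((y - y_bar) ** 2 for y in sat)

# Sample correlation and the t statistic for H0: rho = 0
r = s_xy / math.sqrt(s_xx * s_yy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The same statistic computed as b1 / s_b1
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(spending, sat))
s = math.sqrt(sse / (n - 2))
t_slope = b1 / (s / math.sqrt(s_xx))

T_STAR_05 = 1.796  # assumed table value: upper 0.05 critical value of t(11)
reject_h0 = t_corr >= T_STAR_05  # Ha: rho > 0 at level 0.05

print(f"r = {r:.3f}, t = {t_corr:.2f}, slope t = {t_slope:.2f}")
print(f"reject H0 (rho = 0) in favour of rho > 0: {reject_h0}")
```

The two t statistics coincide, so testing ρ = 0 and testing β_1 = 0 lead to the same decision.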