Linear Regression Analysis: Inference for Intercept, Slope, and Mean Response, Study notes of Data Analysis & Statistical Methods

These notes examine the relationship between a response variable and an explanatory variable using linear regression analysis. Topics include the history of regression, the simple linear regression model, estimating regression parameters, confidence intervals for the slope and intercept, significance tests for the intercept and slope, and prediction intervals. The document also covers the analysis of variance for regression.

Typology: Study notes

Pre 2010

Uploaded on 07/29/2009

koofers-user-jow
Chapter 10
Inference for Regression
This chapter considers the relationship between a response variable and an explanatory variable using linear regression analysis. It focuses on confidence intervals for the intercept, slope, and mean response, and on significance tests for the intercept and slope.
10.1 Simple Linear Regression
The term “regression” and the general methods for studying relationships now grouped under it were introduced in the 1880s by Francis Galton, the renowned British biologist. Galton was engaged in the study of heredity. One of his observations was that the children of tall parents tend to be taller than average but not as tall as their parents. This “regression toward mediocrity” gave these statistical methods their name.
Parents’ height Children’s height
64.5 65.8
65.5 66.7
66.5 67.2
67.5 67.6
68.5 68.2
69.5 68.9
70.5 69.5
71.5 69.9
72.5 72.2
Simple linear regression model:
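Given n observations on the explanatory variable x and the response variable y,

$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

  • Statistical model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where the $\varepsilon_i$ are assumed to be independent $N(0, \sigma)$, that is, normal with mean 0 and standard deviation $\sigma$
  • Parameters: $\beta_0$, $\beta_1$, $\sigma$
  • Mean response: $E(y_i) = \beta_0 + \beta_1 x_i$
  • Population regression line: $\mu_y = \beta_0 + \beta_1 x$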
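To make the model concrete, the following minimal sketch simulates responses from it. The parameter values (β0 = 20, β1 = 0.7, σ = 0.5), the seed, and the function name `simulate_slr` are hypothetical choices for illustration only; they are not estimates from the notes.

```python
import random

def simulate_slr(xs, beta0, beta1, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i for each x_i,
    with eps_i independent N(0, sigma), sigma being a standard deviation."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, SIGMA = 20.0, 0.7, 0.5

xs = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
ys = simulate_slr(xs, BETA0, BETA1, SIGMA)

for x, y in zip(xs, ys):
    mean_response = BETA0 + BETA1 * x  # point on the population regression line
    print(f"x = {x:5.1f}  mean response = {mean_response:6.2f}  simulated y = {y:6.2f}")
```

Each simulated y scatters around its mean response on the population regression line, with spread controlled by σ.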

Estimating the regression parameters:

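Recall the least-squares regression line in Chapter 2:

$$\hat{y} = b_0 + b_1 x,$$

where

$$b_1 = r\,\frac{s_y}{s_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x},$$

$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum (x_i - \bar{x})^2.$$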
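As a quick check of these formulas, here is a minimal sketch that computes Sxy, Sxx, b1, and b0 for the parent and child height table in Section 10.1. The function name `least_squares` is an illustrative choice, not something defined in the notes.

```python
# Height data from the table in Section 10.1.
x = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]  # parents' height
y = [65.8, 66.7, 67.2, 67.6, 68.2, 68.9, 69.5, 69.9, 72.2]  # children's height

def least_squares(x, y):
    """Return (b0, b1, Sxy, Sxx) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1, s_xy, s_xx

b0, b1, s_xy, s_xx = least_squares(x, y)
print(f"Sxy = {s_xy:.4f}, Sxx = {s_xx:.4f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```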

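  • $E(b_0) = \beta_0$, $E(b_1) = \beta_1$
  • $\mathrm{Var}(b_1) = \dfrac{\sigma^2}{S_{xx}}$
  • $\mathrm{Var}(b_0) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)$
  • $b_1 \sim N\left( \beta_1, \dfrac{\sigma}{\sqrt{S_{xx}}} \right)$
  • $b_0 \sim N\left( \beta_0, \sigma \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}} \right)$
  • Predicted response: $\hat{y}_i = b_0 + b_1 x_i$
  • Residual: $e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$
  • The estimate of $\sigma^2$: $s^2 = \dfrac{1}{n-2} \sum e_i^2 = \dfrac{1}{n-2} \sum (y_i - \hat{y}_i)^2$, where $n - 2$ is called the degrees of freedom for $s^2$
  • $E(s^2) = \sigma^2$
  • Standard error of $b_1$: $s_{b_1} = \dfrac{s}{\sqrt{S_{xx}}}$
  • Standard error of $b_0$: $s_{b_0} = s \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}$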
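The residuals, the estimate s, and the two standard errors can be computed directly from these definitions. The sketch below does so for the height data in Section 10.1; the variable names are illustrative only.

```python
import math

# Height data from the table in Section 10.1.
x = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
y = [65.8, 66.7, 67.2, 67.6, 68.2, 68.9, 69.5, 69.9, 72.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - b0 - b1 * x_i
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# s^2 uses n - 2 degrees of freedom because two parameters were estimated.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

se_b1 = s / math.sqrt(s_xx)                       # standard error of the slope
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)  # standard error of the intercept

print(f"s = {s:.4f}, s_b1 = {se_b1:.4f}, s_b0 = {se_b0:.4f}")
```

The computed s agrees with the value 0.4998 quoted for this data set.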

Example 10.1 The heights (in inches) of parents and their children from Section 10.1 are repeated below.

Parents’ height Children’s height
64.5 65.8
65.5 66.7
66.5 67.2
67.5 67.6
68.5 68.2
69.5 68.9
70.5 69.5
71.5 69.9
72.5 72.2

$\bar{x} = 68.5$, $\bar{y} = 68.4444$, $S_{xy} = 41.1$, $S_{xx} = 60$, $s = 0.4998$

(a) Find the equation of the least-squares regression line.

(b) Give a 90% confidence interval for the slope and the intercept.

(c) Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 > 0$ at the 0.05 significance level.

Solution:
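One way to check the three parts numerically is sketched below. It assumes the usual t procedures with n − 2 degrees of freedom (intervals of the form estimate ± t∗ × standard error, and the statistic t = b1 / s_b1). The critical value 1.895 is the upper 0.05 point of T(7) read from a standard t table, and the function name is hypothetical.

```python
import math

def slope_intercept_inference(n, x_bar, y_bar, s_xy, s_xx, s, t_star):
    """Least-squares line, t intervals, and slope t statistic from summary statistics.

    t_star is a critical value of the T(n - 2) distribution, read from a t table.
    """
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    se_b1 = s / math.sqrt(s_xx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    return {
        "b0": b0,
        "b1": b1,
        "ci_b1": (b1 - t_star * se_b1, b1 + t_star * se_b1),
        "ci_b0": (b0 - t_star * se_b0, b0 + t_star * se_b0),
        "t": b1 / se_b1,  # test statistic for H0: beta1 = 0
    }

# n = 9, so df = 7. The upper 0.05 critical value serves both the 90%
# intervals and the one-sided test at the 0.05 level.
T_STAR = 1.895
res = slope_intercept_inference(n=9, x_bar=68.5, y_bar=68.4444,
                                s_xy=41.1, s_xx=60, s=0.4998, t_star=T_STAR)

print(f"(a) y-hat = {res['b0']:.4f} + {res['b1']:.4f} x")
print(f"(b) slope: ({res['ci_b1'][0]:.4f}, {res['ci_b1'][1]:.4f}), "
      f"intercept: ({res['ci_b0'][0]:.2f}, {res['ci_b0'][1]:.2f})")
reject_h0 = res["t"] >= T_STAR
print(f"(c) t = {res['t']:.2f}, reject H0: {reject_h0}")
```

The same function applies to Example 10.2 with n = 13 and 11 degrees of freedom, where a t table gives 2.201 for the 95% intervals and 1.796 for the one-sided test.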

Example 10.2 The relationship between school funding and student performance continues to be a hotly debated political and philosophical issue. Typical of the data available are the following figures, showing the 1991 per-pupil expenditures and average SAT scores for 13 randomly chosen districts in Virginia.

Spending per pupil Average SAT score
3877 886
3947 817
3754 904
3864 754
5770 975
3736 861
4377 887
5107 922
4002 905
4078 890
4259 852
3591 869
4613 909

The following statistics can be derived from the above data:

$\bar{x} = 4228.85$, $\bar{y} = 879.308$, $S_{xy} = 254046$, $S_{xx} = 4602526$, $s = 42.74$

(a) Find the equation of the least-squares regression line.

(b) Give a 95% confidence interval for the slope and the intercept.

(c) Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 > 0$ at the 0.05 significance level.

Solution:

Confidence intervals for mean response:

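For any specific value of x, say $x^*$, the mean of the response is given by

$$\mu_y = \beta_0 + \beta_1 x^*,$$

which is estimated by $\hat{y} = b_0 + b_1 x^*$. When $\hat{y}$ is used to predict a single future observation $y$ at $x = x^*$:

  • $\sigma_{\hat{y}} = \sigma \sqrt{1 + \dfrac{1}{n} + \dfrac{(\bar{x} - x^*)^2}{S_{xx}}}$
  • $\hat{y} - y \sim N(0, \sigma_{\hat{y}})$
  • $\dfrac{\hat{y} - y}{s_{\hat{y}}} \sim T(n-2)$, where $s_{\hat{y}} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(\bar{x} - x^*)^2}{S_{xx}}}$

A level C prediction interval for y is $\hat{y} \pm t^* s_{\hat{y}}$, where $t^*$ is the upper (1 − C)/2 critical value of the $T(n-2)$ distribution.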

Example 10.4 Refer to Example 10.1. If Mark’s parents are 70 inches tall, find a 95% prediction interval for Mark’s height.

Solution:
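A minimal sketch of the calculation follows. It assumes the prediction standard error s·sqrt(1 + 1/n + (x̄ − x∗)²/Sxx) for a single new observation, and takes t∗ = 2.365 (the upper 0.025 critical value of T(7)) from a standard t table. The function name is hypothetical.

```python
import math

def prediction_interval(x_star, n, x_bar, y_bar, s_xy, s_xx, s, t_star):
    """Return (y_hat, (lower, upper)) for one future observation at x = x_star.

    t_star is the upper (1 - C)/2 critical value of T(n - 2), read from a t table.
    """
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = b0 + b1 * x_star
    # The leading 1 accounts for the variability of the new observation itself.
    se_pred = s * math.sqrt(1 + 1 / n + (x_bar - x_star) ** 2 / s_xx)
    margin = t_star * se_pred
    return y_hat, (y_hat - margin, y_hat + margin)

# Summary statistics from Example 10.1; 95% level with df = 7.
y_hat, (lower, upper) = prediction_interval(
    x_star=70, n=9, x_bar=68.5, y_bar=68.4444,
    s_xy=41.1, s_xx=60, s=0.4998, t_star=2.365)

print(f"predicted height = {y_hat:.2f} inches")
print(f"95% prediction interval = ({lower:.2f}, {upper:.2f})")
```

The same function handles part (d) of Example 10.5 by passing that example's summary statistics with n = 20, x_star = 25, and the upper 0.10 critical value of T(18), which a t table gives as 1.330.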

Example 10.5 Can the next day's highest price of a stock be predicted from today’s closing price? The table below gives the closing prices and the next day's highest prices of a stock on NASDAQ.

Closing price (x) Highest price, next day (y)
27.94 27.
26.75 27.
26.19 26.
27.19 27.
26.69 28.
27.87 28.
37.06 39.
36.81 37.
36.38 36.
33.50 34.
31.44 33.
33.25 33.
34.56 36.
34.25 35.
33.19 34.
32.00 31.
31.25 31.
30.00 30.
28.31 31.
28.56 29.

It is known that $\bar{x} = 31.16$, $\bar{y} = 32.01$, $S_{xy} = 247.93$, $S_{xx} = 247.157$, $s = 0.9825$

(a) Find the equation of the least-squares regression line.

(b) Give a 95% confidence interval for the slope and the intercept.

(c) Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 > 0$ at the 0.05 significance level.

(d) Suppose that today’s closing price is $25. Find an 80% prediction interval for tomorrow’s highest price.

10.2 Analysis of Variance for Regression

Analysis of variance (ANOVA) summarizes information about the sources of variation in the data. It is based on the decomposition Data = Fit + Residual.
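The decomposition can be illustrated numerically. In the sketch below, each deviation y_i − ȳ is split into a fitted part ŷ_i − ȳ and a residual y_i − ŷ_i, and the corresponding sums of squares add up. The names SST, SSM, and SSE are the conventional labels for these sums and are assumed here, since the notes do not name them; the data are the heights from Section 10.1.

```python
# Height data from the table in Section 10.1.
x = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
y = [65.8, 66.7, 67.2, 67.6, 68.2, 68.9, 69.5, 69.9, 72.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation ("Data")
ssm = sum((fi - y_bar) ** 2 for fi in fitted)           # variation explained by the line ("Fit")
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # leftover variation ("Residual")

print(f"SST       = {sst:.4f}")
print(f"SSM + SSE = {ssm + sse:.4f}")
print(f"fraction of variation explained = {ssm / sst:.4f}")
```

For a least-squares line the two totals agree up to rounding, and the ratio SSM / SST equals the squared sample correlation.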

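Inference for correlation: Let $\rho$ be the population correlation between the variables x and y, and let r be the sample correlation, where

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad S_{yy} = \sum (y_i - \bar{y})^2.$$

The t statistic for the slope can be written in terms of r:

$$\frac{b_1}{s_{b_1}} = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}}$$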

Test for a zero population correlation:

Null hypothesis: $H_0: \rho = 0$

t statistic:

$$t = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}}$$

Alternative $H_a$     P-value                       Reject $H_0$ at level $\alpha$
$\rho > 0$            $P(T_{n-2} \ge t)$            if $t \ge t^*_{\alpha}$
$\rho < 0$            $P(T_{n-2} \le t)$            if $t \le -t^*_{\alpha}$
$\rho \ne 0$          $2P(T_{n-2} \ge |t|)$         if $|t| \ge t^*_{\alpha/2}$
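As an illustration, the sketch below computes r and the t statistic for the height data in Section 10.1 and confirms that it matches b1 / s_b1. The one-sided test against ρ > 0 at α = 0.05 uses the critical value 1.895 for T(7), read from a standard t table; the choice of data set and of alternative is an assumption made for the example.

```python
import math

# Height data from the table in Section 10.1.
x = [64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5]
y = [65.8, 66.7, 67.2, 67.6, 68.2, 68.9, 69.5, 69.9, 72.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation and the t statistic built from it.
r = s_xy / math.sqrt(s_xx * s_yy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The same statistic computed as b1 / s_b1.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
t_from_slope = b1 / (s / math.sqrt(s_xx))

T_STAR = 1.895  # upper 0.05 critical value of T(7), from a t table
reject_h0 = t_from_r >= T_STAR

print(f"r = {r:.4f}")
print(f"t from r = {t_from_r:.4f}, t from slope = {t_from_slope:.4f}")
print(f"reject H0 (rho = 0) in favor of rho > 0: {reject_h0}")
```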