An overview of multiple linear regression and one-way analysis of variance (ANOVA), including estimation of parameters, confidence intervals, significance tests, the ANOVA table, the F test, and contrasts. It includes examples and solutions for multiple regression and one-way ANOVA hypothesis tests.
Typology: Study notes
Chapter 11 Multiple Regression
It is often the case that a response variable may depend on more than one explanatory variable. It stands to reason that predictions for the response variable may be more accurate if one considers all relevant explanatory variables. In this chapter we use multiple linear regression as a model for these situations and outline some basic facts about inference in the multiple linear regression setting.
Example 11.1 Scholastic Aptitude Test (SAT) scores are often used as criteria for admission to college, as predictors of college performance, or as indicators for placement in courses. The data below are a sample of SAT scores and freshman-year grade-point averages (GPA) from a northeastern university.

Verbal   Math   GPA
623      509    2.…
454      471    2.…
643      700    2.…
585      719    3.…
719      710    3.…
...      ...    ...

Find a multiple linear regression equation to estimate GPA from the Verbal and Math scores.
Choose $b_0, b_1, \ldots, b_p$ to minimize $\sum_{i=1}^{n} \left(y_i - (b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_p x_{ip})\right)^2$.
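$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - p - 1} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}$, where $e_i = y_i - \hat{y}_i$ is the $i$th residual.
$n - p - 1$ is called the degrees of freedom for $s^2$.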
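To make the least-squares criterion and $s^2$ concrete, here is a minimal plain-Python sketch (no third-party libraries assumed). It solves the normal equations $(X^\top X)\,b = X^\top y$, which are the conditions for the minimum above, by Gaussian elimination, then computes $s^2$ from the residuals. The helper name `fit_ols` and the toy data are hypothetical, and the sketch assumes the design matrix has full rank; numerical libraries generally prefer a QR decomposition for better stability.

```python
def fit_ols(X, y):
    """Least-squares fit of y on the columns of X (hypothetical helper).

    X is a list of n rows, each holding the p explanatory values x_i1..x_ip.
    Returns (b, s2) with b = [b0, b1, ..., bp] and s2 = SSE / (n - p - 1).
    """
    n, p = len(X), len(X[0])
    # Design matrix with a leading column of 1s for the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = p + 1
    # Normal equations (A^T A) b = A^T y, stored as an augmented k x (k+1) matrix.
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution gives b0, b1, ..., bp.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    # Residuals e_i = y_i - yhat_i and s^2 = sum(e_i^2) / (n - p - 1).
    fitted = [sum(bj * aj for bj, aj in zip(b, row)) for row in A]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, sse / (n - p - 1)


# Hypothetical toy data lying exactly on y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a - 3 * c for a, c in X]
b, s2 = fit_ols(X, y)
print([round(v, 6) for v in b], round(s2, 6))
```

Because the toy points lie exactly on a plane, the fit recovers $b_0 = 1$, $b_1 = 2$, $b_2 = -3$ up to rounding, with $s^2$ essentially 0.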
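Confidence intervals and significance tests for $\beta_j$
A level $C$ confidence interval for $\beta_j$ is
$b_j \pm t^* s_{b_j}$
where $s_{b_j}$ is the standard error of $b_j$ and $t^*$ is the upper $(1-C)/2$ critical value of $T_{n-p-1}$, the $t$ distribution with $n - p - 1$ degrees of freedom.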
Significance tests:
Null hypothesis: $H_0: \beta_j = 0$
t statistic: $t = \frac{b_j}{s_{b_j}}$, which has the $T_{n-p-1}$ distribution when $H_0$ is true.
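The interval and test above need $t$ critical values and tail probabilities. A sketch in plain Python, assuming no statistics library is available: the two-sided P-value uses the identity $P(|T_\nu| \ge |t|) = I_{\nu/(\nu+t^2)}(\nu/2,\, 1/2)$, where $I_x(a,b)$ is the regularized incomplete beta function (evaluated here with the standard continued-fraction method), and $t^*$ is found by bisection. The helper names (`reg_inc_beta`, `t_two_sided_p`, `t_critical`, `coef_inference`) and the example numbers are hypothetical; a printed $t$ table or statistical software gives the same quantities.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Continued fraction where it converges fast, else I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided P-value P(|T_df| >= |t|) for Student's t distribution."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def t_critical(C, df):
    """Upper (1 - C)/2 critical value t* of T_df, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # The two-sided tail probability decreases as t grows.
        if t_two_sided_p(mid, df) > 1.0 - C:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def coef_inference(b_j, se_j, df, C=0.95):
    """t statistic, two-sided P-value and level-C interval b_j +/- t* se_j."""
    t = b_j / se_j
    t_star = t_critical(C, df)
    return t, t_two_sided_p(t, df), (b_j - t_star * se_j, b_j + t_star * se_j)


# Hypothetical coefficient, standard error and error degrees of freedom n - p - 1.
t, p_value, ci = coef_inference(2.0, 0.8, df=12, C=0.95)
print(f"t = {t:.3f}, P = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With the hypothetical values $b_j = 2.0$, $s_{b_j} = 0.8$ and 12 degrees of freedom, $t = 2.5$ and the 95% interval excludes 0, matching a two-sided P-value below 0.05.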
ANOVA table for multiple regression

Source       DF          SS     MS     F          P-value
Regression   p           SSM    MSM    MSM/MSE
Error        n − p − 1   SSE    MSE
Total        n − 1       SST
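where
$SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $SST = SSM + SSE$,
$MSM = \frac{SSM}{p}$, $MSE = \frac{SSE}{n - p - 1}$.

Analysis of Variance F Test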
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$, $H_a$: at least one of the $\beta_j$ is not 0
F statistic: $f = \frac{MSM}{MSE}$
P-value for the test: $P(F_{p,\,n-p-1} \ge f)$, where $F_{p,\,n-p-1}$ has the F distribution with $p$ and $n - p - 1$ degrees of freedom.
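The P-value of this F test can be computed in plain Python through the identity $P(F_{d_1,d_2} \ge f) = I_{d_2/(d_2 + d_1 f)}(d_2/2,\, d_1/2)$, where $I_x(a,b)$ is the regularized incomplete beta function. This is a sketch with hypothetical helper names; the incomplete beta routine (the standard continued-fraction method) is repeated so that the block runs on its own, and the sums of squares in the example are made up.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))       # even term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def regression_f_test(ssm, sse, n, p):
    """ANOVA F test of H0: beta_1 = ... = beta_p = 0 from SSM and SSE."""
    df1, df2 = p, n - p - 1
    msm, mse = ssm / df1, sse / df2
    f = msm / mse
    # Upper tail P(F_{df1, df2} >= f) via the incomplete beta function.
    p_value = reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))
    return f, p_value


# Hypothetical sums of squares for n = 13 observations and p = 2 predictors.
f, p_value = regression_f_test(ssm=60.0, sse=100.0, n=13, p=2)
print(f"f = {f:.3f}, P = {p_value:.4f}")
```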
Squared multiple correlation
The statistic
$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{SSM}{SST}$
is the proportion of the variation of the response variable $y$ that is explained by the explanatory variables $x_1, x_2, \ldots, x_p$ in a multiple linear regression.
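A short sketch of $R^2$ computed from observed and fitted values. It assumes the fitted values come from a least-squares fit that includes an intercept, since only then does $SST = SSM + SSE$ hold and $R^2$ lie between 0 and 1. The function name `r_squared` and the toy data are hypothetical; the fitted values in the example are the least-squares line for those five points.

```python
def r_squared(y, y_hat):
    """R^2 = SSM / SST from observed y and least-squares fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ssm = sum((f - y_bar) ** 2 for f in y_hat)   # variation explained by the model
    sst = sum((v - y_bar) ** 2 for v in y)       # total variation about y-bar
    return ssm / sst


# Hypothetical toy data; 0.6 + 0.8 x is the least-squares line for these points.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
y_hat = [0.6 + 0.8 * xi for xi in x]
print(round(r_squared(y, y_hat), 4))  # prints 0.64
```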
x1   x2    x3     y
3    2     1656   122
4    1.5   1680   153
4    2     2170   167
4    2.5   1881   170
4    2.5   2700   180
3    2.5   2010   190
3    2.5   2718   369
4    3.2   3819   400
6    4.5   3568   400
3    2.5   2725   450
3    3     3600   510
4    3.2   3156   375
3    2     2619   182
3    2     1465   175
2    1.5   1200   170
3    2     1416   146
Below is part of the Minitab output:
j   bj        sbj    t    P-value
0   36.54     69.…
1   −67.11    27.…
2   81.48     48.…
3   0.10739   0.…
ANOVA Table

Source           DF    SS       MS    F    P
Regression             205863
Residual Error         49990
Total
(a) Complete the above tables.
(b) Find the multiple regression equation.
(c) Find a 90% confidence interval for $\beta_1$.
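Completing the ANOVA table in part (a) is mostly arithmetic once the degrees of freedom are known. The sketch below assumes $n = 16$ observations (the number of rows in the data listed above) and $p = 3$ explanatory variables, takes $SSM = 205863$ and $SSE = 49990$ from the Minitab output, and fills in the remaining entries together with $R^2$. The P-value column still needs the F distribution with 3 and 12 degrees of freedom, from a table or software.

```python
# Assumed from the exercise: n = 16 data rows, p = 3 predictors,
# and the two SS entries shown in the Minitab ANOVA output.
n, p = 16, 3
ssm, sse = 205863.0, 49990.0

df_model, df_error, df_total = p, n - p - 1, n - 1
sst = ssm + sse          # SST = SSM + SSE
msm = ssm / df_model     # MSM = SSM / p
mse = sse / df_error     # MSE = SSE / (n - p - 1)
f = msm / mse            # F = MSM / MSE
r2 = ssm / sst           # R^2 = SSM / SST

print(f"Regression  DF={df_model:2d}  SS={ssm:9.0f}  MS={msm:9.1f}  F={f:.2f}")
print(f"Error       DF={df_error:2d}  SS={sse:9.0f}  MS={mse:9.1f}")
print(f"Total       DF={df_total:2d}  SS={sst:9.0f}")
print(f"R^2 = {r2:.4f}")
```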
Chapter 12 One-Way Analysis of Variance
The statistical methodology for comparing several means is called analysis of variance, or simply ANOVA.
Example 12.1 To compare three varieties of wheat, 4 plants of each variety are grown and the yield in pounds is recorded.
Variety   Yield (lb)
v1        474   486   467   455
v2        492   498   500   480
v3        460   485   490   465

Are the average yields the same for the three varieties of wheat?
Solution:
Let $x_{ij}$ represent the $j$th observation from the $i$th population.
$x_{11}, x_{12}, \ldots, x_{1n_1} \sim N(\mu_1, \sigma)$
$x_{21}, x_{22}, \ldots, x_{2n_2} \sim N(\mu_2, \sigma)$
$\cdots$
$x_{I1}, x_{I2}, \ldots, x_{In_I} \sim N(\mu_I, \sigma)$
Parameters: $\mu_1, \mu_2, \ldots, \mu_I, \sigma$
Estimator of $\mu_i$: $\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$
Pooled estimator of $\sigma$: Suppose we have sample variances $s_1^2, s_2^2, \ldots, s_I^2$ from $I$ independent random samples of sizes $n_1, n_2, \ldots, n_I$ from populations with common variance $\sigma^2$. Then
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_I - 1)s_I^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_I - 1)}$, $s_p = \sqrt{s_p^2}$
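Applied to the wheat data of Example 12.1, the pooled estimator can be sketched with Python's standard `statistics` module, whose `variance` function returns the sample variance $s_i^2$ with divisor $n_i - 1$. The function name `pooled_sd` is hypothetical.

```python
import math
from statistics import variance


def pooled_sd(samples):
    """Pooled estimate s_p from independent samples assumed to share one variance."""
    num = sum((len(s) - 1) * variance(s) for s in samples)  # sum of (n_i - 1) * s_i^2
    den = sum(len(s) - 1 for s in samples)                  # sum of (n_i - 1)
    return math.sqrt(num / den)


# Yields in pounds for varieties v1, v2, v3 (Example 12.1).
wheat = [[474, 486, 467, 455], [492, 498, 500, 480], [460, 485, 490, 465]]
sp = pooled_sd(wheat)
print(f"s_p^2 = {sp ** 2:.3f}, s_p = {sp:.3f}")
```

Here every $n_i = 4$, so $s_p^2$ is simply the average of the three sample variances.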
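Sums of squares, where $\bar{x}$ is the mean of all $N = n_1 + n_2 + \cdots + n_I$ observations:
$SST = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$, $SSG = \sum_{i=1}^{I} n_i (\bar{x}_i - \bar{x})^2$, $SSE = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$, and $SST = SSG + SSE$.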
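A direct translation of the three sums of squares, using the wheat data from Example 12.1 as input, which also lets one check the identity $SST = SSG + SSE$ numerically. The function name `one_way_sums_of_squares` is hypothetical.

```python
def one_way_sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of samples, one per population."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)          # x-bar over all N observations
    means = [sum(g) / len(g) for g in groups]         # x-bar_i for each group
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return sst, ssg, sse


# Yields in pounds for varieties v1, v2, v3 (Example 12.1).
wheat = [[474, 486, 467, 455], [492, 498, 500, 480], [460, 485, 490, 465]]
sst, ssg, sse = one_way_sums_of_squares(wheat)
print(f"SST = {sst:.2f}, SSG = {ssg:.2f}, SSE = {sse:.2f}, SSG + SSE = {ssg + sse:.2f}")
```

For these data it gives $SSG \approx 1080.67$, $SSE = 1398$ and $SST \approx 2478.67$.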
Mean squares:
$MS = \frac{\text{sum of squares}}{\text{degrees of freedom}}$, so $MSG = \frac{SSG}{I - 1}$ and $MSE = \frac{SSE}{N - I}$.
ANOVA Table

Source   DF      SS    MS    F          P-value
Group    I − 1   SSG   MSG   MSG/MSE
Error    N − I   SSE   MSE
Total    N − 1   SST
Remark: $s_p^2 = MSE$.

ANOVA F Test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$, $H_a$: not all $\mu_i$ are equal
F statistic: $f = \frac{MSG}{MSE}$
P-value for the test: $P(F_{I-1,\,N-I} \ge f)$, where $F_{I-1,\,N-I}$ has the F distribution with $I - 1$ and $N - I$ degrees of freedom.
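Putting the pieces together, the sketch below computes the one-way ANOVA entries and the F test for the wheat data of Example 12.1. The P-value uses $P(F_{d_1,d_2} \ge f) = I_{d_2/(d_2+d_1 f)}(d_2/2,\, d_1/2)$ with $d_1 = I - 1$ and $d_2 = N - I$, where $I_x(a,b)$ is the regularized incomplete beta function evaluated by the standard continued-fraction method. Helper names are hypothetical, and the code assumes the normal-model conditions stated above.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))       # even term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova(groups):
    """One-way ANOVA table entries and F test for a list of samples."""
    I = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / N
    means = [sum(g) / len(g) for g in groups]
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msg, mse = ssg / (I - 1), sse / (N - I)
    f = msg / mse
    # Upper tail P(F_{I-1, N-I} >= f) via the incomplete beta function.
    d1, d2 = I - 1, N - I
    p_value = reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))
    return {"df": (d1, d2), "SSG": ssg, "SSE": sse,
            "MSG": msg, "MSE": mse, "F": f, "P": p_value}


# Yields in pounds for varieties v1, v2, v3 (Example 12.1).
wheat = [[474, 486, 467, 455], [492, 498, 500, 480], [460, 485, 490, 465]]
res = one_way_anova(wheat)
print(f"F = {res['F']:.3f} on {res['df']} df, P = {res['P']:.4f}")
```

For the wheat data this gives $f \approx 3.48$ with 2 and 9 degrees of freedom and a P-value of about 0.076, so the evidence against equal mean yields is weak at the 5% level.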
Rule for examining standard deviations: If the ratio of the largest sample standard deviation to