







Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Multiple linear regression analysis to predict mercury concentration (ppm) in fish based on their length (cm) and weight (g). The concept of multiple regression analysis, least-squares estimation, and interpreting the coefficients. It also includes an example of omitted variables and confounding. Useful for students studying statistics, particularly those focusing on regression analysis.
Typology: Study notes
1 / 13
This page cannot be seen from the preview
Don't miss anything!








irritability, and anxiety – the “mad hatter” syndrome.
in fish over their lifetimes.
expense, by catching fish and sending samples to a lab for analysis.
concentration and measurable characteristics of a fish, such as length and weight, in order to develop safety guidelines about how much fish to eat.
based on X 1 = length (cm) and X 2 = weight (g).
1
the sum of squared residuals.
calculating residuals are different (but the idea is the same):
S ( βˆ 0 ,... , βˆ p ) =
∑^ n
i = 1
e^2 i =
∑^ n
i = 1
( yi − ˆ yi )^2 =
∑^ n
i = 1
( yi − ( βˆ 0 + ˆβ 1 xi 1 +... + ˆβ p xi p ))^2.
drop the “LS” superscript.
enlightening. (In the notation of matrix analysis, they are revealing, but that’s outside our scope.) JMP will do the computations for you.
RSS/( n − 2 ).
3
Response Mercury Concentration Whole Model Actual by Predicted Plot
4
fish (length = 0 cm, weight = 0 g)
variables remain fixed, then average mercury concentration increases by β 1 ppm, no matter what the fixed values of the other variables are.
fish’s length by 1cm” (growth hormone?)
we are in case 1: regression will be helpful.
6
certain region, suppose that the relation of yield Y (bushels/acre) to rainfall R is investigated over several years. We also measure temperature T each year but do not include it in the initial analysis.
1964 50 10 4 1965 70 11 5 1966 70 10 5 1967 80 9 5 1968 50 9 4 1969 60 12 4 1970 40 11 4
We could consider the simple regression function E(Y|R):
Bivariate Fit of Y By R
30
40
50
60
70
80
90
Y
7 8 9 10 11 12 13 R
Linear Fit Linear Fit
7
this sample.? (Causally, we would anticipate rain improving yield.)
be associated with a unit change in rainfall: there’s no guarantee that other variables affecting yield (not included in this regression) were being held fixed. (These are omitted variables .)
yield if we intervened to increase water supply. We need a regression coefficient for water supply (i.e., rainfall) which corresponds to all other effects being held fixed.
less measure them all. But perhaps we can find some important ones, and make an argument that the influence of remaining factors can reasonably be glossed.
9
Response Y
Whole Model
Actual by Predicted Plot
30
40
50
60
70
80
90
Y Actual
40 50 60 70 80 Y Predicted P=0.0201 RSq=0. RMSE=7.
Summary of Fit RSquare 0. RSquare Adj 0. Root Mean Square Error 7.
10
temperature – a conclusion which seems causally less peculiar.
data, T was not held constant. It changed from year to year. In fact,
positive effect of more rain, (ii) the negative effect of the lower temperature that usually accompanied more rain.
gave a (noisy) negative slope coefficient, because the coefficient included both the direct effect of rain, and the larger indirect effect of temperature (the omitted variable).
12