Least-Squares Regression Analysis: Urbanization and Coliform Bacteria Concentration | Study notes Urbanization

EESC BC 3017 statistics notes

LINEAR REGRESSION

Systematic variation in the true value

Up to now, we have been thinking about measurement as sampling values from an ensemble of all possible outcomes in order to estimate the true value (which, according to our previous discussion, would be well approximated by the mean of a very large sample). Given a sample of outcomes, we have sometimes checked the hypothesis that it is a random sample from some ensemble of outcomes by plotting the data points against some other variable, such as ordinal position. Under the hypothesis of random selection, no clear trend should appear. However, the contrary case, where one finds a clear trend, is very important. A clear trend can be a discovery rather than a nuisance! Whether it is a discovery or a nuisance (or both) depends on what one finds out about the reasons underlying the trend. In either case, one must be prepared to deal with trends when analyzing data.
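One numerical companion to the plot against ordinal position is the correlation between each value and its position in the sequence: a value near zero is what random selection predicts, while a value near +1 or -1 signals a systematic trend. The sketch below is an illustration, not a procedure from these notes; the function name `trend_vs_position` and both data lists are hypothetical.

```python
import numpy as np

def trend_vs_position(values):
    """Pearson correlation between each value and its ordinal position (0, 1, 2, ...).

    Near zero is consistent with random selection; near +1 or -1 suggests a trend.
    """
    values = np.asarray(values, dtype=float)
    position = np.arange(len(values))
    return np.corrcoef(position, values)[0, 1]

# Hypothetical measurements, made up purely for illustration.
steady = [4.1, 3.8, 4.3, 3.9, 4.2, 4.0, 3.7, 4.4]
drifting = [3.1, 3.5, 3.6, 4.0, 4.4, 4.5, 4.9, 5.2]

print(f"steady:   r = {trend_vs_position(steady):.2f}")
print(f"drifting: r = {trend_vs_position(drifting):.2f}")
```

A correlation only summarizes a straight-line tendency, so it complements the plot rather than replacing it: a curved or cyclic trend can hide behind a small correlation.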

Figure 2.1 (a) shows a plot of (hypothetical) data with a very clear trend. The y axis shows the concentration of coliform bacteria sampled from rivers in various regions (units are colonies per liter). The x axis is a hypothetical index of regional urbanization, ranging from 1 to 10. The hypothetical data consist of 6 different measurements at each level of urbanization. The mean of each set of 6 measurements gives a rough estimate of the true coliform bacteria concentration for rivers in a region with that urbanization level. The jagged dark line on the graph connects these estimates of the true value and makes the trend quite clear: more extensive urbanization is associated with higher true values of bacteria concentration. The straight dashed line shows that much of this trend can be approximated by a linear relationship. The equation of the dashed line is $y = 5.3 + 8.0x$, i.e., the true value of bacteria concentration is estimated to increase by about 8 colonies per liter for each unit increase in the urbanization index.
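The two steps described above (averaging the 6 replicates at each urbanization level, then fitting a straight line) can be sketched with NumPy. This is an illustration under stated assumptions: the helper names `level_means` and `fit_line` are hypothetical, the line is fitted by ordinary least squares with `numpy.polyfit` (the notes do not say how the dashed line was obtained), and the data are constructed to lie on $y = 5.3 + 8.0x$ rather than taken from Figure 2.1 (a).

```python
import numpy as np

def level_means(x, y):
    """Mean of y at each distinct x level (the estimates joined by the jagged line)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    levels = np.unique(x)
    means = np.array([y[x == level].mean() for level in levels])
    return levels, means

def fit_line(x, y):
    """Least-squares straight line y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the highest power first
    return a, b

# Constructed data (NOT the data behind Figure 2.1 (a)): levels 1..10, 6 replicates
# each, on y = 5.3 + 8.0x plus offsets that average to zero at every level.
offsets = np.array([-3.0, -1.0, 0.0, 0.0, 1.0, 3.0])
x = np.repeat(np.arange(1, 11), 6)
y = 5.3 + 8.0 * x + np.tile(offsets, 10)

levels, means = level_means(x, y)
a, b = fit_line(x, y)
print(f"intercept {a:.2f}, slope {b:.2f}")  # intercept 5.30, slope 8.00
# The slope is the predicted change in y for a one-unit step in x.
print(f"rise from level 4 to 5: {(a + b * 5) - (a + b * 4):.1f}")  # 8.0
```

With the same number of replicates at every level, fitting all 60 points or fitting the 10 level means gives the same least-squares line.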

The linear relationship does not fit the data very well at the lowest and especially at the highest levels of urbanization. A linear trend is a useful approximation, but there may be better ways to represent the actual trend. One device that is often very useful is to replace concentration by its logarithm. Figure 2.1 (b) shows the analogous plot, using base-10 logarithms of bacterial concentration for the y-axis scale. Now the linear relationship fits very well; indeed, the bumps in the jagged dark line can plausibly be viewed as noise. The equation of the dashed straight line in Figure 2.1 (b) is $y = 1.25 + 0.071x$, where $y$ is now the base-10 logarithm of concentration. Each one-unit increase in urbanization adds 0.071 to the log concentration, i.e., multiplies the concentration itself by a factor of $10^{0.071} \approx 1.18$. This trend can be summarized by saying that the true value of bacteria concentration is estimated to increase by about 18% for each unit increase in the urbanization index.
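Because $\log_{10} y = a + bx$ implies $y = 10^{a}\,(10^{b})^{x}$, each one-unit step in $x$ multiplies $y$ by $10^{b}$. A minimal sketch of the log-scale fit and that conversion, assuming hypothetical helper names (`log_linear_fit`, `factor_per_unit`) and constructed concentrations lying exactly on $\log_{10} y = 1.25 + 0.071x$ (not the data behind Figure 2.1 (b)):

```python
import numpy as np

def log_linear_fit(x, y):
    """Least-squares fit of log10(y) = a + b*x; every y must be positive."""
    x = np.asarray(x, dtype=float)
    log_y = np.log10(np.asarray(y, dtype=float))
    b, a = np.polyfit(x, log_y, deg=1)  # polyfit returns the slope first
    return a, b

def factor_per_unit(b):
    """Multiplicative change in y for a one-unit increase in x."""
    return 10.0 ** b

# Constructed concentrations lying exactly on log10(y) = 1.25 + 0.071x.
x = np.arange(1, 11)
y = 10.0 ** (1.25 + 0.071 * x)

a, b = log_linear_fit(x, y)
factor = factor_per_unit(b)
print(f"log10 fit: a = {a:.3f}, b = {b:.3f}")  # a = 1.250, b = 0.071
print(f"factor {factor:.3f}, about {100 * (factor - 1):.0f}% per unit")  # factor 1.178, about 18% per unit
```

Fitting on the log scale treats scatter as multiplicative (a fixed percentage) rather than additive, which is one reason the transformed plot can look much more linear.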


[Figure 2.1 (b): Bacterial concentration versus urbanization, logarithmic plot showing empirical and linear trend. x axis: urbanization index; y axis: coliform bacteria (log10 colonies per liter).]

[Figure 2.: Linear regression demonstration showing data points, regression line, and residuals, with axes X and Y; the point (mean(X), mean(Y)) is plotted as 'm'.]
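The regression demonstration marks the point (mean(X), mean(Y)) as 'm'. The standard least-squares derivation, written out here as a worked sketch in notation chosen for illustration ($a$ for the intercept, $b$ for the slope, $\bar{x}$ and $\bar{y}$ for the means of the $n$ data points), shows why the fitted line passes through that point:

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0
  \;\Longrightarrow\; \bar{y} = a + b\,\bar{x} \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0
  \;\Longrightarrow\; b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
  \quad a = \bar{y} - b\,\bar{x}
\end{aligned}
```

The first condition says two things at once: the residuals $y_i - a - b x_i$ sum to zero, and the point $(\bar{x}, \bar{y})$ satisfies $y = a + bx$, so 'm' lies on the regression line.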

[Figure 2.3 (b): Residuals from the linear fit in Figure 2.1 (b). x axis: urbanization; y axis: residuals (log base 10 colonies per liter).]
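A residual is the vertical distance between an observed log concentration and the fitted straight line at the same urbanization level. A minimal sketch of computing them, assuming constructed data (the function name `residuals_from_fit` and the alternating "bumps" are hypothetical stand-ins, not the data behind Figure 2.3 (b)):

```python
import numpy as np

def residuals_from_fit(x, y_log):
    """Residuals of a least-squares straight-line fit of y_log on x."""
    x = np.asarray(x, dtype=float)
    y_log = np.asarray(y_log, dtype=float)
    b, a = np.polyfit(x, y_log, deg=1)
    return y_log - (a + b * x)

# Constructed log10 concentrations: a straight line plus small alternating
# bumps standing in for noise.
x = np.arange(1, 11)
y_log = 1.25 + 0.071 * x + 0.02 * (-1.0) ** x

res = residuals_from_fit(x, y_log)
for level, r in zip(x, res):
    print(f"urbanization {level:2d}: residual {r:+.3f}")
print(f"sum of residuals: {res.sum():.1e}")  # zero up to rounding error
```

For any least-squares line with an intercept, the residuals sum to zero and are uncorrelated with x, so what remains to inspect in a residual plot is leftover pattern, such as curvature, that a straight line cannot capture.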