Multiple Linear Regression - Lecture Notes | GEOS 585A | Study notes Geology

Notes_11, GEOS 585A, Spring 2009

11 Multiple Linear Regression

Multiple linear regression (MLR) is a method used to model the linear relationship between a
dependent variable and one or more independent variables. The dependent variable is sometimes
also called the predictand, and the independent variables the predictors. MLR is based on least
squares: the model is fit such that the sum of squared differences between observed and predicted
values is minimized. MLR is probably the most widely used method in dendroclimatology for
developing models to reconstruct climate variables from tree-ring series. Typically, a climatic
variable is defined as the predictand and tree-ring variables from one or more sites are defined as
predictors. The model is fit to a period, the calibration period, for which climatic and tree-ring
data overlap. In the process of fitting, or estimating, the model, statistics are computed that
summarize the accuracy of the regression model for the calibration period. The performance of
the model on data not used to fit the model is usually checked in some way by a process called
validation. Finally, tree-ring data from before the calibration period are substituted into the
prediction equation to get a reconstruction of the predictand. The reconstruction is a “prediction”
in the sense that the regression model is applied to generate estimates of the predictand variable
outside the period used to fit the model. The uncertainty in the reconstruction is summarized by
confidence intervals, which can be computed in several ways.
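
The calibrate-then-reconstruct workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the notes: the function names (`solve`, `fit_mlr`, `predict`) and all data values are hypothetical, and the coefficients are obtained by solving the normal equations directly, which is adequate only for small, well-conditioned problems.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting.
    Assumes A is nonsingular (no zero pivot is handled)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bK] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    p = len(Z[0])
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(r[i] * yi for r, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def predict(coefs, X):
    """Apply the prediction equation to each row of predictor values."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in X]


# Hypothetical calibration period: two tree-ring indices per year (predictors)
# and a climate variable (predictand) built to be exactly linear in them.
rings_calib = [[0.8, 1.1], [1.2, 0.9], [1.0, 1.0], [0.6, 0.7], [1.4, 1.3], [0.9, 1.2]]
climate_calib = [1.0 + 2.0 * a + 3.0 * b for a, b in rings_calib]

coefs = fit_mlr(rings_calib, climate_calib)

# Hypothetical tree-ring data from before the calibration period.
rings_early = [[0.7, 0.9], [1.1, 1.0]]
reconstruction = predict(coefs, rings_early)
```

Because the made-up predictand is an exact linear function of the predictors, the fitted coefficients recover the intercept and slopes used to generate it; with real data the fit would leave residuals, and a validation step on withheld years would follow.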

Regression has long been used in dendroclimatology for reconstructing climate variables from
tree rings. A few examples of dendroclimatic studies using linear regression are reconstruction of
annual precipitation in the Pacific Northwest (Graumlich 1987), reconstruction of runoff of the
White River, Arkansas (Cleaveland and Stahle 1989), reconstruction of an index of the El Niño-
Southern Oscillation (Michaelsen 1989), and reconstruction of a drought index for Iowa
(Cleaveland and Duvick 1992).

MLR is not strictly a “time series” method. The most important point in application to time
series is that observations are typically not independent of one another. As a consequence,
special attention must be paid to a regression assumption about the independence of the residuals.
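
One way to check the independence of residuals is to look at their lag-1 autocorrelation or the closely related Durbin-Watson statistic. The sketch below is an assumption about how such a check might look, not necessarily the diagnostic used in these notes; the residual values are made up.

```python
def lag1_autocorrelation(e):
    """Lag-1 autocorrelation coefficient of a residual series."""
    n = len(e)
    mean = sum(e) / n
    d = [v - mean for v in e]
    return sum(d[t] * d[t - 1] for t in range(1, n)) / sum(v * v for v in d)


def durbin_watson(e):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    below 2 for positive autocorrelation, above 2 for negative."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)


# Hypothetical regression residuals showing year-to-year persistence.
residuals = [0.5, 0.4, 0.3, -0.1, -0.4, -0.5, -0.2, 0.1, 0.3, 0.4]
r1 = lag1_autocorrelation(residuals)
dw = durbin_watson(residuals)
```

For this persistent series the lag-1 autocorrelation is positive and the Durbin-Watson statistic falls well below 2, which is the pattern that would call the independence assumption into question.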

The predictors in any regression problem might be intercorrelated. This so-called
multicollinearity does not preclude the use of regression, but can make it impossible or difficult to
assess the relative importance of individual predictors from the estimated coefficients of the
regression equation.
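
A common summary of intercorrelation among predictors is the variance inflation factor (VIF), 1 / (1 − R²), where R² comes from regressing one predictor on the others. The sketch below covers only the two-predictor case, where that R² equals the squared correlation between the two predictors. The function names and the tree-ring index values are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical tree-ring indices from two nearby sites that track each other closely.
site_a = [0.8, 1.2, 1.0, 0.6, 1.4, 0.9]
site_b = [0.85, 1.15, 1.05, 0.65, 1.35, 0.95]
vif = vif_two_predictors(site_a, site_b)
```

Uncorrelated predictors give a VIF of 1; the strongly correlated pair above gives a much larger value, signalling that the individual coefficients would be poorly determined.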

It is ironic that the most interesting periods of dendroclimatic reconstructions derived from
regression are often the periods for which application of the regression model is most
problematic: periods whose climatic anomalies are most unlike those of today. The
reconstruction for those periods is likely to be more uncertain than implied by regression statistics
because the predictors are in a part of the “multivariate predictor space” not sampled by the data
used to fit the model. The statistical aspects of this problem can be addressed by distinguishing
predictions as extrapolations, as opposed to interpolations.
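
One way to make the interpolation/extrapolation distinction operational is to compare the leverage of a new predictor vector with the largest leverage found in the calibration data. The sketch below assumes that criterion; the notes may define extrapolation differently, and the function names and data are hypothetical.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting.
    Assumes A is nonsingular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def leverage(X_calib, x_new):
    """Leverage h = z0' (Z'Z)^-1 z0, where Z is the calibration predictor
    matrix with an intercept column and z0 is the new point with a leading 1."""
    Z = [[1.0] + list(r) for r in X_calib]
    z0 = [1.0] + list(x_new)
    p = len(z0)
    ZtZ = [[sum(r[i] * r[j] for r in Z) for j in range(p)] for i in range(p)]
    w = solve(ZtZ, z0)  # w = (Z'Z)^-1 z0
    return sum(a * b for a, b in zip(z0, w))


def is_extrapolation(X_calib, x_new):
    """Flag x_new if its leverage exceeds the largest calibration-period leverage."""
    h_max = max(leverage(X_calib, r) for r in X_calib)
    return leverage(X_calib, x_new) > h_max


# Hypothetical calibration predictors (two tree-ring indices per year).
rings_calib = [[0.8, 1.1], [1.2, 0.9], [1.0, 1.0], [0.6, 0.7], [1.4, 1.3], [0.9, 1.2]]

inside = is_extrapolation(rings_calib, [1.0, 1.05])   # near the calibration cloud
outside = is_extrapolation(rings_calib, [3.0, 0.2])   # far from anything sampled
```

A point near the centre of the calibration data has low leverage and counts as an interpolation, while a predictor combination unlike any calibration year is flagged, even if each predictor taken alone looks only moderately unusual.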

The MLR model is reviewed below, with emphasis on topics of particular interest for time
series. More detailed information can be found in many standard references: for example, a
statistical text on regression (Weisberg 1985), a chapter on regression as applied to the
atmospheric sciences (Wilks 1995) and a monograph on regression in a time series context
(Ostrom 1990).

11.1 Model

11.2 Assumptions

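$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (6)$$

... written as $\sigma_e^2$, while the sample estimate is given by

$$s_e^2 = MSE$$

$$F = \frac{MSR}{MSE} \qquad (9)$$

$$\bar{R}^2 = 1 - \frac{MSE}{MST} = 1 - \frac{(n-1)\,SSE}{(n-K-1)\,SST} \qquad (10)$$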
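
The calibration statistics numbered (6), (9) and (10) can be computed from the sums of squares as in the sketch below. It assumes the standard definitions (SSE as the residual sum of squares, SST as the total sum of squares about the mean, K predictors, n observations); the function name and the data are hypothetical.

```python
def regression_stats(y, y_hat, K):
    """R-squared, adjusted R-squared, F ratio and MSE for a fitted regression
    with K predictors plus an intercept."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((o - p) ** 2 for o, p in zip(y, y_hat))  # residual sum of squares
    sst = sum((o - y_bar) ** 2 for o in y)             # total sum of squares
    ssr = sst - sse  # regression sum of squares; valid for a least-squares fit with intercept
    mse = sse / (n - K - 1)
    mst = sst / (n - 1)
    msr = ssr / K
    return {
        "R2": 1.0 - sse / sst,
        "R2_adj": 1.0 - mse / mst,
        "F": msr / mse,
        "MSE": mse,
    }


# Hypothetical observed values and the least-squares fitted values
# from a one-predictor regression on x = 1, 2, 3, 4, 5.
observed = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]
stats = regression_stats(observed, fitted, K=1)
```

For this small example R² is 0.64 while the adjusted value is 0.52, which illustrates how the adjustment penalizes the fit for the degrees of freedom used when the sample is short.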

... the predictor, and the summation is over the n years in the calibration period. The 100(1 − α)% ...

... confidence interval is desired, the appropriate α-level is α = 0.05. A t-table for this sample size
and α-level gives $t_{0.025,43} = 2.02$. The corresponding confidence interval is ...

11.5 Multicollinearity

VIF

11.6 Analysis of Residuals

$$Q = N \sum_{k} r_k^2$$