






Material Type: Notes; Class: Applied Time Series Analysis; Subject: GEOSCIENCES; University: University of Arizona; Term: Spring 2009;







Multiple linear regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes also called the predictand, and the independent variables the predictors. MLR is based on least squares: the model is fit such that the sum-of-squares of differences of observed and predicted values is minimized. MLR is probably the most widely used method in dendroclimatology for developing models to reconstruct climate variables from tree-ring series. Typically, a climatic variable is defined as the predictand and tree-ring variables from one or more sites are defined as predictors. The model is fit to a period, the calibration period, for which climatic and tree-ring data overlap. In the process of fitting, or estimating, the model, statistics are computed that summarize the accuracy of the regression model for the calibration period. The performance of the model on data not used to fit the model is usually checked in some way by a process called validation. Finally, tree-ring data from before the calibration period are substituted into the prediction equation to get a reconstruction of the predictand. The reconstruction is a “prediction” in the sense that the regression model is applied to generate estimates of the predictand variable outside the period used to fit the data. The uncertainty in the reconstruction is summarized by confidence intervals, which can be computed in several alternative ways. Regression has long been used in dendroclimatology for reconstructing climate variables from tree rings.
A few examples of dendroclimatic studies using linear regression are reconstruction of annual precipitation in the Pacific Northwest (Graumlich 1987), reconstruction of runoff of the White River, Arkansas (Cleaveland and Stahle 1989), reconstruction of an index of the El Nino Southern Oscillation (Michaelsen 1989), and reconstruction of a drought index for Iowa (Cleaveland and Duvick 1992).
MLR is not strictly a “time series” method. The most important point in application to time series is that observations are typically not independent of one another. As a consequence, special attention must be paid to the regression assumption about the independence of the residuals.
The predictors in any regression problem might be intercorrelated. This so-called multicolinearity does not preclude the use of regression, but can make it impossible or difficult to assess the relative importance of individual predictors from the estimated coefficients of the regression equation.
It is ironic that the most interesting periods of dendroclimatic reconstructions derived from regression are often the periods for which application of the regression model is most problematical: periods whose climatic anomalies are most unlike those of today. The reconstruction for those periods is likely to be more uncertain than implied by regression statistics because the predictors are in a part of the “multivariate predictor space” not sampled by the data used to fit the model. The statistical aspects of this problem can be addressed by distinguishing predictions as extrapolations, as opposed to interpolations.
The MLR model is reviewed below, with emphasis on topics of particular interest for time series. More detailed information can be found in many standard references, for example, a statistical text on regression (Weisberg 1985), a chapter on regression as applied to the atmospheric sciences (Wilks 1995), and a monograph on regression in a time series context (Ostrom 1990).
Model equation. The model expresses the value of a predictand variable as a linear function of one or more predictor variables and an error term:
$$y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + \dots + b_K x_{i,K} + e_i \qquad (1)$$
where
$x_{i,k}$ = value of the $k$th predictor in year $i$
$b_0$ = regression constant
$b_k$ = coefficient on the $k$th predictor
$K$ = total number of predictors
$y_i$ = predictand in year $i$
$e_i$ = error term
Prediction equation. The model (1) is estimated by least squares, which yields parameter estimates such that the sum of squares of errors is minimized. The resulting prediction equation is
$$\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_{i,1} + \hat{b}_2 x_{i,2} + \dots + \hat{b}_K x_{i,K} \qquad (2)$$
where the variables are defined as in (1) except that “^” denotes estimated values.
Residuals. The error term in (1) is unknown because the true model is unknown. Once the model has been estimated, the regression residuals are defined as
$$\hat{e}_i = y_i - \hat{y}_i$$
where
$y_i$ = observed value of predictand in year $i$
$\hat{y}_i$ = predicted value of predictand in year $i$
The residuals measure the closeness of fit of the predicted values and actual predictand in the calibration period. The algorithm for estimating the regression equation (solution of the normal equations) guarantees that the residuals have a mean of zero for the calibration period. The variance of the residuals measures the “size” of the error, and is small if the model fits the data well.
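The fitting step and the zero-mean property of the residuals can be illustrated with a minimal sketch in plain Python. It solves the normal equations by Gaussian elimination for a model with an intercept. The function names (`solve`, `fit_mlr`, `predict`) and the data values are hypothetical, chosen only for illustration; in practice a numerical library would normally be used instead.

```python
# Minimal sketch: least-squares fit of y = b0 + b1*x1 + ... + bK*xK
# by solving the normal equations (X'X) b = X'y.
# All names and data here are illustrative assumptions.

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def fit_mlr(X, y):
    """Return [b0, b1, ..., bK] that minimize the sum of squared errors."""
    rows = [[1.0] + list(x) for x in X]  # column of ones carries the constant b0
    p = len(rows[0])
    xtx = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    return solve(xtx, xty)

def predict(coef, X):
    """Apply the prediction equation to each row of predictor values."""
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], x)) for x in X]

# Made-up calibration data: two predictors, six "years".
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

coef = fit_mlr(X, y)
y_hat = predict(coef, X)
resid = [yi - yh for yi, yh in zip(y, y_hat)]
mean_resid = sum(resid) / len(resid)
print("coefficients:", coef)
print("mean residual (zero up to rounding):", mean_resid)
```

The residual mean is zero only because the model includes the constant term; the same normal equations also force the residuals to be uncorrelated with each predictor over the calibration period.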
The MLR model is based on several assumptions. Provided the assumptions are satisfied, the regression estimators are optimal in the sense that they are unbiased, efficient, and consistent. Unbiased means that the expected value of the estimator is equal to the true value of the parameter. Efficient means that the estimator has a smaller variance than any other unbiased estimator. Consistent means that the bias and variance of the estimator approach zero as the sample size approaches infinity. Ostrom (1990, p. 14) lists six basic assumptions for the regression model:
1. Linearity: the relationship between the predictand and the predictors is linear. The MLR model applies to linear relationships. If relationships are nonlinear, there are two recourses: (1) transform the data to make the relationships linear, or (2) use an alternative statistical
Coefficient of determination. The explanatory power of the regression is summarized by its “R-squared” value, computed from the sums-of-squares terms as
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$R^2$, also called the coefficient of determination, is often described as the proportion of variance “accounted for”, “explained”, or “described” by regression. It is important to keep in mind that a high $R^2$ does not imply causation. The relative sizes of the sums-of-squares terms indicate how “good” the regression is in terms of fitting the calibration data. If the regression is “perfect”, all residuals are zero, SSE is zero, and $R^2$ is 1. If the regression is a total failure, the sum-of-squares of residuals equals the total sum-of-squares, no variance is accounted for by regression, and $R^2$ is zero.
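The two forms of the coefficient of determination can be checked numerically. The sketch below assumes the usual definitions of the sums-of-squares terms (SST as the sum of squared departures of the observations from their mean, SSR as the same for the predicted values, SSE as the sum of squared residuals) and uses made-up data with a single predictor to keep the fit short.

```python
# Minimal sketch: R^2 computed two ways from the sums-of-squares terms.
# Data are invented for illustration; the SST/SSR/SSE definitions are the
# usual textbook ones and are an assumption of this example.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.7, 8.3, 8.9]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for one predictor.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

r2_from_ssr = ssr / sst
r2_from_sse = 1.0 - sse / sst
print(round(r2_from_ssr, 4), round(r2_from_sse, 4))
```

The two values agree because, for a least-squares fit with a constant term, SST = SSR + SSE.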
ANOVA table and definition of “mean squared” terms. The sums-of-squares terms and related statistics are often summarized in an Analysis of Variance (ANOVA) table:

Source        df           SS     MS
Total         n − 1        SST    MST = SST/(n − 1)
Regression    K            SSR    MSR = SSR/K
Residual      n − K − 1    SSE    MSE = SSE/(n − K − 1)

Source = source of variation; SS = sum-of-squares term; df = degrees of freedom for SS term; MS = “mean squared” term

The mean squared terms are the sums-of-squares terms divided by the degrees of freedom.
Standard error of the estimate. The residual mean square (MSE) is the sample estimate of the variance of the regression residuals. The population value of the variance of the error term is sometimes written as $\sigma_e^2$, and its sample estimate is
$$s_e^2 = MSE \qquad (7)$$
where MSE has been defined previously. The square root of the residual mean square is called the root-mean-square error (RMSE), or the standard error of the estimate:
$$s_e = \sqrt{s_e^2} = \sqrt{MSE} = RMSE_c \qquad (8)$$
The subscript “c” is attached ($RMSE_c$) in (8) to distinguish the RMSE derived from calibration from the root-mean-square error derived by cross-validation (see later).
F ratio, or “overall F”. Recall that the explanatory power of a regression is given by the regression $R^2$, which is computed from sums-of-squares terms. The F-ratio, or overall F, which is computed from the mean squared terms in the ANOVA table, estimates the statistical significance of the regression equation. The F-ratio is given by
$$F = \frac{MSR}{MSE}$$
The advantage of the F-ratio over $R^2$ is that the F-ratio takes into account the degrees of freedom, which depend on the sample size and the number of predictors in the model. A model can have a high $R^2$ and still not be statistically significant if the sample size is not large compared with the number of predictors in the model. The F-ratio incorporates sample size and number of predictors in an assessment of significance of the relationship. The significance of the F-ratio is obtained by referring to a table of the F distribution, using degrees of freedom {df1, df2}, where df1 and df2 are the degrees of freedom for the regression mean square and residual mean square from the ANOVA table.
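The ANOVA quantities and the overall F can be assembled from SST, SSE, the sample size and the number of predictors. The following is a minimal sketch; the function name `anova_table` and the input values are hypothetical, and the significance lookup in an F table is deliberately left out because the standard library has no F distribution.

```python
# Minimal sketch: ANOVA mean squares and the overall F ratio.
# The helper name and the numeric inputs are illustrative assumptions.

def anova_table(sst, sse, n, k):
    """Return df, SS and MS for each source of variation, plus F = MSR/MSE."""
    if n - k - 1 <= 0:
        raise ValueError("need n > K + 1 for positive residual degrees of freedom")
    ssr = sst - sse
    table = {
        "Total":      {"df": n - 1,     "SS": sst, "MS": sst / (n - 1)},
        "Regression": {"df": k,         "SS": ssr, "MS": ssr / k},
        "Residual":   {"df": n - k - 1, "SS": sse, "MS": sse / (n - k - 1)},
    }
    f_ratio = table["Regression"]["MS"] / table["Residual"]["MS"]
    return table, f_ratio

# Made-up example: 45 years of calibration data, 2 predictors.
table, f_ratio = anova_table(sst=100.0, sse=20.0, n=45, k=2)
for source, row in table.items():
    print(source, row)
print("F =", f_ratio, "with df1 =", table["Regression"]["df"],
      "and df2 =", table["Residual"]["df"])
```

The computed F would then be compared with a tabulated F value for {df1, df2} = {K, n − K − 1}.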
Adjusted $R^2$. The $R^2$ value for a regression can be made arbitrarily high simply by including more and more predictors in the model. The adjusted $R^2$ is one of several statistics that attempt to compensate for this artificial increase in accuracy. The adjusted $R^2$ is given by
$$R^2_{adj} = 1 - \frac{MSE}{MST}$$
where MSE and MST are the mean squared terms previously defined in the ANOVA table. Referring to the ANOVA table shows that the ratio of mean squared terms is related to the ratio of sum-of-squares terms by
$$\frac{MSE}{MST} = \frac{n-1}{n-K-1}\,\frac{SSE}{SST}$$
where $n$ is the number of observations and $K$ is the number of predictors. Because $K$ must be greater than zero, the factor $(n-1)/(n-K-1)$ exceeds one, so it can immediately be seen that adjusted $R^2$ must be smaller than $R^2$, and that the difference in the two statistics depends on both the sample size and the number of predictors in the model.
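The dependence of adjusted $R^2$ on sample size and number of predictors can be seen by rewriting it in terms of $R^2$, using SSE/SST = 1 − $R^2$ and the degrees of freedom from the ANOVA table. The sketch below is illustrative only; the function name `adjusted_r2` and the inputs are assumptions.

```python
# Minimal sketch: adjusted R^2 = 1 - MSE/MST, written in terms of R^2,
# the number of observations n and the number of predictors K.
# Function name and example values are illustrative assumptions.

def adjusted_r2(r2, n, k):
    """Return 1 - (1 - R^2) * (n - 1) / (n - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > K + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Same R^2 and sample size, increasing number of predictors.
for k in (1, 2, 5, 10):
    print(k, round(adjusted_r2(0.80, 45, k), 4))
```

For a fixed $R^2$ and $n$, the adjustment grows as predictors are added, which is the penalty the statistic is meant to impose.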
Confidence interval for estimated coefficients. If the regression assumptions on the residuals are satisfied, including the normality assumption, then the sampling distribution of an estimated regression coefficient is normal with a variance proportional to the residual mean square (MSE). The variance of the estimator also depends on the variances and covariances of the predictors. The idea is best illustrated for the case of simple linear regression, for which the variance of the regression coefficient is given by
$$\mathrm{var}(\hat{b}_1) = \frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $s_e^2$ is the residual mean square, $x_i$ is the value of the predictor in year $i$, and $\bar{x}$ is the mean of the predictor. The $100(1-\alpha)\%$ confidence interval is $\hat{b}_1 \pm t_{\alpha/2}\sqrt{\mathrm{var}(\hat{b}_1)}$, where $t_{\alpha/2}$ is obtained from a t distribution with $n-2$ degrees of freedom. For example, if the sample size is $n = 45$ years, the degrees of freedom is 43. If the 95% confidence interval is desired, then $\alpha = 0.05$ and the interval is
$$\hat{b}_1 \pm 2.02\sqrt{\mathrm{var}(\hat{b}_1)} \qquad (13)$$
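The interval in (13) can be computed directly for simple linear regression. The sketch below uses a synthetic 45-value series (a known slope of 0.5 plus a small deterministic disturbance) and takes the critical value 2.02 from the worked example in the text; the function name `slope_confidence_interval` and all data are assumptions made for illustration.

```python
import math

# Minimal sketch: confidence interval for the slope in simple linear regression.
# t_crit must be supplied from a t table with n - 2 degrees of freedom;
# 2.02 is the value quoted in the text for n = 45 and 95% confidence.

def slope_confidence_interval(x, y, t_crit):
    """Return (b1, lower, upper) for the estimated slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e2 = sse / (n - 2)   # residual mean square with K = 1
    var_b1 = s_e2 / sxx    # variance of the estimated slope
    half_width = t_crit * math.sqrt(var_b1)
    return b1, b1 - half_width, b1 + half_width

# Synthetic data: true slope 0.5 plus a repeating disturbance (not real data).
n = 45
x = [float(i) for i in range(n)]
noise = [0.3 * ((7 * i) % 5 - 2) for i in range(n)]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, noise)]

b1, lower, upper = slope_confidence_interval(x, y, t_crit=2.02)
print("slope:", b1, "95% interval:", (lower, upper))
```

Because the disturbance here is deterministic rather than random, the example only illustrates the arithmetic, not the sampling properties of the interval.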
is fit to the full length of tree-ring series and the model residuals are regarded as residual indices. These residual indices are then used as predictors in the climate reconstruction model. The rationale is that the current year’s index is preconditioned to some extent by past conditions, including biology and possibly climate. The time-series modeling presumably adjusts for this distortion by removing the linear dependence of the tree-ring index on its past values.
The predictors in a regression model are often called the “independent variables”, but this term does not imply that the predictors are themselves independent statistically from one another. In fact, for natural systems, the predictors can be highly intercorrelated. “Multicolinearity” is a term reserved to describe the case when the intercorrelation of predictor variables is high. It has been noted that the variance of the estimated regression coefficients depends on the intercorrelation of predictors (equation (14)). Haan (2002) concisely summarizes the effects of multicolinearity on the regression model. Multicolinearity does not invalidate the regression model in the sense that the predictive value of the equation may still be good as long as the predictions are based on combinations of predictors within the same multivariate space used to calibrate the equation. But there are several negative effects of multicolinearity. First, the variance of the regression coefficients can be inflated so much that the individual coefficients are not statistically significant
Signs of multicolinearity. Signs of multicolinearity include 1) high correlation between pairs of predictor variables, 2) regression coefficients whose signs or magnitudes do not make good physical sense, 3) statistically nonsignificant regression coefficients on important predictors, and 4) extreme sensitivity of sign or magnitude of regression coefficients to insertion or deletion of a predictor variable.
Variance Inflation Factor (VIF). The Variance Inflation Factor (VIF) is a statistic that can be used to identify multicolinearity in a matrix of predictor variables. “Variance Inflation” refers here to the mentioned effect of multicolinearity on the variance of estimated regression coefficients. Multicolinearity depends not just on the bivariate correlations between pairs of predictors, but on the multivariate predictability of any one predictor from the other predictors. Accordingly, the VIF is based on the multiple coefficient of determination in regression of each predictor in multivariate linear regression on all the other predictors:
$$VIF_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the multiple coefficient of determination in a regression of the $i$th predictor on all other predictors, and $VIF_i$ is the variance inflation factor associated with the $i$th predictor. Note that if the $i$th predictor is independent of the other predictors, the variance inflation factor is one, while if the $i$th predictor can be almost perfectly predicted from the other predictors, the variance inflation factor approaches infinity. In that case the variance of the estimated regression coefficients is unbounded.
Multicolinearity is said to be a problem when the variance inflation factors of one or more predictors become large. How large appears to be a subjective judgement. According to Haan (2002), some researchers use a VIF of 5 and others use a VIF of 10 as a critical threshold. These VIF values correspond, respectively, to $R_i^2$ values of 0.80 and 0.90. Some compute the average VIF for all predictors and declare that an average “considerably” larger than one indicates multicolinearity (Haan 2002). At any rate, it is important to keep in mind that multicolinearity requires strong intercorrelation of predictors, not just non-zero intercorrelation. The VIF is closely related to a statistic called the tolerance, which is 1/VIF. Some statistics packages report the VIF and some report the tolerance (Haan 2002).
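The correspondence between the VIF thresholds and the $R_i^2$ values can be verified with a few lines of code. The sketch below also treats the special case of exactly two predictors, where the $R_i^2$ of either predictor regressed on the other equals their squared correlation; with more predictors a full multiple regression of each predictor on the rest would be needed. Function names and data are hypothetical.

```python
# Minimal sketch: variance inflation factor and tolerance.
# Names and data are illustrative assumptions.

def vif_from_r2(r2_i):
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing
    predictor i on all the other predictors."""
    if not 0.0 <= r2_i < 1.0:
        raise ValueError("R_i^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_i)

def squared_correlation(u, v):
    """Squared Pearson correlation between two equal-length series."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv ** 2 / (suu * svv)

# Thresholds mentioned in the text: R_i^2 of 0.80 and 0.90.
print(vif_from_r2(0.80), vif_from_r2(0.90))

# Two made-up predictors: each has the same R_i^2, the squared correlation.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
r2 = squared_correlation(x1, x2)
vif = vif_from_r2(r2)
tolerance = 1.0 / vif
print("R_i^2:", round(r2, 4), "VIF:", round(vif, 4), "tolerance:", round(tolerance, 4))
```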
Analysis of residuals consists of examining graphs and statistics of the regression residuals to check that model assumptions are satisfied. Some frequently used residuals tests are listed below.
Time series plot of residuals. The time series plot of residuals can indicate such problems as non-constant variance of residuals, and trend or autocorrelation in residuals. A time-dependent variance might show, say, as an increasing scatter of the residuals about the zero line with time. The slope of the scatter plot of residuals on time can be tested for significance to identify trend in residuals.
Scatterplot of residuals against predicted values. The residuals are assumed to be uncorrelated with the predicted values of the predictand. Violation is indicated by some noticeable pattern of dependence in the scatterplots. For example, the residuals might flare out (increased scatter) with increasing value of the predictand; the remedy might be a transformation (e.g., log transform) of the predictand.
Scatterplots of residuals against individual predictors. The residuals are assumed to be uncorrelated with the individual predictors. Violation of these assumptions would be indicated by some noticeable pattern of dependence in the scatterplots, and might suggest transformation of the predictors.
Histogram of residuals. The residuals are assumed to be normally distributed. Accordingly, the histogram of residuals should resemble a normal pdf. But keep in mind that a random sample from a normal distribution will be only approximately normal, so some departures from normality in the appearance of the histogram are expected, especially for small sample size.
Acf of residuals. The residuals are assumed to be nonautocorrelated. If the assumption is satisfied, the acf of residuals should not be large at any non-zero lag. Special interest should be attached to the lowest lags, since physical systems are characterized by persistence from year to year.
Lag-1 scatterplot of residuals. This plot also deals with the assumption of independence of residuals. The residuals at time t should be independent of the residuals at time t-1. The scatterplot should therefore resemble a formless cluster of points. Alignment in some direction might be evidence of autocorrelation of residuals at lag 1.
For negative autocorrelation, the decision rule is
if $d > 4 - d_l$, reject $H_0$
if $d < 4 - d_u$, do not reject $H_0$
if $4 - d_u \le d \le 4 - d_l$, inconclusive
According to Ostrom (1990, p. 29), some researchers reject the D-W statistic in favor of a simple rule of thumb for how much autocorrelation of residuals is tolerable in regression. This rule of thumb is that an alternative method to regression should be used if the first-order autocorrelation of residuals is greater than 0.30.
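The decision rule for negative autocorrelation and the 0.30 rule of thumb can be sketched as follows. The Durbin-Watson statistic is computed here with its usual definition, the sum of squared successive differences of the residuals divided by the sum of squared residuals; that formula is an assumption of this example because it is not restated in this text. The bounds `d_l` and `d_u` are placeholders standing in for values read from a Durbin-Watson table, and the residual series is invented.

```python
# Minimal sketch: Durbin-Watson statistic, lag-1 autocorrelation, and the
# decision rule for negative autocorrelation. All names, the residuals and
# the bounds d_l, d_u are illustrative assumptions.

def durbin_watson(resid):
    """d = sum of squared successive differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

def lag1_autocorrelation(resid):
    """First-order autocorrelation coefficient of the residuals."""
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[t] - mean) * (resid[t - 1] - mean) for t in range(1, n))
    den = sum((e - mean) ** 2 for e in resid)
    return num / den

def dw_negative_rule(d, d_l, d_u):
    """Decision rule for negative autocorrelation, as stated in the text."""
    if d > 4 - d_l:
        return "reject H0"
    if d < 4 - d_u:
        return "do not reject H0"
    return "inconclusive"

# Made-up residuals that alternate in sign (strong negative autocorrelation).
resid = [0.5, -0.4, 0.6, -0.5, 0.3, -0.6, 0.4, -0.3, 0.5, -0.5]

d = durbin_watson(resid)
r1 = lag1_autocorrelation(resid)
# Placeholder bounds: real values come from a D-W table for the given n and K.
decision = dw_negative_rule(d, d_l=0.88, d_u=1.32)
exceeds_rule_of_thumb = r1 > 0.30

print("d =", round(d, 3), "r1 =", round(r1, 3))
print("D-W decision:", decision)
print("lag-1 autocorrelation above 0.30:", exceeds_rule_of_thumb)
```

Note that the rule of thumb, taken literally, flags only positive first-order autocorrelation, whereas the D-W rule shown here targets negative autocorrelation, so the two checks can disagree for a series like this one.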
Portmanteau test. The portmanteau statistic, or Q statistic, is designed to test whether the regression residuals are purely random, or white noise (Ostrom 1990, p. 50). Unlike the D-W test, the portmanteau test does not restrict the possible form of autocorrelation to first-order autoregressive. The null hypothesis for the test is that the residuals are completely random; the alternative hypothesis is that the residuals are generated by an autoregressive or moving average model of some order. If the residuals are random, the acf of residuals should be zero at all nonzero lags. The Q statistic is computed as
$$Q = N \sum_{k=1}^{K} r_k^2$$
where $r_k$ is the lag-$k$ autocorrelation coefficient of the regression residuals, $N$ is the length of the time series of residuals, and $K$ is chosen as the maximum anticipated order of autoregressive or moving-average process hypothesized under the alternative hypothesis to have generated the residuals. As a rule of thumb, $K$ should be chosen as no larger than about $N/4$ or $N/5$. If the null hypothesis is true, Q is distributed as chi-square with $K$ degrees of freedom. Large acf coefficients lead to a high computed Q. A high Q therefore indicates significant autocorrelation and rejection of the null hypothesis. The p-value for Q is the probability of obtaining as high a Q as computed when the null hypothesis is true. The p-value for a computed Q can be obtained from a chi-square table. In summary, rejection of the null hypothesis is indicated by large acf coefficients and high computed Q. The more significant the computed Q, the lower its p-value.
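A minimal sketch of the Q computation follows, assuming the simple form of the statistic (N times the sum of squared autocorrelations at lags 1 to K). The function names and the residual series are hypothetical, and the chi-square lookup is left to a table because the standard library has no chi-square distribution.

```python
# Minimal sketch: acf of residuals and the portmanteau Q statistic.
# Names and data are illustrative assumptions.

def acf(resid, max_lag):
    """Autocorrelation coefficients r_1 ... r_max_lag of the residuals."""
    n = len(resid)
    mean = sum(resid) / n
    c0 = sum((e - mean) ** 2 for e in resid)
    return [
        sum((resid[t] - mean) * (resid[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def portmanteau_q(resid, max_lag):
    """Q = N * sum of squared autocorrelations at lags 1..max_lag."""
    n = len(resid)
    return n * sum(r ** 2 for r in acf(resid, max_lag))

# Made-up residuals with obvious persistence: runs of five like-signed values.
resid = ([1.0] * 5 + [-1.0] * 5) * 2
N = len(resid)
K = 4                      # within the N/5 to N/4 rule of thumb for N = 20

r = acf(resid, K)
q = portmanteau_q(resid, K)
print("acf:", [round(v, 3) for v in r])
print("Q =", round(q, 3), "to be compared with chi-square,", K, "degrees of freedom")
```

A large Q relative to the tabulated chi-square value for K degrees of freedom would lead to rejection of the null hypothesis of white-noise residuals.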
References
Cleaveland, M., and Duvick, D.N., 1992, Iowa climate reconstructed from tree rings, 1640-1982: Water Resources Research, v. 28, no. 10, p. 2607-2615.
Cleaveland, M.K., and Stahle, D.W., 1989, Tree ring analysis of surplus and deficit runoff in the White River, Arkansas: Water Resources Research, v. 25, no. 6, p. 1391-1401.
Graumlich, L.J., 1987, Precipitation variation in the Pacific Northwest (1675-1975) as reconstructed from tree rings: Annals of the Association of American Geographers, v. 77, no. 1, p. 19-29.
Haan, C.T., 2002, Statistical methods in hydrology, second edition: Iowa State University Press, Ames, Iowa.
Michaelsen, J., 1989, Long-period fluctuations in El Nino amplitude and frequency reconstructed from tree-rings: Geophysical Monograph 55, American Geophysical Union, p. 69-74.
Ostrom, C.W., Jr., 1990, Time Series Analysis, Regression Techniques, Second Edition: Quantitative Applications in the Social Sciences, v. 07-009: Newbury Park, Sage Publications.
Rencher, A.C., and Pun, Fu Ceayong, 1980, Inflation of R2 in best subset regression: Technometrics, v. 22, no. 1, p. 49-53.
Weisberg, S., 1985, Applied Linear Regression, 2nd ed.: John Wiley, New York, 324 p.
Wilks, D.S., 1995, Statistical methods in the atmospheric sciences: Academic Press, 467 p.
Woodhouse, C.A., 1999, Artificial neural networks and dendroclimatic reconstructions: An example from the Front Range, Colorado, USA: The Holocene, v. 9, no. 5, p. 521-529.