GEOS 585A, Assignment 11: Multiple Linear Regression

An assignment for a geosciences course (geos 585a) focusing on multiple linear regression. Students are required to use the geosa11.m program to select time series, preprocess data, and perform stepwise entry of predictors. The assignment includes analyzing model statistics, residuals, and assessing assumptions such as distribution of residuals and autocorrelation. Students must identify significant predictors and interpret results.

Uploaded on 08/31/2009

Assignment 11, GEOS 585A, Spring 2009
ASSIGNMENT 11. MULTIPLE LINEAR REGRESSION
1. Run geosa11.m, selecting part I (assignment 11) from the first menu. This selection
specifies multiple linear regression without cross-validation.
Menus appear allowing you to 1) select time series to be used as predictand and predictors,
2) log-transform the predictand, 3) prewhiten the predictors, 4) include lagged predictors
(lags up to t-2 or t+2 relative to the year of the predictand), and 5) specify a calibration
period. Make choices that seem reasonable from your knowledge of the data and results
of previous analyses. For this exercise, make sure your pool of potential predictors
includes at least 6 variables. This can be achieved by various combinations of number of
predictor series and lags. For example, two predictor series and lags -1 through +1 give
six potential predictors, as do six predictor time series and no lags.
You will then begin an interactive modeling process that consists of stepwise entry of
predictors one-by-one, with examination of model statistics and residuals analysis at each
step. Note that at each step you must press “Begin modeling or add another predictor”,
followed by “review results.” For this exercise, run the model out to 4 steps. Then press
“Quit.” You should have 4 predictors in your final model.
You will have four Figure windows. Answer the following questions and turn in your
answers with printouts of the figure windows.
2. (Caption to Fig 1) What is the explanatory power of the relationship? Refer to the R^2,
the overall F of the regression, and the p-value for F in explaining your answer. Does the
F-value support a statistically significant relationship? Why is adjusted R^2 smaller than R^2?
3. (Caption to Fig. 2) For which predictors are the regression coefficients significantly
different from zero? What is multicollinearity? Name one undesirable effect of
multicollinearity. Is multicollinearity a problem with your set of potential predictors?
(Provide evidence in the form of results from running the function vif.m on the matrix
of your predictors; this matrix is Xc in the workspace after running geosa11.)
4. (Caption to Fig 3) What is the assumption on the form of the distribution of the
regression residuals? Does your analysis of residuals suggest that the assumption is
satisfied? What can you say about the distribution of residuals for the extreme case of
zero explanatory power of regression? (A “trick” question)

5. (Caption to Fig 4) What is the assumption on autocorrelation of residuals? Do the
residuals violate this assumption? Refer to the Portmanteau statistic, the Durbin-Watson
statistic, and the visual appearance of the acf of residuals in your answer.
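
As background for the Fig 1 question above, adjusted R^2 and the overall F can both be written in terms of R^2, the number of observations n, and the number of predictors p. The Python sketch below is illustrative only and is not part of geosa11.m; the function names and the numbers passed to them are hypothetical. The p-value for F also requires the F distribution with (p, n-p-1) degrees of freedom, which is left out here.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F of the regression, with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Hypothetical values, not results from geosa11.m.
r2, n, p = 0.50, 50, 4
print("adjusted R^2:", adjusted_r2(r2, n, p))
print("overall F:   ", overall_f(r2, n, p))
```

The factor (n-1)/(n-p-1) is greater than 1 whenever at least one predictor is in the model, which is what drives the difference between the two R^2 values.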

Fig 4. Residuals analysis 2: autocorrelation of residuals. The top plot is a time series plot of the residuals. This plot is useful in pointing out a possible trend in residuals over time, as well as a tendency of large residuals to cluster. At lower left is a scatterplot of residuals at time t against residuals at time t-1. Ideally this scatterplot shows no dependence. A linear pattern might indicate first-order autocorrelation of residuals. At lower right is the acf of the residuals. Ideally, the acf is close to zero at all nonzero lags. Annotated below the plots are the Portmanteau statistic and the Durbin-Watson test results.
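
The quantities annotated on Fig 4 can be sketched in a few lines. The Python below is a minimal illustration and is not the code of dwstat, acf, or portmant. The function names are hypothetical, and the Ljung-Box form of the portmanteau statistic is an assumption, since the form used by portmant is not stated here.

```python
def durbin_watson(e):
    """Durbin-Watson statistic: squared first differences over squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


def acf(e, max_lag):
    """Sample autocorrelation of e at lags 1..max_lag, after removing the mean."""
    n = len(e)
    mean = sum(e) / n
    d = [v - mean for v in e]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box_q(e, max_lag):
    """Portmanteau statistic Q in the Ljung-Box form (assumed form, see above)."""
    n = len(e)
    r = acf(e, max_lag)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k)
                             for k in range(1, max_lag + 1))


# Hypothetical residuals that alternate in sign (strong negative lag-1 dependence).
resid = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print("D-W:", durbin_watson(resid))
print("acf:", acf(resid, 2))
print("Q:  ", ljung_box_q(resid, 2))
```

A Durbin-Watson value near 2 is consistent with no first-order autocorrelation; values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation.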

13. After reviewing the figures summarizing the regression for step 1, click on “Begin modeling or add another predictor” again, which enters a second predictor into the equation. Again browse through the four figures. Note how the explanatory power of the regression equation increases with the additional predictor.
14. Repeat step 13 through as many steps as desired. Then press “Quit” to stop the stepwise entry of predictors.
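
The stepwise entry described above can be illustrated with a small forward-selection sketch in Python. This is not stepvbl1: it assumes ordinary least squares with an intercept, enters at each step the candidate that most reduces the residual sum of squares, and stops only after a fixed number of steps. All names and data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def forward_stepwise(columns, y, n_steps):
    """Enter predictors one at a time by greatest reduction in residual SS.

    columns: list of predictor series (lists), y: predictand (list).
    Returns (order of entry as column indices, R^2 after each step).
    """
    resid = center(y)
    total_ss = dot(resid, resid)
    cand = {j: center(col) for j, col in enumerate(columns)}
    order, r2_by_step = [], []
    for _ in range(n_steps):
        # Candidates are kept orthogonal to the entered predictors, so this is
        # the drop in residual SS if candidate j were entered next.
        gain = {j: dot(x, resid) ** 2 / dot(x, x)
                for j, x in cand.items() if dot(x, x) > 1e-12}
        if not gain:
            break
        best = max(gain, key=gain.get)
        xb = cand.pop(best)
        b = dot(xb, resid) / dot(xb, xb)
        resid = [r - b * x for r, x in zip(resid, xb)]
        # Remove the entered predictor's part from the remaining candidates.
        for j in list(cand):
            c = dot(cand[j], xb) / dot(xb, xb)
            cand[j] = [xi - c * xbi for xi, xbi in zip(cand[j], xb)]
        order.append(best)
        r2_by_step.append(1.0 - dot(resid, resid) / total_ss)
    return order, r2_by_step


# Hypothetical data: y depends on columns 0 and 1 only.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
y = [2.0 * a + 0.5 * b for a, b in zip(x0, x1)]
print(forward_stepwise([x0, x1, x2], y, 2))
```

R^2 on the calibration data cannot decrease as predictors are added in this scheme, which is the increase in explanatory power referred to in step 13.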

PROGRAMMING NOTES

geosa11.m relies heavily on user-written functions, including:

armawht1 -- prewhitens time series with AR model
crospul2 -- builds pointer to rows of time series matrix for cross-validation
lagyr3 -- builds a matrix of lagged predictors
menudm1 -- miscellaneous menu function
dwstat -- Durbin-Watson statistic
acf -- autocorrelation function
portmant -- Portmanteau statistic
rederr -- reduction-of-error statistic
stepvbl1 -- stepwise entry of variables based on ability to reduce residual variance
durbinwt.mat -- lookup table for significance of D-W statistic
sepred2 -- standard error of prediction
hatmtx -- “hat matrix”
mce1 -- minimum coverage ellipsoid

Many of the above functions are not used until assignment 12, which brings cross-validation into the regression model.
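
To make the idea of a lagged predictor matrix concrete, here is a minimal Python sketch. It is not lagyr3: the function name, arguments, and data are hypothetical, and it assumes consecutive years with no missing values. A lag of k pairs the predictand in year t with the predictor value in year t+k, so lags -1 through +1 on two series give six columns.

```python
def lagged_predictors(series, years, lags):
    """Build one predictor column per (series, lag) pair.

    series: dict mapping a series name to values aligned with `years`.
    lags:   integer lags; lag k uses the predictor value from year t + k.
    Returns (usable_years, column_names, rows).
    """
    lags = list(lags)
    n = len(years)
    # Keep only years for which every lagged value exists.
    first = max(0, -min(lags))
    last = n - max(0, max(lags))
    names = [f"{name}(t{k:+d})" if k else f"{name}(t)"
             for name in series for k in lags]
    rows = [[series[name][i + k] for name in series for k in lags]
            for i in range(first, last)]
    return years[first:last], names, rows


# Hypothetical data: two predictor series over five years.
years = [2001, 2002, 2003, 2004, 2005]
series = {"a": [10, 11, 12, 13, 14], "b": [20, 21, 22, 23, 24]}
usable_years, names, rows = lagged_predictors(series, years, [-1, 0, 1])
print(names)
for yr, row in zip(usable_years, rows):
    print(yr, row)
```

Note that lagging shortens the usable period: with lags -1 through +1, the first and last years drop out of the calibration sample.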