OLS Regression Assignment for Poli 784, Spring 2009, Assignments of Political Science

An assignment for a statistics course in which students estimate an OLS regression model using R. The assignment involves loading data from an RData file and calculating coefficient estimates, standard errors, t-scores, p-values, the model R-squared, the adjusted R-squared, the overall model F-test, and the p-value for the F-test. Students must use matrix operations to compute the coefficient estimates and their standard errors. The document also lists some helpful R functions and operations.

Uploaded on 03/16/2009

Assignment #4
Poli 784, Spring, 2009 (Carsey)
Due: February 10th at the start of class
For this assignment, you will estimate an OLS regression model using R. You will read in the
data from an RData file posted on the course website (Use the “Load Workspace” option on the
“File” pull down menu in R). The data set includes four objects: y, x1, x2, and x3. Each of them
is just a sequence of 500 numbers that you can think of as a vector that contains data. [NOTE: you
can always see what objects are currently available in your R session by typing objects()
or ls() – the latter is short for “list”]
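The same loading step can also be scripted rather than done through the menu. The sketch below is only an illustration: the course data file is not reproduced here, so it writes two made-up vectors to a temporary file (the path `rdata_path` and the simulated values are assumptions) and then restores them with load().

```r
# Hypothetical stand-ins for the course data; the real file is not included here.
set.seed(1)
y  <- rnorm(500)
x1 <- rnorm(500)

# Save the objects to a temporary .RData file, then remove them from the session.
rdata_path <- tempfile(fileext = ".RData")
save(y, x1, file = rdata_path)
rm(y, x1)

# load() restores the saved objects under their original names and
# returns (invisibly) a character vector of those names.
loaded_names <- load(rdata_path)
print(loaded_names)   # names of the objects that were restored
print(ls())           # everything currently in the workspace
print(length(y))      # number of values in the restored vector
```

With the real assignment you would replace `rdata_path` with the location of the file downloaded from the course website.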
Your job is to program in a script file all of the calculations needed to produce the following:
the coefficient estimates, their standard errors, t-scores, p-values for those t-scores (2-tailed), the
model R-squared, the adjusted R-Squared, the overall model F-test, the p-value for that F-test,
the min, max, mean, median, first quartile, and third quartile of the residuals, and the residual
standard error. NOTE: The residual standard error is just the square root of the variance of the
errors, which itself is normally represented as σ². To be clear, you need to program the formulas
for all of these calculations (You will use R functions, however, to help you generate p-values).
You MUST use matrix operations to compute the coefficient estimates and their standard errors.
Make sure your script file prints this out. Note: use the t-scores to test the conventional Null
hypothesis about each individual coefficient, and the F-test to test the conventional Null
hypothesis of the model.
Finally, use the lm() function to run a model where you regress y on x1, x2, and x3 in order to
check your work.
Turn in your complete script file, the output it produces, and the output produced by the lm()
function. Also, interpret the regression output completely and include that in what you turn in.
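For the lm() check, a minimal sketch might look like the following. The data are simulated here (an assumption, since the course data set is not available in this document), and the object names `fit`, `fit_summary`, and `coef_table` are illustrative choices, not part of the assignment.

```r
# Simulated stand-in data (an assumption; not the course data set).
set.seed(784)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + 0 * x3 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)     # lm() adds the intercept by default
fit_summary <- summary(fit)
print(fit_summary)

# Individual pieces, convenient for comparing against hand-computed values.
coef_table <- coef(fit_summary) # estimates, std. errors, t values, p-values
print(coef_table)
cat("R-squared           =", fit_summary$r.squared, "\n")
cat("Adjusted R-squared  =", fit_summary$adj.r.squared, "\n")
cat("Residual std. error =", fit_summary$sigma, "\n")
print(fit_summary$fstatistic)   # F value, numerator df, denominator df
```

Pulling the pieces out of the summary object makes it easier to line them up against the values your own script computes.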
R Help
Here are some helpful operations in R:
To multiply Matrix A by Matrix B: A %*% B
To transpose Matrix A: t(A)
To compute the Inverse of Matrix A: solve(A)
To combine a column of ones with the “x” variables into one object: X <- cbind(1, x1, x2, x3)
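As a rough illustration of how the operations above fit together, the sketch below computes the OLS coefficient estimates as (X'X)^(-1) X'y. It uses simulated data (an assumption, not the course data), and the names `XtX_inv` and `beta_hat` are hypothetical. It stops at the coefficients; the remaining quantities follow from the formulas in the textbook.

```r
# Simulated stand-in data (an assumption; not the course data set).
set.seed(784)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2, x3)            # N x K matrix with a leading column of ones
XtX_inv  <- solve(t(X) %*% X)        # inverse of X'X
beta_hat <- XtX_inv %*% t(X) %*% y   # K x 1 matrix of coefficient estimates
print(beta_hat)
```

Note that `beta_hat` comes back as a one-column matrix rather than a plain vector, which matters when you combine it with other objects later.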
There are other matrix operators/functions available in R that might be helpful for you to
consider. I would start with reading the section in the “An Introduction to R” manual that deals
with matrices and arrays. The very first reference card noted in the syllabus is also helpful, but
look for others too.
Finally, it can be helpful to collect results from operations in R at the end of your script file and
print them to the screen together. It can also be helpful to include text with the objects you print
that help you identify them. Let’s say you compute something called Sigma2 and you want to
print it and label it. Here is one way to do it:
> # I ASSUME YOU DEFINED Sigma2 SOMEWHERE EARLIER IN YOUR SCRIPT FILE
> cat("My Sigma-Squared = ", Sigma2, "\n")
The cat() function just turns everything inside it into characters, connects them together
(concatenates them) and prints the result to the screen. Objects to be combined are separated by
commas. Text you want to type is included in double quotes, including any spaces you want to
leave. The final item in quotes, "\n", is a line return operator. You can think of it as telling R to
hit “Enter” on the screen so the next printed output starts on a new line. NOTE: the cat() function
is NOT the same as the c() function. Experiment with the two of them and you will see.
All of the formulas you need for all of the other calculations are in the textbook and have been
presented in class.
HINTS (with a few Bonus questions)

  1. I suggest you set global variables for N and K using characteristics of the data matrix (Bonus: How can this be done so that any matrix of data could be analyzed with your script file without having to change what N and/or K are defined as?).
  2. If you use matrix operations to compute the sum of the squared errors, R will return the result as a 1x1 matrix. In order to use this result in later calculations, it will be more convenient to have it as a scalar rather than a 1x1 matrix (Bonus: explain why this is true). You can do this either by making reference to element [1,1] of this matrix, or you can place the entire calculation within the c() function to convert it to a simple scalar. So, let’s suppose you use matrix algebra to compute RSS. You could put that entire calculation within the c() function. You could also just reassign RSS like this:
     > RSS <- RSS[1,1]
     or
     > RSS <- c(RSS)
  3. To evaluate a t-score against the t distribution, use the following:
     1 - pt(T,df)
     In this expression, “T” is the t-score you want to evaluate (needs to be a positive number) and “df” is the degrees of freedom for the test. This will return the p-value for a one-tailed test. You have to figure out how to get the p-value for a 2-tailed test.
  4. To evaluate the F-score against the F distribution, use:
     1 - pf(F,df1,df2)
     In this expression, “F” is the F-score you want to evaluate, “df1” is the first degrees of freedom, and “df2” is the second degrees of freedom. This will return the p-value for the upper tail of the F-distribution.
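The mechanics behind hints 2 through 4, along with the cat() versus c() distinction, can be sketched as below. Everything numeric here is made up for illustration: the residual vector `e` is random noise, and the values of `t_score`, `f_score`, and the degrees of freedom are assumptions (496 and 3 assume N = 500 and K = 4 counting the constant). The two-tailed p-value and the bonus questions are deliberately left alone.

```r
# Hypothetical residuals standing in for the ones your script computes.
set.seed(2009)
e <- matrix(rnorm(500), ncol = 1)

RSS <- t(e) %*% e        # a matrix product returns a 1x1 matrix
print(dim(RSS))          # prints: [1] 1 1
RSS <- c(RSS)            # RSS[1, 1] works too; both drop the matrix dimensions
print(is.matrix(RSS))    # prints: [1] FALSE

# cat() prints labelled output; c() only builds a vector and prints nothing itself.
cat("My RSS = ", RSS, "\n")
labelled <- c("My RSS = ", RSS)   # a character vector of length 2

# Illustrative test statistics (made-up numbers, not results from any data).
# Named t_score / f_score because T and F are R's built-in shorthands for
# TRUE and FALSE, and assigning to them masks those shorthands.
t_score  <- 2.0
f_score  <- 4.0
df_resid <- 496   # assumed N - K
df_model <- 3     # assumed K - 1

p_one_tail <- 1 - pt(t_score, df_resid)            # upper-tail area under t
p_f        <- 1 - pf(f_score, df_model, df_resid)  # upper-tail area under F
cat("One-tailed p for t =", p_one_tail, "\n")
cat("Upper-tail p for F =", p_f, "\n")
```

Using longer names such as `t_score` instead of `T` is a design choice, not a requirement of the hints; the expressions themselves are the ones given above.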