OLS Regression Assignment for Poli 784, Spring 2009

Assignment #4

Poli 784, Spring, 2009 (Carsey)

Due: February 10th at the start of class

For this assignment, you will estimate an OLS regression model using R. You will read in the data from an RData file posted on the course website (use the “Load Workspace” option on the “File” pull-down menu in R). The data set includes four objects: y, x1, x2, and x3. Each of them is just a string of 500 numbers that you can think of as a vector of data. [NOTE: you can always see what objects are currently available in your R session by typing objects() or ls(); the latter is short for “list”.]
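For a script-based alternative to the menu, a minimal sketch using load() might look like the following. The file name and the simulated values are placeholders, not the course data; the sketch writes its own temporary RData file so that it can run on its own.

```r
# Simulated stand-ins for the four objects; the real values come from the course file.
set.seed(1)
y  <- rnorm(500)
x1 <- rnorm(500)
x2 <- rnorm(500)
x3 <- rnorm(500)

# Hypothetical file path; substitute the RData file from the course website.
data_file <- file.path(tempdir(), "assignment4_example.RData")
save(y, x1, x2, x3, file = data_file)

# Remove the objects, then restore them from the file.
rm(y, x1, x2, x3)
loaded <- load(data_file)   # load() returns the names of the restored objects

print(loaded)      # names of the objects that were loaded
print(ls())        # everything currently in the workspace
print(length(y))   # number of observations in y
```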

Your job is to program in a script file all of the calculations needed to produce the following: the coefficient estimates, their standard errors, t-scores, p-values for those t-scores (2-tailed), the model R-squared, the adjusted R-squared, the overall model F-test, the p-value for that F-test, the min, max, mean, median, first quartile, and third quartile of the residuals, and the residual standard error. NOTE: The residual standard error is just the square root of the variance of the errors, which itself is normally represented as σ². To be clear, you need to program the formulas for all of these calculations (you will use R functions, however, to help you generate p-values). You MUST use matrix operations to compute the coefficient estimates and their standard errors. Make sure your script file prints all of these results. Note: use the t-scores to test the conventional null hypothesis about each individual coefficient, and the F-test to test the conventional null hypothesis of the model.
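One possible sketch of the matrix step, run on simulated data rather than the course data set. The seed, the simulated coefficients, and the object names b, e, RSS, and se are assumptions made for illustration. K is taken here to count the columns of X including the column of ones, which may differ from the textbook's convention.

```r
# Simulated data standing in for the course objects (assumption, not the real data).
set.seed(784)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2, x3)   # design matrix with a column of ones
N <- nrow(X)                # number of observations
K <- ncol(X)                # number of coefficients, intercept included

XtX_inv <- solve(t(X) %*% X)           # (X'X)^(-1)
b       <- XtX_inv %*% t(X) %*% y      # coefficient estimates
e       <- y - X %*% b                 # residuals
RSS     <- c(t(e) %*% e)               # residual sum of squares as a scalar
Sigma2  <- RSS / (N - K)               # estimated error variance
se      <- sqrt(diag(Sigma2 * XtX_inv))  # standard errors of the coefficients

print(cbind(Estimate = c(b), Std.Error = se))
```

The remaining quantities (t-scores, R-squared, the F-test, and so on) build on b, e, RSS, and Sigma2.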

Finally, use the lm() function to run a model where you regress y on x1, x2, and x3 in order to check your work.
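A sketch of the lm() check, again on simulated data (the seed and coefficients are assumptions). It also shows where the pieces of the lm() summary live, which can make it easier to compare them one by one against hand-computed values.

```r
# Simulated data standing in for the course objects (assumption, not the real data).
set.seed(784)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

fit         <- lm(y ~ x1 + x2 + x3)
fit_summary <- summary(fit)
print(fit_summary)

# Individual components of the summary, for side-by-side comparison.
coef_table    <- fit_summary$coefficients   # estimates, std. errors, t values, p-values
r2            <- fit_summary$r.squared      # model R-squared
adj_r2        <- fit_summary$adj.r.squared  # adjusted R-squared
f_stat        <- fit_summary$fstatistic     # F value with its two degrees of freedom
rse           <- fit_summary$sigma          # residual standard error
resid_summary <- summary(residuals(fit))    # min, quartiles, median, mean, max
```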

Turn in your complete script file, the output it produces, and the output produced by the lm() function. Also, interpret the regression output completely and include that in what you turn in.

R Help

Here are some helpful operations in R:

To multiply Matrix A by Matrix B: A %*% B

To transpose Matrix A: t(A)

To compute the Inverse of Matrix A: solve(A)

To combine a column of ones with the “x” variables into one object: X <- cbind(1, x1, x2, x3)
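The operations listed above can be tried on small matrices first. This is only an illustrative sketch; the matrices A and B and the short x vectors are made-up values.

```r
# Two small made-up matrices.
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              4, 1), nrow = 2, byrow = TRUE)

AB    <- A %*% B    # matrix product (A * B would multiply element by element)
At    <- t(A)       # transpose
A_inv <- solve(A)   # inverse

print(AB)
print(A %*% A_inv)  # should be the identity matrix, up to rounding error

# Binding a column of ones to three short made-up vectors.
x1 <- c(5, 6, 7)
x2 <- c(1, 0, 1)
x3 <- c(2, 4, 8)
X  <- cbind(1, x1, x2, x3)
print(dim(X))       # rows and columns of X
```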

There are other matrix operators/functions available in R that might be helpful for you to consider. I would start with reading the section in the “An Introduction to R” manual that deals with matrices and arrays. The very first reference card noted in the syllabus is also helpful, but look for others too.

Finally, it can be helpful to collect results from operations in R at the end of your script file and print them to the screen together. It can also be helpful to include text with the objects you print that helps you identify them. Let’s say you compute something called Sigma2 and you want to
print it and label it. Here is one way to do it:

> # I ASSUME YOU DEFINED Sigma2 SOMEWHERE EARLIER IN YOUR SCRIPT FILE
> cat("My Sigma-Squared = ", Sigma2, "\n")

The cat() function just turns everything inside it into characters, connects them together (concatenates them), and prints the result to the screen. Objects to be combined are separated by commas. Text you want to type is included in double quotes, including any spaces you want to leave. The final item in quotes, "\n", is the newline (line return) character. You can think of it as telling R to hit “Enter” on the screen so the next printed output starts on a new line. NOTE: the cat() function is NOT the same as the c() function. Experiment with the two of them and you will see.

All of the formulas you need for all of the other calculations are in the textbook and have been presented in class.

HINTS (with a few Bonus questions)

I suggest you set global variables for N and K using characteristics of the data matrix (Bonus: How can this be done so that any matrix of data could be analyzed with your script file without having to change what N and/or K are defined as?).

If you use matrix operations to compute the sum of the squared errors, R will return the result as a 1x1 matrix. In order to use this result in later calculations, it will be more convenient to have it as a scalar rather than a 1x1 matrix (Bonus: explain why this is true). You can do this either by making reference to element [1,1] of this matrix, or by placing the entire calculation within the c() function to convert it to a simple scalar. So, let’s suppose you use matrix algebra to compute RSS. You could put that entire calculation within the c() function. You could also just reassign RSS like this:

> RSS <- RSS[1,1]

or

> RSS <- c(RSS)

To evaluate a t-score against the t distribution, use the following: 1 - pt(T,df). In this expression, “T” is the t-score you want to evaluate (it needs to be a positive number) and “df” is the degrees of freedom for the test. This will return the p-value for a one-tailed test. You have to figure out how to get the p-value for a 2-tailed test.

To evaluate the F-score against the F distribution, use: 1 - pf(F,df1,df2). In this expression, “F” is the F-score you want to evaluate, “df1” is the first degrees of freedom, and “df2” is the second degrees of freedom. This will return the p-value for the upper tail of the F-distribution.
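The hints above can be tried out on made-up numbers before they go into the real script. In this sketch the small matrix, the stand-in residual vector, and the t- and F-scores are all invented for illustration, and the names RSS_mat, T_score, F_score, p_one, and p_F are assumptions. The two-tailed p-value is left out on purpose.

```r
# Made-up design matrix and stand-in "residuals" (not from a fitted model).
set.seed(784)
X <- cbind(1, rnorm(20), rnorm(20))
N <- nrow(X)
K <- ncol(X)
e <- rnorm(N)

# A sum of squares computed with matrix operations comes back as a 1x1 matrix.
RSS_mat <- t(e) %*% e
print(dim(RSS_mat))

# Either form below gives a plain number instead of a matrix.
RSS <- c(RSS_mat)        # same value as RSS_mat[1, 1]
print(is.matrix(RSS))

# One-tailed p-value for an illustrative, positive t-score.
T_score <- 2.0
df      <- N - K
p_one   <- 1 - pt(T_score, df)

# Upper-tail p-value for an illustrative F-score.
F_score <- 4.0
p_F     <- 1 - pf(F_score, K - 1, N - K)

# cat() prints labeled results; c() only builds a vector and prints nothing itself.
combined <- c("My RSS = ", RSS)
cat("My RSS = ", RSS, "\n")
cat("One-tailed p-value = ", p_one, "\n")
cat("F-test p-value = ", p_F, "\n")
```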
