



An analysis of the relationship between a person's education level and their annual income using a linear regression model. It includes the collected data, the linear regression equation, and instructions for calculating the predicted values and residuals for each individual in the dataset.




This homework is due by 5PM ET on Friday, March 26. Please use this R Markdown template to report your code, output, and written answers in a single document. Turn in your homework as a pdf on NYU Classes. Comment your code. Report results in the correct units of measurement. Do not report more than two digits to the right of the decimal point.
Name:
TA:
What determines a person’s earnings in the labor market? Why do different people earn different incomes?
One possible determining factor of a person’s earnings is her level of education. To investigate this relationship, researchers collected data on income and education for 10 individuals. Education is measured in years of schooling, and annual income is measured in thousands of US dollars. Table 1 presents the collected data.
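| id | Annual Income ($K) | Education (years) |
|----|--------------------|-------------------|
| 1  | 44                 | 6                 |
| 2  | 45                 | 7                 |
| 3  | 42                 | 9                 |
| 4  | 56                 | 9                 |
| 5  | 72                 | 10                |
| 6  | 70                 | 14                |
| 7  | 63                 | 13                |
| 8  | 38                 | 8                 |
| 9  | 45                 | 7                 |
| 10 | 62                 | 11                |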
The linear model that captures the relationship between education and earnings is given by:
$$I_i = \alpha + \beta \cdot E_i + \epsilon_i$$
where $I_i$ represents person $i$'s annual income, $E_i$ represents person $i$'s level of education in years, and $\epsilon_i$ is the prediction error for person $i$.
Using the data collected by the research team, we estimated $\hat{\alpha}$ and $\hat{\beta}$ by fitting a linear regression in R.
After fitting the linear regression model with R, you learn that the estimated coefficients are $\hat{\alpha} = 18.57$ and $\hat{\beta} = 3.74$. Plugging these estimates back into the original equation gives the fitted linear model:
$$\hat{I}_i = 18.57 + 3.74 \cdot E_i$$
The linear model and data are plotted in Figure 1 for your reference.
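The estimates quoted above can be checked with a short sketch that fits the regression to the Table 1 data. The object names (`income_data`, `fit`) and the plot labels are illustrative choices, and the plot is only an approximation of what a figure like Figure 1 might look like.

```r
# Table 1 data: annual income in $K, education in years
income_data <- data.frame(
  id        = 1:10,
  income    = c(44, 45, 42, 56, 72, 70, 63, 38, 45, 62),
  education = c(6, 7, 9, 9, 10, 14, 13, 8, 7, 11)
)

# Fit income as a linear function of education
fit <- lm(income ~ education, data = income_data)

# Estimated intercept (alpha-hat) and slope (beta-hat), rounded to two decimals
round(coefficients(fit), 2)

# Scatter plot of the data with the fitted line drawn on top
plot(income_data$education, income_data$income,
     xlab = "Education (years)",
     ylab = "Annual income ($K)",
     main = "Annual income and education")
abline(fit)
```

Rounded to two decimals, the intercept and slope come out to 18.57 and 3.74, matching the values given in the text.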
Interpret the estimated coefficients $\hat{\alpha}$ and $\hat{\beta}$ substantively. What do they mean in this particular instance?
Hint: Remember to use the appropriate units in your answer (that is, the units in which each variable is measured) when answering the question.
Type your written answer here:
Using the estimated $\hat{\alpha}$ and $\hat{\beta}$, you can obtain the predicted value $\hat{I}_i$ for each individual $i$ with education $E_i$ in your sample and compare it to the observed value $I_i$ for that same person. The difference between $I_i$ and $\hat{I}_i$ (here, $\hat{\epsilon}_i = I_i - \hat{I}_i$) is called the residual or prediction error.
Recall that $\hat{\alpha} = 18.57$ and $\hat{\beta} = 3.74$.
For each of the observations in the dataset you will do the following:
a) Write down the formula to obtain the predicted value $\hat{I}_i$
b) Compute the predicted value $\hat{I}_i$
c) Write down the formula to obtain the residual or prediction error $\hat{\epsilon}_i$
d) Compute the residual or prediction error $\hat{\epsilon}_i$
Hint: For this exercise you do not need to compute anything with R; you just need to use the information provided in Table 1 and the estimates of $\hat{\alpha}$ and $\hat{\beta}$ provided.
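Although the exercise is meant to be done by hand, a small sketch can help check the arithmetic pattern behind steps a) through d). The individual used here (12 years of education, observed income of $60K) is hypothetical and does not appear in Table 1, and the function names are illustrative.

```r
# Estimated coefficients reported in the text
alpha_hat <- 18.57
beta_hat  <- 3.74

# Predicted income: I-hat = alpha-hat + beta-hat * E
predict_income <- function(education_years) {
  alpha_hat + beta_hat * education_years
}

# Residual: observed income minus predicted income
residual_income <- function(observed_income, education_years) {
  observed_income - predict_income(education_years)
}

# Hypothetical person: 12 years of education, observed income of $60K
predicted <- predict_income(12)       # 18.57 + 3.74 * 12 = 63.45
resid     <- residual_income(60, 12)  # 60 - 63.45 = -3.45
predicted
resid
```

A negative residual means the model predicts more income than was observed for that person; a positive residual means it predicts less.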
Report a), b), c), and d) for the individual with id = 1
Type your written answer here:
Report a), b), c), and d) for the individual with id = 2
Type your written answer here:
Report a), b), c), and d) for the individual with id = 3
Report a), b), c), and d) for the individual with id = 9
Type your written answer here:
Report a), b), c), and d) for the individual with id = 10
Type your written answer here:
You want to assess the theory that individuals are more likely to vote in elections featuring candidates who are of the same race as themselves. To look for evidence, you collect data on Black voter turnout and Black candidates in U.S. election districts. The data is stored in blackturnout.csv and the variables are described below:
[Note: the following data has been modified for pedagogical purposes.]
| Name | Description |
|------|-------------|
| year | Year in which election was held |
| state | State in which election was held |
| district | District in which election was held (unique within state but not across states) |
| turnout | Proportion of the Black voting age population in a district that voted in election |
| BVAP | Proportion of district’s voting age population that is Black |
| bcandidate | Indicator variable for whether a Black candidate runs in an election (1) or not (0) |
Set your working directory and load the data. Check the structure of the data using the function str(). How many observations does the data have? How many variables? What is the unit of observation?
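A minimal sketch of the loading step follows. Because the real `blackturnout.csv` is not available here, the code writes a small stand-in file with the same column names to a temporary folder; every value in it is made up, and the names `toy`, `toy_path`, and `turnout_data` are illustrative.

```r
# Toy stand-in for blackturnout.csv; every value below is made up
toy <- data.frame(
  year       = c(2006, 2006, 2008, 2008, 2010, 2010),
  state      = c("TX", "NY", "TX", "GA", "NY", "TX"),
  district   = c(1, 3, 1, 5, 3, 9),
  turnout    = c(0.30, 0.45, 0.40, 0.55, 0.50, 0.35),
  BVAP       = c(0.10, 0.25, 0.12, 0.45, 0.26, 0.08),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)
toy_path <- file.path(tempdir(), "blackturnout_toy.csv")
write.csv(toy, toy_path, row.names = FALSE)

# With the real file you would instead call setwd() on your own folder
# and then read.csv("blackturnout.csv")
turnout_data <- read.csv(toy_path)

str(turnout_data)   # variable names, types, and first few values
nrow(turnout_data)  # number of observations (rows)
ncol(turnout_data)  # number of variables (columns)
```

`str()` already reports the number of observations and variables in its first line; `nrow()` and `ncol()` return the same counts as plain numbers.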
##insert code here
Insert written answer here
Using a frequency table, show which years are included in the dataset and print the table. Which years does the data cover? Using the function prop.table(), show the proportion of all observations that come from each state. What proportion of observations are from Texas (TX)?
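The two table functions can be sketched on a small made-up data frame. The values below are not from `blackturnout.csv`, and the object names are illustrative.

```r
# Made-up values, used only to show how the functions behave
toy <- data.frame(
  year  = c(2006, 2006, 2008, 2008, 2010, 2010),
  state = c("TX", "NY", "TX", "GA", "NY", "TX")
)

# Frequency table: number of observations in each year
year_counts <- table(toy$year)
print(year_counts)

# Proportion of all observations that come from each state
state_props <- prop.table(table(toy$state))
print(state_props)

# Proportion for a single state, selected by name
tx_share <- as.numeric(state_props["TX"])
tx_share
```

`prop.table()` divides each count by the total, so the proportions across states sum to 1.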
##insert code here
Type your written answer here:
Create a scatter plot of Black turnout and Black voting age population, with Black turnout on the Y axis and Black voting age population on the X axis. Give meaningful labels to the Y and X axes, and provide a meaningful title. Describe the relationship between the two variables.
Hint: A good way to describe a visual relationship is to say whether it is strong or weak and if you can see the direction of the relationship.
##insert code here
Repeat the scatter plot you created in the previous question (Black turnout vs. Black voting age population) but plot it with points of two different colors: BLUE DOTS should represent observations where the elections included a Black candidate and RED DOTS should represent elections where none of the candidates were Black. Label both axes meaningfully and include a plot title. What does this plot tell you about the relationship between Black candidates and Black turnout?
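One possible way to color points by group is to build a vector of color names with `ifelse()` and pass it to `plot()`. The sketch below uses made-up values rather than the real data, and the labels, title, and legend placement are illustrative choices.

```r
# Made-up values, used only to illustrate the plotting calls
toy <- data.frame(
  BVAP       = c(0.10, 0.25, 0.12, 0.45, 0.26, 0.08),
  turnout    = c(0.30, 0.45, 0.40, 0.55, 0.50, 0.35),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)

# Blue where a Black candidate ran, red where none did
point_colors <- ifelse(toy$bcandidate == 1, "blue", "red")

plot(toy$BVAP, toy$turnout,
     col  = point_colors,
     pch  = 16,
     xlab = "Black voting age population (proportion)",
     ylab = "Black turnout (proportion)",
     main = "Black turnout and Black voting age population")

# Legend explaining the two colors
legend("topleft",
       legend = c("Black candidate", "No Black candidate"),
       col    = c("blue", "red"),
       pch    = 16)
```

Dropping the `col` argument and the `legend()` call gives the single-color scatter plot from the previous question.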
##insert code here
Insert written answer here.
Fit a linear regression using Black turnout as your outcome variable, and the presence of a Black candidate as your predictor variable. Report the coefficient on your predictor and the intercept using the coefficients() function.
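When the predictor is a 0/1 indicator, the intercept equals the mean outcome in the group coded 0, and the slope equals the difference in mean outcomes between the two groups. The sketch below shows this on made-up values; the numbers are not results from `blackturnout.csv`, and the object names are illustrative.

```r
# Made-up values, used only to illustrate the mechanics
toy <- data.frame(
  turnout    = c(0.30, 0.45, 0.40, 0.55, 0.50, 0.35),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)

# Regress turnout on the Black-candidate indicator
fit <- lm(turnout ~ bcandidate, data = toy)
coefs <- coefficients(fit)
coefs

# Group means for comparison with the coefficients
mean_no_candidate   <- mean(toy$turnout[toy$bcandidate == 0])
mean_with_candidate <- mean(toy$turnout[toy$bcandidate == 1])

# Intercept matches the mean where bcandidate == 0;
# slope matches the difference between the two group means
coefs[["(Intercept)"]] - mean_no_candidate
coefs[["bcandidate"]] - (mean_with_candidate - mean_no_candidate)
```

Both differences printed at the end are zero up to floating-point error. Because turnout is a proportion, a slope of 0.15 in this toy example would correspond to a difference of 15 percentage points.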
Interpret the two coefficients. Do not merely comment on the direction of the association (i.e., whether the slope is positive or negative). Explain what the values of the coefficients mean in terms of the units in which each variable is measured. Based on these coefficients, what would you conclude about the relationship between the presence of Black candidates and the level of Black voter turnout?
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.