Linear Regression Analysis: Relationship between Education and Earnings, Study notes of Statistics

Saba University School of Medicine Statistics

An analysis of the relationship between a person's education level and their annual income using a linear regression model. It includes the collected data, the linear regression equation, and instructions for calculating the predicted values and residuals for each individual in the dataset.

Typology: Study notes

2020/2021

Uploaded on 04/04/2021

leo-duan 🇧🇶

1 document

POL 850 Spring 2021 - HW 3

This homework is due by 5PM ET on Friday, March 26. Please use this R Markdown template to report your code, output, and written answers in a single document. Turn in your homework as a PDF on NYU Classes. Comment your code. Report results in the correct units of measurement. Do not report more than two digits to the right of the decimal point.

Name:

TA:

Key Concepts in Linear Regression

What determines a person’s earnings in the labor market? Why do different people earn different incomes?

One possible determining factor of a person’s earnings is her level of education. To investigate this relationship, researchers collected data on income and education for 10 individuals. Education is measured in years of schooling, and annual income is measured in thousands of US dollars. Table 1 presents the collected data.

Table 1

id   Annual Income ($K)   Education (years)
1    44                   6
2    45                   7
3    42                   9
4    56                   9
5    72                   10
6    70                   14
7    63                   13
8    38                   8
9    45                   7
10   62                   11

The linear model that captures the relationship between education and earnings is given by:

Ii=α+β∗Ei+i

Where Iirepresents person’s iannual income; Eirepresents person’s ilevel of education in years, and iis

the prediction error for person i.

Using the data collected by the research team, we estimated $\hat{\alpha}$ and $\hat{\beta}$ by fitting a linear regression in R. The estimated coefficients are $\hat{\alpha} = 18.57$ and $\hat{\beta} = 3.74$. You can plug these estimates back into the original equation and write your linear model as follows:

$\hat{I}_i = 18.57 + 3.74 E_i$

The linear model and data are plotted in Figure 1 for your reference.
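
As a quick check on the estimates quoted above, the following R sketch refits the model on the Table 1 data. The data frame name `earnings` and the column names `income` and `education` are illustrative choices, not part of the assignment.

```r
# Data from Table 1: annual income ($K) and education (years).
# The object and column names here are assumptions made for illustration.
earnings <- data.frame(
  id        = 1:10,
  income    = c(44, 45, 42, 56, 72, 70, 63, 38, 45, 62),
  education = c(6, 7, 9, 9, 10, 14, 13, 8, 7, 11)
)

# Fit income as a linear function of education by ordinary least squares
fit <- lm(income ~ education, data = earnings)

# Estimated intercept and slope, rounded to two decimals
estimates <- round(coefficients(fit), 2)
print(estimates)
```

The rounded intercept and slope should agree with the 18.57 and 3.74 reported in the text.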

Question 1 (10 points)

Interpret the estimated coefficients $\hat{\alpha}$ and $\hat{\beta}$ substantively. What do they mean in this particular instance?

Hint: Remember to use the appropriate units in your answer (that is, the units in which each variable is measured) when answering the question.

Answer 1

Type your written answer here:

Question 2

Using the estimated $\hat{\alpha}$ and $\hat{\beta}$, you can obtain the predicted value $\hat{I}_i$ for each individual $i$ with education $E_i$ in your sample and compare it to the observed income $I_i$ for that same person. The difference between $I_i$ and $\hat{I}_i$ (here, $\hat{\epsilon}_i = I_i - \hat{I}_i$) is called the residual or prediction error.

Recall that $\hat{\alpha} = 18.57$ and $\hat{\beta} = 3.74$.

For each of the observations in the dataset you will do the following:

a) Write down the formula to obtain the predicted value $\hat{I}_i$
b) Compute the predicted value $\hat{I}_i$
c) Write down the formula to obtain the residual or prediction error $\hat{\epsilon}_i$
d) Compute the residual or prediction error

Hint: For this exercise you do not need to compute anything with R; you just need to use the information provided in Table 1 and the estimates of $\hat{\alpha}$ and $\hat{\beta}$ provided.
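
To illustrate the four steps, here is a worked example for a hypothetical person who is not in Table 1. The values $E = 12$ years of education and an observed income of $I = 60$ (thousand dollars) are assumed purely for illustration.

```latex
% Hypothetical person (not in Table 1): E = 12 years, observed I = 60 ($K)
\begin{align*}
\hat{I} &= \hat{\alpha} + \hat{\beta} E = 18.57 + 3.74 \times 12 = 18.57 + 44.88 = 63.45 \\
\hat{\epsilon} &= I - \hat{I} = 60 - 63.45 = -3.45
\end{align*}
```

Under these assumed values, the model predicts an annual income of $63.45K. The negative residual means the model overpredicts this person's income by $3.45K.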

Question 2.1 (4 points)

Report a), b), c), and d) for individual with id = 1

Answer 2.1

Type your written answer here:

Question 2.2 (4 points)

Report a), b), c), and d) for individual with id = 2

Answer 2.2

Type your written answer here:

Question 2.3 (4 points)

Report a), b), c), and d) for individual with id = 3

Question 2.9 (4 points)

Report a), b), c), and d) for individual with id = 9

Answer 2.9

Type your written answer here:

Question 2.10 (4 points)

Report a), b), c), and d) for individual with id = 10

Answer 2.10

Type your written answer here:

Question 3: Candidates’ Race and Voter Turnout

You want to assess the theory that individuals are more likely to vote in elections featuring candidates who are of the same race as themselves. To look for evidence, you collect data on Black voter turnout and Black candidates in U.S. election districts. The data is stored in blackturnout.csv and the variables are described below:

[Note: the following data has been modified for pedagogical purposes.]

Name         Description
year         Year in which election was held
state        State in which election was held
district     District in which election was held (unique within state but not across states)
turnout      Proportion of the Black voting age population in a district that voted in election
BVAP         Proportion of district’s voting age population that is Black
bcandidate   Indicator variable for whether a Black candidate runs in an election (1) or not (0)

Question 3.1 (5 points)

Set your working directory and load the data. Check the structure of the data using the function str(). How many observations does the data have? How many variables? What is the unit of observation?
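
The sketch below shows how `str()`, `nrow()`, and `ncol()` report the structure and dimensions of a data frame. Because blackturnout.csv is not reproduced here, it uses a small toy data frame named `toy` whose values are entirely made up. The folder path in the comments is also hypothetical.

```r
# Toy stand-in for blackturnout.csv; every value below is made up
# and only mimics the variables described in the assignment.
toy <- data.frame(
  year       = c(2006, 2006, 2008, 2008, 2010, 2010),
  state      = c("TX", "NY", "TX", "GA", "NY", "TX"),
  district   = c(1, 3, 1, 5, 3, 9),
  turnout    = c(0.35, 0.42, 0.61, 0.58, 0.40, 0.37),
  BVAP       = c(0.10, 0.25, 0.10, 0.45, 0.25, 0.15),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)

# With the real file, loading would look something like this instead:
# setwd("path/to/your/folder")              # hypothetical path
# turnout_data <- read.csv("blackturnout.csv")

str(toy)               # variable names, types, and first few values
n_obs  <- nrow(toy)    # number of observations (rows)
n_vars <- ncol(toy)    # number of variables (columns)
```

Each row of the toy data stands for one district in one election year, which is how the variable descriptions suggest the real data is organized.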

Answer 3.1

##insert code here

Insert written answer here

Question 3.2 (5 points)

Using a frequency table show which years are included in the dataset. Print the table. Our data contains information from which years? Using the function prop.table() show the proportion of all observations that come from each state. What proportion of observations are from Texas (TX)?
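
A minimal sketch of the two functions involved, `table()` for counts and `prop.table()` for proportions, again on a made-up toy data frame rather than the real dataset. The object names are illustrative.

```r
# Toy data with made-up values; only the two relevant columns are included
toy <- data.frame(
  year  = c(2006, 2006, 2008, 2008, 2010, 2010),
  state = c("TX", "NY", "TX", "GA", "NY", "TX")
)

# Frequency table: number of observations per year
year_counts <- table(toy$year)
print(year_counts)

# prop.table() converts a table of counts into proportions that sum to 1
state_props <- prop.table(table(toy$state))
print(round(state_props, 2))

# Extract the share of observations from one state by name
tx_share <- as.numeric(state_props["TX"])
```

In this toy example three of the six rows are from Texas, so `tx_share` is 0.5. The real data will give a different value.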

##insert code here

Type your written answer here:

Question 3.3 (10 points)

Create a scatter plot of Black turnout and Black voting age population, with Black turnout on the Y axis and Black voting age population on the X axis. Give meaningful labels to the Y and X axes, and provide a meaningful title. Describe the relationship between the two variables.

Hint: A good way to describe a visual relationship is to say whether it is strong or weak and if you can see the direction of the relationship.

Answer 3.3

##insert code here

Question 3.4 (15 points)

Repeat the scatter plot you created in the previous question (Black turnout vs. Black voting age population) but plot it with points of two different colors: BLUE DOTS should represent observations where the elections included a Black candidate and RED DOTS should represent elections where none of the candidates were Black. Label both axes meaningfully and include a plot title. What does this plot tell you about the relationship between Black candidates and Black turnout?
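
One way to color points by group in base R is to build a vector of color names with `ifelse()` and pass it to the `col` argument of `plot()`. The sketch below assumes a made-up toy data frame, so the pattern it draws says nothing about the real data.

```r
# Toy data with made-up values
toy <- data.frame(
  turnout    = c(0.35, 0.42, 0.61, 0.58, 0.40, 0.37),
  BVAP       = c(0.10, 0.25, 0.10, 0.45, 0.25, 0.15),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)

# One color per observation: blue if a Black candidate ran, red otherwise
point_cols <- ifelse(toy$bcandidate == 1, "blue", "red")

# Scatter plot with BVAP on the X axis and turnout on the Y axis
plot(x = toy$BVAP, y = toy$turnout,
     col = point_cols, pch = 19,
     xlab = "Black voting age population (proportion)",
     ylab = "Black turnout (proportion)",
     main = "Black turnout vs. Black voting age population")

# Legend explaining the two colors
legend("topleft",
       legend = c("Black candidate", "No Black candidate"),
       col = c("blue", "red"), pch = 19)
```

Dropping the `col` argument and the `legend()` call gives the single-color version of the same plot.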

Answer 3.4

##insert code here

Insert written answer here.

Question 3.5 (15 points)

Fit a linear regression using Black turnout as your outcome variable, and the presence of a Black candidate as your predictor variable. Report the coefficient on your predictor and the intercept using the coefficients() function.

Interpret the two coefficients. Do not merely comment on the direction of the association (i.e., whether the slope is positive or negative). Explain what the values of the coefficients mean in terms of the units in which each variable is measured. Based on these coefficients, what would you conclude about the relationship between the presence of Black candidates and the level of Black voter turnout?
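
When the only predictor is a 0/1 indicator, the least-squares intercept equals the mean outcome in the group coded 0, and the slope equals the difference in mean outcomes between the two groups. The sketch below checks this on a made-up toy data frame. The numbers are not the real results.

```r
# Toy data with made-up values
toy <- data.frame(
  turnout    = c(0.35, 0.42, 0.61, 0.58, 0.40, 0.37),
  bcandidate = c(0, 1, 0, 1, 1, 0)
)

# Regress turnout on the Black-candidate indicator
fit <- lm(turnout ~ bcandidate, data = toy)
est <- coefficients(fit)
print(round(est, 2))

# Group means computed directly, for comparison with the coefficients
mean_no  <- mean(toy$turnout[toy$bcandidate == 0])
mean_yes <- mean(toy$turnout[toy$bcandidate == 1])

# The intercept should match mean_no, and the slope mean_yes - mean_no
print(c(intercept = mean_no, slope = mean_yes - mean_no))
```

Because turnout is measured as a proportion, both coefficients are in proportion units, and multiplying by 100 expresses them in percentage points.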


Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
