Analyzing Lead Sales' Impact on Cord Blood Lead in Sociology 362, Exercises of Sociology

A data exercise for a Sociology 362 class on the relationship between leaded gasoline sales and cord blood lead concentrations, analyzed in Stata. The exercise covers univariate summary statistics, covariance and correlation, time plots, and linear regression. Students enter the data into Stata, compute summary statistics and correlations, draw time plots, and find the least-squares values of the regression line.

Typology: Exercises

2011/2012

Uploaded on 11/20/2012

shubnam

Sociology 362
data exercise 1
1. The following table shows, for a period of 14 months, gasoline lead sales (metric tons) in Massachusetts
and the mean lead concentrations (µg/dl) in umbilical-cord blood of babies born at a Boston hospital.
month   leaded gas (metric tons)   cord lead (µg/dl)
  1              141                     6.4
  2              166                     6.1
  3              161                     5.7
  4              170                     6.9
  5              148                     7.0
  6              136                     7.2
  7              169                     6.6
  8              109                     5.7
  9              117                     5.7
 10               87                     5.3
 11              105                     4.9
 12               73                     5.4
 13               82                     4.5
 14               75                     6.0
a. Use the input varlist command to input these data into Stata, and then use save filespec to create a
.dta file in Stata internal format.
b. Find the usual univariate summary statistics for all the variables. One way to do this is with the
command summarize month cordld ldgas, where month, cordld, and ldgas are the names I gave to the
variables.
c. Find the covariance and correlation between all three variables. Try correlate month cord ldgas and the
same command with the option ,cov added after the variable names. What do these quantities tell us
about the relationships?
d. Graph the gas and cord series against month (“time plots”), and graph the cord series against the gas
series. Are these about what you would expect based on the correlations? How do these graphs manifest
the relative magnitudes of the correlations?
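Parts a through d can be sketched in Stata as follows. This is a minimal sketch rather than the instructor's solution: the file name leadgas, the variable labels, and the graph names are assumptions made for illustration, and the twoway commands assume Stata 8 or later.

```stata
* Part a: enter the 14 monthly observations and save them.
clear
input month ldgas cordld
1 141 6.4
2 166 6.1
3 161 5.7
4 170 6.9
5 148 7.0
6 136 7.2
7 169 6.6
8 109 5.7
9 117 5.7
10 87 5.3
11 105 4.9
12 73 5.4
13 82 4.5
14 75 6.0
end
label variable ldgas  "Gasoline lead sales (metric tons)"
label variable cordld "Mean cord blood lead (ug/dl)"
* leadgas is a hypothetical file name; Stata adds the .dta extension.
save leadgas, replace

* Part b: univariate summary statistics.
summarize month cordld ldgas

* Part c: correlation matrix, then the covariance matrix.
correlate month cordld ldgas
correlate month cordld ldgas, covariance

* Part d: two time plots, then cord lead plotted against gas lead.
twoway line ldgas month, name(gas_time, replace)
twoway line cordld month, name(cord_time, replace)
twoway scatter cordld ldgas, name(cord_gas, replace)
```

The two series get separate time plots because they are on very different scales (roughly 70 to 170 metric tons versus 4.5 to 7.2 µg/dl).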
For the following problems, first do them “by hand,” that is, without using Stata’s regress or predict
commands. You can use Stata’s other commands to generate the five fundamental quantities needed to
find the least-squares solutions, or to produce vectors of fitted values, residuals, etc. In other words, you
can use Stata as a calculator. Once you’re done, then use regress and predict to check your answers.
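The least-squares solutions can be written in terms of a few sums over the data. The exercise does not list the five fundamental quantities, so the version below assumes they are the sums of X, Y, X squared, Y squared, and XY, together with n = 14; the sum of Y squared is not needed for a and b but is needed for sums of squares of Y.

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \\
b &= \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} \\
a &= \bar{Y} - b\,\bar{X}
\end{align*}
```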
e. Let $\hat{Y}_i = a + bX_i$ be the line expressing cord (Y) as a linear function of ldgas (X). Find the
least-squares values of a and b.
f. Use the least-squares line (and perhaps Stata’s gen and list commands) to find all the fitted values
$\hat{Y}_i$ generated by the observed values of X.
g. Suppose you have done gen yhat = a + b*(ldgas), where yhat is the name you chose for the fitted Y’s
and ldgas is the name you gave to the lead gas series when you input the data. To see how the fitted line
looks on the scatterplot of cord lead against gas lead, try graph cordld yhat ldgas,two.
h. Find the average of the fitted $\hat{Y}_i$ (i.e., the fitted cord lead values that we called yhat) and compare it to
the average of the observed Ys given above. This command will work: summarize cordld yhat.
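One way to carry out parts e through h using Stata only as a calculator is sketched below. The scalar and variable names (n, sx, sy, sxy, sxx, xy, xx) are illustrative choices, not names from the exercise. The twoway command is the Stata 8-or-later way to overlay the fitted line on the scatterplot; it stands in for the graph ..., two command quoted in part g, which appears to be the older graph syntax.

```stata
* Re-enter the data so that this block runs on its own.
clear
input month ldgas cordld
1 141 6.4
2 166 6.1
3 161 5.7
4 170 6.9
5 148 7.0
6 136 7.2
7 169 6.6
8 109 5.7
9 117 5.7
10 87 5.3
11 105 4.9
12 73 5.4
13 82 4.5
14 75 6.0
end

* Part e: build the sums, then apply the least-squares formulas.
generate double xy = ldgas*cordld
generate double xx = ldgas^2
quietly summarize ldgas
scalar n  = r(N)
scalar sx = r(sum)
quietly summarize cordld
scalar sy = r(sum)
quietly summarize xy
scalar sxy = r(sum)
quietly summarize xx
scalar sxx = r(sum)
scalar b = (n*sxy - sx*sy)/(n*sxx - sx^2)
scalar a = sy/n - b*sx/n
display "a = " scalar(a) "   b = " scalar(b)

* Part f: fitted values. scalar() keeps Stata from reading a or b
* as an abbreviated variable name.
generate double yhat = scalar(a) + scalar(b)*ldgas
list month ldgas cordld yhat

* Part g: fitted line over the scatterplot of cord lead against gas lead.
twoway (scatter cordld ldgas) (line yhat ldgas, sort)

* Part h: the mean of yhat should match the mean of cordld.
summarize cordld yhat
```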
docsity.com

i. Find all the residuals $\hat{e}_i$. What is the average of the residuals? Interpret the residual corresponding to the observed Y generated by $X_2$.

j. Find the correlations between the fitted $\hat{Y}_i$ (i.e., yhat), the residuals $\hat{e}_i$, and the observed $X_i$ (i.e., ldgas). Explain these correlations.

k. Find the sum of squares total, the sum of squares regression and the sum of squares residual, and compute $R^2$. Interpret $R^2$.

l. Find the mean square error and the root mean square error (also known as the standard error of the estimate) for the fitted model.
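
For reference, the quantities in parts i through l are usually defined as follows. The abbreviations TSS, RegSS, and RSS are labels chosen here, and the divisor n − 2 for the mean square error is the usual convention for a regression with one predictor; the exercise itself does not state it.

```latex
\begin{align*}
\hat{e}_i &= Y_i - \hat{Y}_i \\
\mathrm{TSS} &= \sum (Y_i - \bar{Y})^2, \qquad
\mathrm{RegSS} = \sum (\hat{Y}_i - \bar{Y})^2, \qquad
\mathrm{RSS} = \sum \hat{e}_i^{\,2} \\
\mathrm{TSS} &= \mathrm{RegSS} + \mathrm{RSS} \\
R^2 &= \frac{\mathrm{RegSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \\
\mathrm{MSE} &= \frac{\mathrm{RSS}}{n-2}, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\end{align*}
```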

An easy set of commands to do the regression and get the fitted values and residuals is:

    regress cord ldgas
    predict yhat
    predict ehat,res
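
A hedged sketch of how the check might look in full: it runs the regression, then recomputes the sums of squares, R-squared, and root mean square error from the fitted values and residuals and prints them next to the results that regress stores. The names ybar, dev2, fit2, res2, tss, regss, rss, r2, mse, and rmse are illustrative, and the full variable name cordld is used where the commands above rely on the abbreviation cord.

```stata
* Re-enter the data so that this block runs on its own.
clear
input month ldgas cordld
1 141 6.4
2 166 6.1
3 161 5.7
4 170 6.9
5 148 7.0
6 136 7.2
7 169 6.6
8 109 5.7
9 117 5.7
10 87 5.3
11 105 4.9
12 73 5.4
13 82 4.5
14 75 6.0
end

* Regression of cord lead on gas lead, then fitted values and residuals.
regress cordld ldgas
predict double yhat, xb
predict double ehat, residuals

* Parts h to j: means of fitted values and residuals, and correlations.
summarize cordld yhat ehat
correlate yhat ehat ldgas

* Parts k and l: sums of squares, R-squared, MSE, and root MSE.
quietly summarize cordld
scalar ybar = r(mean)
generate double dev2 = (cordld - scalar(ybar))^2
generate double fit2 = (yhat - scalar(ybar))^2
generate double res2 = ehat^2
quietly summarize dev2
scalar tss = r(sum)
quietly summarize fit2
scalar regss = r(sum)
quietly summarize res2
scalar rss = r(sum)
scalar r2   = scalar(regss)/scalar(tss)
scalar mse  = scalar(rss)/(_N - 2)
scalar rmse = sqrt(scalar(mse))

* Compare with the values stored by regress.
display "R-squared: by hand " scalar(r2)   "   regress " e(r2)
display "Root MSE:  by hand " scalar(rmse) "   regress " e(rmse)
```

Requesting double precision in predict keeps rounding error out of the comparison; the default float storage would agree only to about seven digits.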
