
GSR 516 Summer 2008
Homework Assignment 5
Due July 22, 2008
Scatterplots and Correlation
1. Data from a British government survey of household spending may be used to
examine the relationship between household spending on tobacco products and
alcoholic beverages. Data were collected from the 11 regions of Great Britain.
The data are available from WebCT in the data file DASL_alcohol_tobacco_data.xls.
The original Web site is:
http://lib.stat.cmu.edu/DASL/Datafiles/AlcoholandTobacco.html.
a) Construct a scatterplot to represent the association between household spending
on alcohol and household spending on tobacco across the 11 regions.
b) Is a line an appropriate model for the data? Explain your reasoning.
c) Regardless of your answer in part (b), compute the correlation between alcohol
and tobacco spending. Characterize the correlation as strong, weak, or moderate,
and as negative or positive.
d) Remove the obvious outlier and recalculate the correlation between alcohol and
tobacco spending. Characterize the correlation as strong, weak, or moderate, and
as negative or positive.
e) Choose one of the two correlations in parts (c) and (d) above, depending on
whether you believe that the outlier should be included or not. Conduct a
statistical test to determine if the data are consistent with a correlation of zero.
Please include all of the important elements of a statistical test.
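The calculations in parts (c) through (e) can be sketched in Python as follows.
This is only an illustration, not a required method: the function names are
made up for this example, and the x and y values are placeholders rather than
the survey data. It computes the Pearson correlation, removes one point, and
forms the usual t statistic for a null hypothesis of zero correlation,
t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two paired sequences with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: true correlation is zero (n - 2 degrees of freedom).

    Undefined when r is exactly +1 or -1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def drop_point(x, y, index):
    """Return copies of x and y with the pair at position `index` removed."""
    new_x = [v for i, v in enumerate(x) if i != index]
    new_y = [v for i, v in enumerate(y) if i != index]
    return new_x, new_y


if __name__ == "__main__":
    # Placeholder values only; substitute the columns from the data file.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 1.0]

    r_all = pearson_r(x, y)
    print("r with all points:", r_all, "t:", t_statistic(r_all, len(x)))

    # Assume the last pair is the point judged to be an outlier.
    x2, y2 = drop_point(x, y, len(x) - 1)
    r_trim = pearson_r(x2, y2)
    print("r without outlier:", r_trim, "t:", t_statistic(r_trim, len(x2)))
```

The t statistic is then compared with a t distribution on n - 2 degrees of
freedom to obtain a p-value, either from a table or from statistical software.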
Optional Work
Identify another dataset containing two paired quantitative variables. Construct a
scatterplot for the data, compute and characterize the correlation coefficient, and
perform a statistical test against the null hypothesis that the true correlation is zero.
You may wish to use data from the diet study or the chapped lips study, which you
may have already analyzed in this course. You may also search the Data and Story
Library (DASL, Web site: http://lib.stat.cmu.edu/DASL/DataArchive.html), or search
the Internet or other sources for data.