Scatterplots and Correlation Homework Assignment - Summer 2008, Assignments of Probability and Statistics

Instructions for a homework assignment focused on analyzing the relationship between two quantitative variables using scatterplots and correlation coefficients. The assignment involves examining the association between household spending on alcohol and tobacco, as well as testing the reliability of test scores. Students are required to construct scatterplots, compute correlations, and perform statistical tests to determine if the data are consistent with a correlation of zero.

Uploaded on 08/16/2009

GSR 516 Summer 2008
Homework Assignment 5
Due July 22, 2008
Scatterplots and Correlation
1. Data from a British government survey of household spending may be used to
examine the relationship between household spending on tobacco products and
alcoholic beverages. Data were collected from the 11 regions of Great Britain.
The data are available from WebCT in the data file DASL_alcohol_tobacco_data.xls.
The original Web site is:
http://lib.stat.cmu.edu/DASL/Datafiles/AlcoholandTobacco.html.
a) Construct a scatterplot to represent the association between household spending
on alcohol and household spending on tobacco across the 11 regions.
b) Is a line an appropriate model for the data? Explain your reasoning.
c) Regardless of your answer in part (b), compute the correlation between alcohol
and tobacco spending. Characterize the correlation as strong, weak, or moderate,
and as negative or positive.
d) Remove the obvious outlier and recalculate the correlation between alcohol and
tobacco spending. Characterize the correlation as strong, weak, or moderate, and
as negative or positive.
e) Choose one of the two correlations in parts (c) and (d) above, depending on
whether you believe that the outlier should be included or not. Conduct a
statistical test to determine if the data are consistent with a correlation of zero.
Please include all of the important elements in a statistical test.
Optional Work
Identify another dataset containing two paired quantitative variables. Construct a
scatterplot for the data, compute and characterize the correlation coefficient, and
perform a statistical test against the null hypothesis that the true correlation is zero.
You may wish to use data from the diet study or the chapped lips study, which you
may have already analyzed in this course. You may also search the Data and Story
Library (DASL, Web site: http://lib.stat.cmu.edu/DASL/DataArchive.html), or search
the Internet or other sources for data.
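
The correlation and test asked for in parts (c) through (e) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the required method for the assignment: the function names `pearson_r` and `t_statistic_for_zero_correlation` are illustrative, and the numbers are made-up placeholders, not the DASL survey values. It relies on the standard result that, under the null hypothesis of zero correlation (and assuming roughly bivariate normal data), t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic_for_zero_correlation(r, n):
    """Return (t, df) for testing H0: true correlation = 0."""
    if abs(r) >= 1.0:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


# Placeholder values only (NOT the survey data); the last pair plays
# the role of an outlier.
tobacco = [2.0, 3.0, 3.5, 4.0, 4.5, 5.0]
alcohol = [4.0, 4.5, 5.5, 5.0, 6.0, 3.0]

r_all = pearson_r(tobacco, alcohol)
r_trimmed = pearson_r(tobacco[:-1], alcohol[:-1])  # outlier removed
t, df = t_statistic_for_zero_correlation(r_trimmed, len(tobacco) - 1)
print(f"r (all points) = {r_all:.3f}")
print(f"r (outlier removed) = {r_trimmed:.3f}, t = {t:.3f}, df = {df}")
```

To finish the test, compare |t| with the critical value from a t table for the printed degrees of freedom, or obtain a p-value from a statistics package.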

Reliability

1. Consider the data in the Excel file called test_retest_data.xls, which is stored on the course WebCT “Datasets and Contexts” site.
a) Examine the columns titled “Test” and “Retest.” These columns represent the test and retest scores for a sample of n = 23 respondents.
b) Construct a scatterplot of the test and retest scores. Make the retest scores the dependent variable.
c) Compute the correlation between the test and retest scores.
d) Conduct a statistical test to determine if the correlation between the test and retest scores is different from zero.
e) Write an overall conclusion explaining whether you feel that the test/retest reliability for these data is adequate or not. Refer to specific statistical evidence to justify your answer.
Optional Work
Conduct a matched pairs t-test on the differences between the scores at the two times.
2. Consider the dataset of individual judge scores for the 2004 Olympic men’s horizontal bar event. The data are stored in the Excel file 2004_Olympic_mens_horizontal_bar_scores_data.xls.
a) Construct a scatterplot matrix for the seven judge scores columns in the dataset.
b) Compute the matrix of correlations between the seven judge score columns in the dataset.
c) Identify the judge score columns associated with the strongest and weakest correlations.
d) Compute Cronbach’s Alpha for the seven judge score columns in the dataset. Explain whether the value of Cronbach’s Alpha is large enough to justify considering the variables as representing one underlying construct, presumably quality of performance on the horizontal bar.
Optional Work
Identify other test/retest or reliability data. Carry out similar steps on these data. A good Web site for obtaining judge scores for gymnastic events is http://www.gymnasticsresults.com/olympics.html.
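
The Cronbach's Alpha computation in part (d) can be sketched in Python as follows. This is a minimal sketch, assuming the usual variance-based formula alpha = k/(k - 1) * (1 - (sum of the k column variances) / (variance of the row totals)), with sample variances. The function name `cronbach_alpha` is illustrative, and the scores are made-up placeholders with three judge columns for brevity, not the seven columns of Olympic data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table with one row per performance
    and one column per judge (sample variances throughout)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two columns and equal-length rows")
    columns = list(zip(*rows))                    # transpose: one tuple per judge
    item_variances = [variance(col) for col in columns]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("row totals have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Placeholder scores only (NOT the Olympic data): 5 performances, 3 judges.
scores = [
    [9.6, 9.7, 9.6],
    [9.5, 9.5, 9.4],
    [9.8, 9.7, 9.8],
    [9.2, 9.3, 9.3],
    [9.7, 9.6, 9.7],
]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.3f}")
```

Alpha approaches 1 when the judges' columns rise and fall together, because the variance of the row totals then greatly exceeds the sum of the individual column variances.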