Two-Sample Hypothesis Testing Lab: T-Tests and Variance Analysis

Instructions for performing two-sample hypothesis tests in R using the 't.test' function. The lab covers independent-samples t-tests for the difference in population means, one-sided and two-sided tests, and the test for homogeneity of variances. Students are expected to read the document, complete the lab exercises, and write out the null and alternative hypotheses, significance levels, and conclusions.

Uploaded on 07/30/2009

koofers-user-pq1
Lab 3 STAT 3000
Two Sample Hypothesis Testing
Importing Data (repeat):
First download the kudzu dataset and put it in a familiar directory
(create one if you need to, or put it on the desktop).
Start R (if it's not already started) and click on the 'File' drop down
menu at the top left. Choose 'Change Directory' and then browse to
find the directory (or folder) where you stored the data file. Click on
that directory and then click 'Ok'.
Now R knows to look for the data file you will soon be referring to in
the appropriate folder.
In order to get that data into R, use the following commands:
> kudzu.df=read.csv("kudzu.csv")
> kudzu.df
Notice that the kudzu data are in two columns (with fairly long
headings) and the first column contains only 20 numbers followed by
a sequence of "NA" entries, which is how R marks missing values. That
is because the first sample has only 20 observations, whereas the
second column (sample) has a total of 25 observations.
To make the dataset more manageable, rename the column headings
to something shorter:
> names(kudzu.df)=c("without","with")
> kudzu.df
Notice that the names have now been changed. See page 384 of the
textbook for a description of this dataset.
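
The import and renaming steps above can be tried without the course file. The sketch below is only an illustration: the headings and numbers are made up, and the temporary file stands in for 'kudzu.csv'. It shows how a shorter first column is padded with NA and how to count the real observations in each column.

```r
# Hypothetical stand-in for kudzu.csv: headings and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Yield without treatment,Yield with treatment",
  "10.1,12.3",
  "11.4,13.0",
  ",12.7"          # first field is blank, so R reads it as NA
), csv_path)

demo.df <- read.csv(csv_path)
names(demo.df) <- c("without", "with")   # shorter column names

# Number of non-missing observations in each column
n_obs <- colSums(!is.na(demo.df))
print(demo.df)
print(n_obs)
```

With the real dataset the same 'colSums(!is.na(...))' call should report 20 and 25, matching the sample sizes described above.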
Two Sample t-Tests for Independent Samples: The built-in
function called 't.test' can also perform calculations for hypothesis
tests for the difference in population means.

By default, the 't.test' function performs a two-sided hypothesis test in which the null difference in means is zero. To change this, specify a few more options in the command. For example, assuming the population variances are unequal, the command below performs a one-sided hypothesis test to determine whether the difference between the mean pulp yield that has been treated and the mean pulp yield that has not been treated is less than 5 (this implies the null hypothesis is that the difference in means is greater than or equal to 5):

> t.test(x=kudzu.df$with,y=kudzu.df$without, alt="less",mu=5,var.equal=FALSE)

Notice the ordering of the samples in the R command. This is important: if the order were switched, we would have to use 'mu=-5' and 'alt="greater"'. If we are using a 0.05 level of significance, would we have sufficient evidence to reject the null hypothesis given this information?

What if you wanted to test whether the population means are significantly different? Use the following command:

> t.test(x=kudzu.df$with,y=kudzu.df$without, var.equal=FALSE)

Notice that here the ordering of the samples could be switched without changing the p-value, because the test is two-sided with null difference equal to zero (only the signs of the t statistic and of the confidence interval endpoints flip). Now would you reject the null hypothesis? Does the two-sided confidence interval for the difference in population means support this decision? Does assuming the population variances are equal ('var.equal=TRUE') change the p-value?
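
The point about sample ordering can be checked numerically. The sketch below uses simulated data, not the kudzu measurements; the sample sizes, means, and standard deviations are arbitrary assumptions. It writes out 'alternative', the full name of the argument that the handout abbreviates as 'alt'.

```r
set.seed(1)  # reproducible simulated data; NOT the kudzu measurements
with_trt    <- rnorm(25, mean = 50, sd = 4)   # hypothetical treated sample
without_trt <- rnorm(20, mean = 47, sd = 6)   # hypothetical untreated sample

# H0: mu_with - mu_without >= 5   vs   Ha: mu_with - mu_without < 5
a <- t.test(x = with_trt, y = without_trt,
            alternative = "less", mu = 5, var.equal = FALSE)

# Same hypotheses with the samples swapped: the sign of mu and the
# direction of the alternative both flip.
b <- t.test(x = without_trt, y = with_trt,
            alternative = "greater", mu = -5, var.equal = FALSE)

same_p <- isTRUE(all.equal(a$p.value, b$p.value))
print(c(a$p.value, b$p.value))

# Two-sided test of "no difference" (mu = 0 is the default)
two_sided <- t.test(with_trt, without_trt, var.equal = FALSE)

# The 95% confidence interval excludes 0 exactly when p < 0.05
ci <- two_sided$conf.int
ci_excludes_zero <- ci[1] > 0 || ci[2] < 0
agree <- ci_excludes_zero == (two_sided$p.value < 0.05)
print(two_sided)
```

The two one-sided calls give the same p-value and t statistics of opposite sign, which is why the swapped command needs both 'mu=-5' and the reversed alternative.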

  • Paired Test for Two Population Means : When you have paired data you only have to change one option in the 't.test' function in R. For example, download the dataset called 'golfball.csv' and go through the usual steps to load it into R (change directory to where you put it, etc…):

> golf.df=read.csv("golfball.csv")
> names(golf.df)=c("golfer","old","new")
> golf.df
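
The excerpt does not show the paired command itself. 't.test' does have a 'paired' argument, so the one option mentioned above is presumably 'paired=TRUE'. The sketch below is an illustration on made-up numbers (not the golfball.csv data; what the columns measure is not stated here), and it shows that the paired test matches a one-sample t-test on the within-pair differences.

```r
# Hypothetical measurements for six golfers; NOT the golfball.csv data.
demo.golf <- data.frame(
  golfer = 1:6,
  old = c(250, 243, 261, 238, 255, 247),
  new = c(256, 245, 266, 240, 254, 252)
)

# Paired two-sided test of H0: mean difference (new - old) = 0
paired <- t.test(x = demo.golf$new, y = demo.golf$old, paired = TRUE)

# Equivalent one-sample test on the within-golfer differences
one_sample <- t.test(demo.golf$new - demo.golf$old, mu = 0)

print(paired)
print(one_sample$p.value)
```

Because each golfer contributes one difference, the degrees of freedom are the number of pairs minus one, not the Welch or pooled value from the independent-samples test.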

Lab Assignment 3 Instructions: When performing hypothesis tests for the labs, you should write out the null and alternative hypotheses, whether you reject the null or not based on the results, and a conclusion for the test (like we did in class).

1.) For a set of 20 trucks, two different types of tires (standard and new) were placed randomly on either the right or left front wheels. The tire manufacturer would like to determine if the new tires wear more slowly than the standard tires. After a set amount of drive time over similar road conditions, the reductions in tread depths of the tires are measured. These data are on the course website in the file called 'tires.csv'.

    a. Perform the appropriate statistical test to address the manufacturer's question. Be sure to fully document the type of test(s) you perform, hypotheses, and significance levels, and be sure to summarize your findings. Be sure to discuss any assumptions you make.

2.) The viscosity of oil after it has been used in an engine over a period of time may change from its initial value because the high temperature inside the engine can cause the oil to break down. An experiment was conducted to compare the effects of two different engines on oil viscosity. Various samples of the same type of oil with a constant viscosity were used, some in engine 1 and some in engine 2, and the engines were run under identical operating conditions. The resulting values of the oil viscosities after having been used in the engines are given in the dataset called 'oil.csv' on the course website.

    a. Is there reason to believe that the true variability of oil viscosity is different after being run in the different engines?

    b. Is there any evidence that the engines have different effects on oil viscosity?

    Be sure to document any and all statistical methods used to address these questions (e.g., type of test(s) you perform, specific hypotheses, significance levels, assumptions made, and summary of findings).
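
The handout's own section on the test for homogeneity of variances is not part of this excerpt. As a hedged illustration only, base R's 'var.test' function performs an F test for the ratio of two variances, which assumes both samples come from normal populations. The data below are simulated with arbitrary settings and are not the oil.csv values.

```r
set.seed(2)  # simulated data; NOT the oil.csv values
engine1 <- rnorm(12, mean = 10.0, sd = 0.5)   # hypothetical sample 1
engine2 <- rnorm(15, mean = 10.2, sd = 0.9)   # hypothetical sample 2

# H0: sigma1^2 / sigma2^2 = 1   vs   Ha: sigma1^2 / sigma2^2 != 1
f_test <- var.test(engine1, engine2)

# The F statistic is the ratio of the two sample variances
f_stat_by_hand <- var(engine1) / var(engine2)

print(f_test)
print(f_stat_by_hand)
```

The conclusion of such a test is one way to motivate the choice between 'var.equal=TRUE' and 'var.equal=FALSE' in a later 't.test' call.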