Estimating Means with Confidence: Constructing Confidence Intervals for Differences, Lecture notes of Art

Lancaster University Art

An in-depth explanation of how to construct confidence intervals for single means and paired differences using R. It covers the concept of t-multiplier, the use of the qt() function, and the validation of conditions for confidence intervals. The document also includes examples of calculating confidence intervals for hours of sleep and computer-TV usage for different student groups.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

bartolix 🇬🇧

Chapter 11

Estimating Means with Confidence

1. Determining the t-multiplier for a confidence interval

In Example 11.4 we need to find the multiplier t* for 24 degrees of freedom at 95% and 99% confidence. This is almost identical to finding the multiplier z* in Chapter 10, except that we specify only the degrees of freedom, not a mean or standard deviation. The R function qt( p, df ) is used, where p is the cumulative probability (the percentile written as a proportion) and df is the degrees of freedom. For 95% confidence, the multiplier t* is found with the command qt( 0.975, 24 ), and for 99% confidence with qt( 0.995, 24 ). The commands and R output are shown below. Recall that for the 95% confidence multiplier, we need the value of t* such that 2.5% of the area beneath the t-distribution density curve lies to the right of t*, or equivalently 97.5% lies to the left.

> qt( 0.975, 24 )

[1] 2.063899

> qt( 0.995, 24 )

[1] 2.796940
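The tail-area reasoning above extends to any confidence level: the probability passed to qt() is 1 − (1 − confidence)/2. The following is a minimal sketch of that rule; the helper name t_multiplier is hypothetical and is not part of R or of these notes.

```r
# Hypothetical helper: t* multiplier for a two-sided confidence interval.
t_multiplier <- function(conf_level, df) {
  alpha <- 1 - conf_level   # total area left outside the interval
  qt(1 - alpha / 2, df)     # half of that area goes in each tail
}

t_mult_95 <- t_multiplier(0.95, 24)  # same as qt(0.975, 24)
t_mult_99 <- t_multiplier(0.99, 24)  # same as qt(0.995, 24)
z_mult_95 <- qnorm(0.975)            # z* for 95% confidence, for comparison
print(c(t_mult_95, t_mult_99, z_mult_95))
```

The t multiplier for 24 degrees of freedom is somewhat larger than the corresponding z*, which reflects the extra uncertainty from estimating the standard deviation.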

2. Constructing a confidence interval for a single mean

Example 11.5 of Section 11.2 constructs the 95% confidence interval for the mean forearm length of men using a random sample of n = 9 men. We will first calculate the confidence interval the long way, by explicitly using the formula x̄ ± t* × s/√n. Next we will do the equivalent with the R function t.test(). Either way, we first have to enter the 9 forearm lengths into a variable, which we will call x. The R commands and output follow, showing the interval to be 24.33 to 26.67 inches. (Note: Several functions can be combined in a single command, as done below with mean(), qt(), and sd().)

> x <- c(25.5, 24, 26.5, 25.5, 28, 27, 23, 25, 25)

> mean(x) + qt( .975, 8) * sd(x)/sqrt(9)

[1] 26.66891

> mean(x) - qt( .975, 8) * sd(x)/sqrt(9)

[1] 24.33109

>

> t.test( x, conf.level=0.95 )

One Sample t-test

data: x

t = 50.3061, df = 8, p-value = 2.7e-11

alternative hypothesis: true mean is not equal to 0

95 percent confidence interval:

24.33109 26.66891

sample estimates:

mean of x

25.5
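The long-way calculation can also be wrapped in a small function and checked against t.test(). This is a sketch only: the helper name mean_ci is hypothetical, while the data are the nine forearm lengths entered above.

```r
# Forearm lengths (inches) from Example 11.5, as entered above.
x <- c(25.5, 24, 26.5, 25.5, 28, 27, 23, 25, 25)

# Hypothetical helper: t confidence interval for a single mean.
mean_ci <- function(values, conf_level = 0.95) {
  n <- length(values)
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  margin <- t_star * sd(values) / sqrt(n)   # margin of error
  c(lower = mean(values) - margin, upper = mean(values) + margin)
}

ci_formula <- mean_ci(x)
ci_ttest <- t.test(x, conf.level = 0.95)$conf.int  # interval stored in the result object
print(ci_formula)
print(ci_ttest)
```

Extracting $conf.int from the t.test() result is convenient when only the interval is needed and the rest of the test output is not.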

3. Checking the conditions before finding a confidence interval for the mean

Example 11.7 of Section 11.2 explores whether liberal arts majors or more technical majors sleep more. The liberal arts majors are from an introductory statistics course (Stat10, n=25), while the more technical majors are from a different, larger introductory statistics course (Stat13, n=148). The data are found in UCDavis1.RData.

We want to calculate confidence intervals for the mean hours of sleep for each group. For a t-confidence interval to be valid, either the population needs to be normally distributed or the sample size needs to be large. The Stat13 class is large, so normality is not a concern. There are only 25 Stat10 students, however, so we should check that the distribution of the sample is reasonably close to a normal distribution. This involves quickly graphing the data to check for extreme outliers or skewness. We will use boxplots and a histogram to do this.

First, we need to import the data into R and create vectors for the number of hours slept by the Stat10 and Stat13 students. (Do not forget to attach the data frame, as shown in the R commands below.) The column “Class” lists Stat10 students as “L” and Stat13 students as “N”. We will extract the number of hours of sleep (“Sleep”) for each group into the stat10 and stat13 vectors using the logical R operator ==, which asks the question “is equal to?”. For example, the command stat10 <- Sleep[Class=="L"] assigns to the vector stat10 those values from the Sleep column where the corresponding Class value is equal to L. The functions length(), mean(), and sd() are then used to get quick numerical descriptive statistics for each group. The R commands to be typed and the R output follow.

> load("C:/RData/UCDavis1.RData")
> ucdavis1 <- edit(ucdavis1)
> names( ucdavis1 )
[1] "Sex" "TV" "computer" "Sleep" "s" "a" "h" "m" "d"
[10] "e" "g" "Class"
> attach( ucdavis1 )
> stat10 <- Sleep[Class=="L"]
> stat13 <- Sleep[Class=="N"]

> length( stat10 )
[1] 25
> mean( stat10 )
[1] 7.
> sd( stat10 )
[1] 1.
> length( stat13 )
[1] 148
> mean( stat13 )
[1] 6.
> sd( stat13 )
[1] 1.
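The logical subsetting described above can be tried without the course data file. The sketch below uses a small made-up data frame (the values are hypothetical and are not taken from UCDavis1.RData) and refers to columns with $ instead of attach(); that is a choice made for this sketch, not the approach used in these notes.

```r
# Hypothetical toy data; NOT the UCDavis1 values.
toy <- data.frame(
  Sleep = c(8, 7.5, 6, 7, 9, 6.5),
  Class = c("L", "N", "N", "L", "L", "N")
)

is_L <- toy$Class == "L"                 # logical vector: TRUE where Class equals "L"
sleep_L <- toy$Sleep[is_L]               # keep only the Sleep values for those rows
sleep_N <- toy$Sleep[toy$Class == "N"]   # same idea, written in one step

print(is_L)
print(c(n = length(sleep_L), mean = mean(sleep_L), sd = sd(sleep_L)))
```

Printing the intermediate logical vector makes it easier to see that == compares every row and that the square brackets keep only the rows marked TRUE.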

We next do the commands

boxplot( stat10, stat13, names=c("Stat10", "Stat13") )

and

4. Computing a confidence interval for a single mean

In the numerical descriptive statistics above, we saw that the Stat10 students averaged 7.7 hours of sleep compared to 6.8 hours for the Stat13 students. Let us treat these as samples from larger populations and find a confidence interval for each population mean.

Assuming the conditions for the confidence intervals hold, the next step is easy: we simply use the t.test() function. The default confidence level is 95%, so the conf.level=0.95 option is not actually necessary. The R commands and output that follow give 7.11 to 8.21 hours for the Stat10 students and 6.53 to 7.09 hours for the Stat13 students. The intervals do not overlap, suggesting that a population of Stat10 students would have a greater mean number of hours of sleep than a population of Stat13 students.

> t.test( stat10, conf.level=0.95 )

One Sample t-test
data: stat
t = 28.4944, df = 24, p-value < 2.2e-
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
7.105173 8.
sample estimates:
mean of x

> t.test( stat13 )

One Sample t-test
data: stat
t = 47.8299, df = 147, p-value < 2.2e-
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
6.531022 7.
sample estimates:
mean of x
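The "no overlap" observation can be checked mechanically. The sketch below uses the rounded interval endpoints quoted in the text; the helper intervals_overlap is hypothetical and not part of R.

```r
# Intervals as reported in the text above (hours of sleep).
ci_stat10 <- c(7.11, 8.21)
ci_stat13 <- c(6.53, 7.09)

# Hypothetical helper: TRUE when two closed intervals share any point.
intervals_overlap <- function(a, b) {
  a[1] <= b[2] && b[1] <= a[2]
}

overlap <- intervals_overlap(ci_stat10, ci_stat13)
print(overlap)
```

Comparing two separate intervals is an informal check. A confidence interval for the difference between two means, as in Example 11.14, addresses the comparison directly.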

5. Finding paired differences from raw data

As with hours of sleep, we need to extract the computer and TV times for only the Stat10 students into the stat10computer and stat10TV vectors. The R commands and the 25 differences follow. On average, the students spent 5.36 more hours a week on computers than watching TV.

> stat10computer <- computer[Class=="L"]
> stat10TV <- TV[Class=="L"]
> difference <- stat10computer - stat10TV
> difference
[1] 28.0 18.5 -4.0 8.0 4.0 -20.0 21.0 19.0 -12.0 -5.0 -5.
[12] 2.0 40.0 -1.0 -12.0 10.0 5.0 0.0 35.0 -10.5 -14.0 1.
[23] 5.0 14.0 7.
> mean(difference)
[1] 5.

6. Checking the conditions before finding a confidence interval for paired data

Creating a confidence interval for the mean of paired differences is essentially the same as creating a confidence interval for a single mean. Thus, with only 25 differences, we should check the data to see whether we can reasonably assume the population of differences is normally distributed. As before, we will draw a boxplot and a histogram of the differences. The R commands and graphs follow.

> boxplot( difference, main="Computer-TV")
> hist( difference )
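As a numerical companion to the boxplot, boxplot.stats() reports the values that boxplot() would draw as individual outlier points. The sketch below uses made-up differences (hypothetical values, not the class data) so that it runs without the data file.

```r
# Hypothetical differences for illustration; NOT the class data.
d <- c(-3, 1, 2, 4, 5, 5, 6, 8, 9, 40)

stats <- boxplot.stats(d)   # the same rule boxplot() uses to place whiskers
flagged <- stats$out        # points lying beyond the whiskers
print(stats$stats)          # lower whisker, lower hinge, median, upper hinge, upper whisker
print(flagged)
```

A flagged value is not automatically an error, but with a small sample it is a signal to look at the histogram before relying on the t-interval.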

7. Calculating a confidence interval for paired data

There are several ways of calculating the confidence interval for paired data. We can treat the differences as a single random variable and simply calculate the confidence interval for a single mean, just as in section 2 of this chapter, using the vector difference. The R commands for two different methods of finding the 90% confidence interval are given below, without output.

> mean(difference) + qt(0.95, 24) * sd(difference)/sqrt(25)
> mean(difference) - qt(0.95, 24) * sd(difference)/sqrt(25)
> t.test( difference, conf.level=0.9 )

The t.test() function has a special option for paired data. Using the paired=T option allows you to skip the calculation of difference and instead simply provide the two vectors of paired data. Below are the R command and the output. The 90% confidence interval runs from 0.14 to 10.58 more hours of computer use than TV viewing per week.

> t.test( stat10computer, stat10TV , paired=T, conf.level=0.9 )

Paired t-test
data: stat10computer and stat10TV
t = 1.7582, df = 24, p-value = 0.
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
0.1442597 10.
sample estimates:
mean of the differences
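The equivalence between the single-mean interval on the differences and the paired option can be verified directly. The sketch below uses small made-up vectors (hypothetical values, not the class data); the variable names are also illustrative.

```r
# Hypothetical paired measurements (hours per week); NOT the class data.
computer_hrs <- c(10, 25, 8, 30, 12, 20, 15, 5)
tv_hrs       <- c(12, 10, 9, 14, 15, 11, 10, 7)

diffs <- computer_hrs - tv_hrs

# Method 1: one-sample interval on the differences.
ci_single <- t.test(diffs, conf.level = 0.90)$conf.int

# Method 2: paired interval from the two original vectors.
ci_paired <- t.test(computer_hrs, tv_hrs, paired = TRUE,
                    conf.level = 0.90)$conf.int

print(ci_single)
print(ci_paired)
print(all.equal(as.numeric(ci_single), as.numeric(ci_paired)))
```

The sketch writes paired = TRUE in full because T is an ordinary variable that can be reassigned, whereas TRUE is a reserved word.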

Example 11.14, Section 11.4: General confidence interval for the difference between two means (independent samples)

We will continue to use the UCDavis1.RData dataset for this example. The number of hours slept by males and females will be compared. The sample will consist of only the non-liberal-arts students, i.e., the Stat13 students. Males and females are treated as two independent samples.

The data will be extracted into vectors using the logical operator == and the logical operator &, which is equivalent to “and”. For example, the R command sleepmale <- Sleep[ Class=="N" & Sex=="M"] extracts only those values from Sleep where Class is equal to N and Sex is M. To get vectors of male and female hours of sleep, see their respective mean hours of sleep, and count the number of males and females, type the following R code.

> sleepmale <- Sleep[ Class=="N" & Sex=="M"]
> sleepfemale <- Sleep[ Class=="N" & Sex=="F"]
> mean( sleepmale )
[1] 6.
> mean( sleepfemale )
[1] 7.
> length(sleepmale)
[1] 65
> length(sleepfemale)
[1] 83

1. Checking conditions before computing a confidence interval for the difference between two independent means.

The distributions of male and female sleep times need to be normal for the t-confidence interval to be valid, unless the sample sizes are large. The sample sizes are 65 and 83, so normality is not critical for this example. Nevertheless, we will do a quick boxplot inspection to convince ourselves that the data are not terribly skewed and do not have extreme outliers. The R command and graph follow; the graph assures us that the data are fairly symmetric and without any extreme outliers.

boxplot( sleepmale, sleepfemale, names=c("male","female"),

main="Hours of Sleep")

2. Calculating the confidence interval for the difference between two independent means – pooled and unpooled versions.

The pooled version of the confidence interval for the difference between two independent means assumes that the two populations share a common variance. The sample variances for the males and females ( var(sleepmale); var(sleepfemale) ) are 2.83 and 3.08. The difference may simply be a consequence of sampling variation. Assuming the population variances are the same, it is sensible to combine (pool) the sample variances of both groups into a single estimate of the common variance. The degrees of freedom will be n1 + n2 − 2 = 65 + 83 − 2 = 146. The R command is simple,
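The pooling arithmetic described in this section can be sketched from the summary numbers quoted in the text, weighting each sample variance by its degrees of freedom. This is an illustration only; the variable names are hypothetical, and the rounded variances 2.83 and 3.08 are used as given.

```r
# Sample sizes and sample variances as reported in the text above.
n1 <- 65; var1 <- 2.83   # males
n2 <- 83; var2 <- 3.08   # females

df_pooled  <- n1 + n2 - 2
var_pooled <- ((n1 - 1) * var1 + (n2 - 1) * var2) / df_pooled

# Standard error of the difference in means under the equal-variance assumption.
se_pooled <- sqrt(var_pooled * (1 / n1 + 1 / n2))
print(c(df = df_pooled, var_pooled = var_pooled, se = se_pooled))
```

In R, t.test() has a var.equal argument (FALSE by default) that switches between the pooled interval and the unpooled (Welch) interval; whether the notes use that argument at this point is an assumption.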
