Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Instructions and examples for calculating confidence intervals and performing hypothesis tests for proportions using r and the prop.test function. The examples cover calculating confidence intervals for proportions, testing hypotheses about proportions, and comparing two proportions. The document also discusses one-sided and two-sided p-values and the agresti-coull ‘plus four’ interval.
Typology: Exercises
1 / 6
University of North Carolina Chapel Hill
Professor François Nielsen
This handout covers computer issues related to Chapters 18, 19, 20, 21 and 22 in De Veaux et al. 2012. Stats: Data and Models. 3e. Addison-Wesley. (STATDM3)
See Computer Handout for Homework 3 and Activity 9 for discussion on how to simulate sam- pling distributions using R.
Calculating a CI for a Proportion “by hand” I illustrate calculating a confidence interval for a proportion with the example of 510 randomly sampled adults in October 2008 responding to the question “Generally speaking, do you believe the death penalty is applied fairly or unfairly in this country today?”, in which 275 (54%) answered “Fairly” (STATDM3 pp.464–466). Using R as a calculator, one would proceed as follows.
n <- 510 phat <- 275/ SE <- sqrt(phat(1 - phat)/n) SE [1] 0. alpha <-. zstar <- qnorm(1 - alpha/2) # z for p =. zstar [1] 1. ME <- zstarSE c(phat - ME, phat + ME) [1] 0.4959550 0.
Thus we are 95% confident that between 49.6% and 58.2% of adults think that the death penalty is applied fairly.
Calculating a CI for a Proportion with prop.test The R function prop.test calculates the CI for a proportion. It is used as prop.test(x, n, conf.level = 0.95, correct = TRUE) where x is the number of successes, n is the sample size, conf.level is the desired confidence
level (95% by default) and correct indicates whether a continuity correction is used. For the death penalty example, prop.test is used as follows, specifying correct = FALSE.
prop.test(275, 510, correct=FALSE)
1-sample proportions test without continuity correction
data: 275 out of 510, null probability 0. X-squared = 3.1373, df = 1, p-value = 0. alternative hypothesis: true p is not equal to 0. 95 percent confidence interval: 0.4958229 0. sample estimates: p
We see that this confidence interval is very close to the one calculated “by hand”. Note that I am using the option correct = FALSE here only to make the results most comparable to those in the text. In general, however, the continuity correction does no harm and we would leave the default option correct = TRUE as is.
CI for a Proportion for a Factor in a Dataframe In practice we often want to calculate a CI for a proportion from the original, ungrouped data stored as a factor in a data frame. To illustrate I calculate a CI for the proportion of normal (as opposed to depressed) respondents in Afifi and Clark’s depress data set. The (confusingly named) variable cases is a factor taking the value depressed if the respondent has cesd >= 16 and normal otherwise. I use prop.test after first tabulating the values of cases with the table function.
reading the Afifi and Clark data
library(foreign) depress <- read.dta("depress.dta") # read Stata data set attach(depress) # to make variable names accessible head(cases, 10) # look at first 10 observations [1] normal normal normal normal normal normal normal [8] normal depressed normal Levels: normal depressed tab <- table(cases) tab cases normal depressed 244 50 prop.test(tab)
1-sample proportions test with continuity correction
data: tab, null probability 0. X-squared = 126.6973, df = 1, p-value < 2.2e- alternative hypothesis: true p is not equal to 0. 95 percent confidence interval: 0.7809537 0. sample estimates: p
We can be 95% confident that the proportion of non-depressed respondents in the population sampled is between 78.1% and 87.0%. Note that I have spelled out the steps in detail. Once we understand what is going on we can just enter prop.test(table(cases)) to obtain our CI in one fell swoop. Note also that the hypothesis-testing part of the output can be ignored as it tests the default hypothesis p = .5, which is not meaningful here.
Hypothesis Test for One Proportion “by hand” To illustrate a hypothesis test for one propor- tion I use the example of the “home field advantage” hypothesis in the 2009 Major League Base- ball season (STATDM3, pp.485–486), in which the home team won 1333 (54.8%) of the 2430 games. Could this deviation from 50% be due to chance or is there really a home field advan- tage in professional baseball? We set up the hypothesis to be tested as H 0 : p = .50; HA : p >. and proceed as follows.
p0 <-. n <- 2430 phat <- 1333/ phat [1] 0. SD <- sqrt(p0*(1-p0)/n) # note this is SD, not SE; why? z <- (phat - p0)/SD # test statistic z [1] 4. 1 - pnorm(z) [1] 8.44355e-
The very small p-value indicates that the 54.86% proportion of wins by the home team is un- likely to obtain by chance if the probability of winning is .5. Thus we reject the hypothesis that the home team has no advantage.
Hypothesis Test for One Proportion with prop.test R function prop.test can test hypothe- ses as well as calculate confidence intervals. To test the hypothesis that the proportion of wins by the home team is greater than .5 we proceed as follows.
prop.test(x=1333, n=2430, p=.5, alternative="greater", correct=FALSE)
1-sample proportions test without continuity correction
data: 1333 out of 2430, null probability 0. X-squared = 22.9202, df = 1, p-value = 8.444e- alternative hypothesis: true p is greater than 0. 95 percent confidence interval: 0.5319099 1. sample estimates: p
I specified alternative=”greater” because the hypothesis is one-sided in the direction p > p 0 = .5; the two-sided hypothesis or the one-sided hypothesis is the other direction would be in- dicated by alternative=”two.sided” (default) and alternative=”less”, respectively. Be- cause the alternative hypothesis is one-sided (”greater”), the confidence interval (.532, 1.000) provided by prop.test is also one-sided. However, we will not consider one-sided confidence intervals further in this course.
One-sided and Two-sided p-Values The prop.test function in R provides the correct p- value according to whether the test is one-sided (alternative = ”greater” or alternative = ”less”), or two-sided (alternative = ”two.sided”). For a given p 0 the p-value of the two-sided test is twice that of the corresponding one-sided test. For example, for the home team advantage example, the two.sided test that the probability of home team win is actually .5 is as follows.
prop.test(x=1333, n=2430, p=.5, alternative="two.sided", correct=FALSE)
1-sample proportions test without continuity correction
data: 1333 out of 2430, null probability 0. X-squared = 22.9202, df = 1, p-value = 1.689e- alternative hypothesis: true p is not equal to 0. 95 percent confidence interval: 0.5287125 0. sample estimates: p
We see that the p-value 1.689e-06 of the two-sided test is twice the p-value 8.444e-07 found above for the one-sided test.
The Agresti-Coull “Plus Four” Interval When the sample has fewer than 10 successes or failures, the Agresti-Coull “Plus Four” interval can be calculated by adding 2 successes and 2 failures (thus 4 cases to the total count) and calculating the confidence interval with prop.test. The example of the 45 surgical operations with 3 failures (STATDM3, p.511) does not satisfy the Success/Failure Condition. Thus we calculate the Agresti-Coull interval as follows.
prop.test(x = 3+2, n = 45+4, correct=FALSE)
1-sample proportions test without continuity correction
data: 3 + 2 out of 45 + 4, null probability 0. X-squared = 31.0408, df = 1, p-value = 2.527e- alternative hypothesis: true p is not equal to 0. 95 percent confidence interval: 0.04437955 0. sample estimates: p
The confidence interval (0.04437955, 0.21756362) reported by prop.test differs from the interval (.017, .187) reported in the text (p.511). I do not know why at the moment.
Comparing Two Proportions “by hand” To illustrate comparison of two proportions I use the example of seat-belt use by male drivers depending on whether a woman is sitting next to them. In these data, of 4208 male drivers with female passengers 2777 (66.0%) used their seat-belt. Among 2763 male drivers with male passengers only, 1383 (49.3%) wore seat belts (STATDM3, p.525). Using R as a calculator, one could proceed as follows (STATDM3, pp.529–531).
nf <- 4208 nm <- 2763 phatf <- 2777/ phatm <- 1363/ SE <- sqrt(phatf(1-phatf)/nf + phatm(1-phatm)/nm) SE [1] 0. zstar <- qnorm(.975) zstar [1] 1. ME <- zstar*SE dif <- phatf - phatm dif [1] 0. c(dif - ME, dif + ME) # CI for difference [1] 0.1431261 0.
This corresponds closely to the text result.
Comparing Two Proportions with prop.test To compare the two proportions with prop.test we need to create vectors with the numbers of successes and sample sizes, respectively. These vectors serve as input to prop.test.
y <- c(2777, 1363) # the 2 numbers of successes n <- c(4208, 2763) # the 2 sample sizes prop.test(y, n, correct=FALSE)
2-sample test for equality of proportions without continuity correction
data: y out of n X-squared = 192.0052, df = 1, p-value < 2.2e- alternative hypothesis: two.sided 95 percent confidence interval: 0.1431261 0. sample estimates: prop 1 prop 2 0.6599335 0.
The result is identical to that produced by the “by hand” method.
Comparing Two Proportions in a Dataframe Are men more “normal” than women? This provocative conjecture can be investigated by comparing the proportions of men and women who are diagnosed as normal (as opposed to depressed) on the basis of their cesd score in the Afifi and Clark depress data. We do this by constructing a table of factor cases (with categories normal and depressed) with factor sex (with categories male and female), and inputting the table into prop.test, as follows.
table(sex, cases) cases sex normal depressed male 101 10 female 143 40
prop.test(table(sex, cases))
2-sample test for equality of proportions with continuity correction
data: table(sex, cases) X-squared = 7.1968, df = 1, p-value = 0. alternative hypothesis: two.sided 95 percent confidence interval: 0.04111289 0. sample estimates: prop 1 prop 2 0.9099099 0.
detach(depress) # cleanup
We see that the proportion “normal” differs significantly at the p < .01 level between men and women (p-value = 0.007303). The estimated difference in proportions is 12.8%. We can be 95% confident that the difference between the sexes is between 4.1% and 21.6%. Note that it is important for interpretation to enter the explanatory variable first in the table function (i.e., table(sex, cases) rather than table(cases, sex)), so prop.test returns the conditional proportions of cases given sex, rather than the other way around. The p-value of the test, however, would be the same if we had entered cases first.