University of North Carolina Chapel Hill
Soci252-003 Data Analysis in Sociological Research
Spring 2012
Professor François Nielsen

Homework 4 – Computer Handout

Readings

This handout covers computer issues related to Chapters 18, 19, 20, 21 and 22 in De Veaux et al. 2012. Stats: Data and Models. 3e. Addison-Wesley. (STATDM3)

Chapter 18 – Sampling Distribution Models

See Computer Handout for Homework 3 and Activity 9 for discussion on how to simulate sampling distributions using R.

Chapter 19 – Confidence Intervals for Proportions

Calculating a CI for a Proportion “by hand”

I illustrate calculating a confidence interval for a proportion with the example of 510 randomly sampled adults in October 2008 responding to the question “Generally speaking, do you believe the death penalty is applied fairly or unfairly in this country today?”, in which 275 (54%) answered “Fairly” (STATDM3 pp.464–466). Using R as a calculator, one would proceed as follows.

    > n <- 510
    > phat <- 275/510
    > SE <- sqrt(phat*(1 - phat)/n)
    > SE
    [1] 0.02207217
    > alpha <- .05
    > zstar <- qnorm(1 - alpha/2) # z for p = .975
    > zstar
    [1] 1.959964
    > ME <- zstar*SE
    > c(phat - ME, phat + ME)
    [1] 0.4959550 0.5824763

Thus we are 95% confident that between 49.6% and 58.2% of adults think that the death penalty is applied fairly.

Calculating a CI for a Proportion with prop.test

The R function prop.test calculates the CI for a proportion. It is used as

    prop.test(x, n, conf.level = 0.95, correct = TRUE)

where x is the number of successes, n is the sample size, conf.level is the desired confidence level (95% by default) and correct indicates whether a continuity correction is used. For the death penalty example, prop.test is used as follows, specifying correct = FALSE.
    > prop.test(275, 510, correct=FALSE)

            1-sample proportions test without continuity correction

    data:  275 out of 510, null probability 0.5
    X-squared = 3.1373, df = 1, p-value = 0.07652
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4958229 0.5820222
    sample estimates:
            p
    0.5392157

We see that this confidence interval is very close to the one calculated “by hand”. Note that I am using the option correct = FALSE here only to make the results most comparable to those in the text. In general, however, the continuity correction does no harm and we would leave the default option correct = TRUE as is.

CI for a Proportion for a Factor in a Dataframe

In practice we often want to calculate a CI for a proportion from the original, ungrouped data stored as a factor in a data frame. To illustrate I calculate a CI for the proportion of normal (as opposed to depressed) respondents in Afifi and Clark’s depress data set. The (confusingly named) variable cases is a factor taking the value depressed if the respondent has cesd >= 16 and normal otherwise. I use prop.test after first tabulating the values of cases with the table function.
    > # reading the Afifi and Clark data
    > library(foreign)
    > depress <- read.dta("depress.dta") # read Stata data set
    > attach(depress) # to make variable names accessible
    > head(cases, 10) # look at first 10 observations
     [1] normal    normal    normal    normal    normal    normal    normal
     [8] normal    depressed normal
    Levels: normal depressed
    > tab <- table(cases)
    > tab
    cases
       normal depressed
          244        50
    > prop.test(tab)

            1-sample proportions test with continuity correction

    data:  tab, null probability 0.5
    X-squared = 126.6973, df = 1, p-value < 2.2e-16
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.7809537 0.8700663
    sample estimates:
           p
    0.829932

    > nf <- 4208
    > nm <- 2763
    > phatf <- 2777/4208
    > phatm <- 1363/2763
    > SE <- sqrt(phatf*(1-phatf)/nf + phatm*(1-phatm)/nm)
    > SE
    [1] 0.01199155
    > zstar <- qnorm(.975)
    > zstar
    [1] 1.959964
    > ME <- zstar*SE
    > dif <- phatf - phatm
    > dif
    [1] 0.1666291
    > c(dif - ME, dif + ME) # CI for difference
    [1] 0.1431261 0.1901321

This corresponds closely to the text result.

Comparing Two Proportions with prop.test

To compare the two proportions with prop.test we need to create vectors with the numbers of successes and sample sizes, respectively. These vectors serve as input to prop.test.

    > y <- c(2777, 1363) # the 2 numbers of successes
    > n <- c(4208, 2763) # the 2 sample sizes
    > prop.test(y, n, correct=FALSE)

            2-sample test for equality of proportions without continuity correction

    data:  y out of n
    X-squared = 192.0052, df = 1, p-value < 2.2e-16
    alternative hypothesis: two.sided
    95 percent confidence interval:
     0.1431261 0.1901321
    sample estimates:
       prop 1    prop 2
    0.6599335 0.4933044

The result is identical to that produced by the “by hand” method.

Comparing Two Proportions in a Dataframe

Are men more “normal” than women?
This provocative conjecture can be investigated by comparing the proportions of men and women who are diagnosed as normal (as opposed to depressed) on the basis of their cesd score in the Afifi and Clark depress data. We do this by constructing a table of factor cases (with categories normal and depressed) with factor sex (with categories male and female), and inputting the table into prop.test, as follows.

    > table(sex, cases)
            cases
    sex      normal depressed
      male      101        10
      female    143        40
    > prop.test(table(sex, cases))

            2-sample test for equality of proportions with continuity correction

    data:  table(sex, cases)
    X-squared = 7.1968, df = 1, p-value = 0.007303
    alternative hypothesis: two.sided
    95 percent confidence interval:
     0.04111289 0.21586540
    sample estimates:
       prop 1    prop 2
    0.9099099 0.7814208

    > detach(depress) # cleanup

We see that the proportion “normal” differs significantly between men and women at the p < .01 level (p-value = 0.007303). The estimated difference in proportions is 12.8 percentage points (91.0% of men versus 78.1% of women are normal). We can be 95% confident that the difference between the sexes is between 4.1 and 21.6 percentage points.

Note that it is important for interpretation to enter the explanatory variable first in the table function (i.e., table(sex, cases) rather than table(cases, sex)), so that prop.test returns the conditional proportions of cases given sex, rather than the other way around. The p-value of the test, however, would be the same if we had entered cases first.
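
The point about the order of the variables can be checked directly. The sketch below is only an illustration: rather than reading depress.dta, it rebuilds the 2 x 2 table from the counts printed in the table(sex, cases) output, and the object names tab, by_sex and by_case are made up for this example. It runs prop.test on the table and on its transpose, which is what table(cases, sex) would produce.

```r
# Counts copied from the table(sex, cases) output; this matrix is an
# illustrative stand-in for the real table, so depress.dta is not needed.
tab <- matrix(c(101, 143, 10, 40), nrow = 2,
              dimnames = list(sex   = c("male", "female"),
                              cases = c("normal", "depressed")))

by_sex  <- prop.test(tab)     # rows are sex: proportion normal given sex
by_case <- prop.test(t(tab))  # rows are cases: proportion male given case status

by_sex$estimate   # proportion normal among men and among women
by_case$estimate  # proportion male among normal and among depressed

# The test statistic and p-value do not depend on the orientation of the table
all.equal(unname(by_sex$statistic), unname(by_case$statistic))
all.equal(by_sex$p.value, by_case$p.value)
```

The estimates and the confidence interval change with the orientation because they describe different conditional proportions, while the chi-squared statistic is computed from the same four cell counts either way.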