Lecture 20
Nancy Pfenning
Stats 1000

Standardized Statistics

Recall: If the underlying population variable X is normal with mean µ and standard deviation σ, then for a random sample of size n, the random variable X̄ is normal with mean µ and standard deviation $\sigma/\sqrt{n}$. We used this fact to transform X̄ to a standard normal random variable Z, and solved for probabilities with normal tables:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is normal with mean 0 and standard deviation 1.

[Note that the spread of Z is always 1, regardless of sample size n.]

In situations involving a large sample size n, the sample standard deviation s is approximately equal to σ. In that case we can treat $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ as (approximately) a standard normal variable Z. If the sample size n is small, s may be quite different from σ, and the random variable

$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

does not follow a standard normal distribution.

The numerator subtracts the expected value of X̄ (that is, µ) from X̄, so the distribution of t is (like Z) centered at zero and symmetric. The denominator is $s/\sqrt{n}$, which is only an estimate of the standard deviation of X̄, so the standard deviation of t is not fixed at 1 as it is for Z. The sample standard deviation s contains less information than σ, so the spread of t is greater than that of Z, especially for small sample sizes n. Since s approaches σ as sample size n increases, the t distribution approaches the standard normal Z distribution as n increases.

Thus, the spread of the sample mean standardized using s instead of σ depends on the sample size n. We say the distribution has n − 1 “degrees of freedom”, abbreviated df. There are many different t distributions (one for each df). It would take too much space to provide tables for each of them in as much detail as was provided for the standard normal z in Table A.1.
Instead, t tables are condensed to provide the minimal information needed to state useful results.

Statistical Inference

Statistical inference is the process of inferring something about a larger group (the population) by analyzing data for a part of that group (the sample). We make two general forms of statements using statistical inference: (1) confidence intervals and (2) significance tests. We use these forms of inference to answer questions about (a) a population proportion p [for categorical data] or (b) a population mean µ [for quantitative data].

[In addition, we can use significance tests to answer questions about relationships between two variables, such as the chi-square test of a relationship between two categorical variables. The chi-square statistic

$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$,

with the sum taken over all cells of the table, is another standardized statistic. It follows a known pattern, with values and probabilities that can be summarized in a table.]

1. Confidence Interval Questions

(a) (for p) In May 2000, .56 of 1,012 respondents to an Associated Press survey supported gays’ rights to inherit from their partners. What interval should contain the proportion of all Americans who support gays’ rights to inherit? How confident can we be that this interval contains the true proportion p?

(b) (for µ) A random sample of 25 laboratory mice from a large colony was found to have mean weight 33 grams and standard deviation 5 grams. Within what interval should the mean weight of all colony mice lie? How confident can we be about the correctness of this interval?

2. Significance Test Questions

(a) (for p) In May 2000, .56 of 1,012 respondents to an Associated Press survey supported gays’ rights to inherit from their partners. Can we conclude that a majority of the population supports gays’ rights to inherit?

(b) (for µ) Researchers are going under the assumption that their lab mice weigh an average of 30 grams, but an assistant feels they actually weigh more.
She takes an SRS of 25 mice and finds their mean weight to be 33 grams. [Somehow it is known that weights of all mice in the lab vary normally with standard deviation 5 grams.] If the mean weight were really only 30 grams, how unlikely would it be to get a sample of 25 whose mean weight is as high as 33 grams?

The laws of probability will enable us to answer such questions with precision. But these laws are inapplicable and useless if our data have not been produced correctly. [For example, maybe the lab assistant’s selection was biased towards slower, heavier mice, or maybe it was biased towards smaller, cuter mice.] The sample must be chosen at random, so that it serves as an adequate representative of the entire population. The reliability of our conclusions still depends on conscientious adherence to the basic principles of statistical design presented in Chapters 3 and 4.

Chapter 10: Estimating Proportions With Confidence

Probability vs. Confidence

Recall the Rules for Sample Proportions. If numerous samples or repetitions of the same size n are taken, the sample proportion p̂ has:
- mean p (the true proportion for the population),
- standard deviation $\sqrt{p(1-p)/n}$,
- a shape that is approximately normal as long as $np \ge 10$ and $n(1-p) \ge 10$.

Because of this approximate normality, we can invoke the Empirical Rule. It tells us that the approximate probability is:

.68 that p̂ falls within $\sqrt{p(1-p)/n}$ of p;
.95 that p̂ falls within $2\sqrt{p(1-p)/n}$ of p;
.997 that p̂ falls within $3\sqrt{p(1-p)/n}$ of p.

If p̂ falls within $\sqrt{p(1-p)/n}$ of p, then p must fall within $\sqrt{p(1-p)/n}$ of p̂! Similarly, if p̂ falls within $2\sqrt{p(1-p)/n}$ of p, then p must fall within $2\sqrt{p(1-p)/n}$ of p̂, and so on.

But p is not a random variable like p̂. Its value is not a “numerical outcome of a random phenomenon”; it is fixed and unchanging (even if we don’t happen to know what it is). Thus, we cannot talk about the “probability” of p lying in a certain interval.
Instead, suppose we take a sample of size n from a population and record the sample proportion p̂ in the category of interest. We can then be approximately “68% confident” that the interval $\hat{p} \pm \sqrt{p(1-p)/n}$ contains the unknown population proportion p.

Notice that the standard deviation of p̂ is $\sqrt{p(1-p)/n}$. Since p is unknown, this standard deviation cannot be known either, so we estimate it by substituting p̂ for p. The standard error of p̂ is

$\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

[In general, a standard error is calculated from the sample as an estimate of the standard deviation of a sample statistic.]

Now we combine the Empirical Rule with the language of confidence and the standard error approximation. We say we are approximately:

68% confident that p is in the interval $\hat{p} \pm \sqrt{\hat{p}(1-\hat{p})/n}$;
95% confident that p is in the interval $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$;
99.7% confident that p is in the interval $\hat{p} \pm 3\sqrt{\hat{p}(1-\hat{p})/n}$.

The 95% confidence interval is by far the most commonly seen. News reports refer to the “margin of error”. They mean the give-or-take around the estimate that produces an interval capturing the unknown parameter with a 95% success rate in the long run, namely $2\sqrt{\hat{p}(1-\hat{p})/n}$.

In general, a level C confidence interval for any parameter is an interval computed from sample data. The method that computes it has probability C of producing an interval that contains the true value of the parameter. For now (Chapter 10), the parameter of interest is p; in Chapter 12 it will be µ or other parameters involving population means.

We want to say that our confidence level is C that the actual proportion p lies in a certain interval. In other words, p lies within a certain distance of p̂. In other words, p lies in the interval

estimate ± margin of error

where the estimate is p̂ and the margin of error depends on the confidence level C. C equals a probability associated with a standard normal value z∗.
First note that if C is the area under the standard normal curve between −z∗ and +z∗, then the regions to the left of −z∗ and to the right of +z∗ each have area (1 − C)/2. The value z∗ that has probability (1 − C)/2 lying to its right under the standard normal curve is called the multiplier that accompanies the confidence level C. The “infinite” row of Table A.2 provides z∗ values for the four most common confidence levels C:

1. .90 is the confidence level C for z∗ = 1.645
2. .95 is the confidence level C for z∗ = 1.960
3. .98 is the confidence level C for z∗ = 2.326
4. .99 is the confidence level C for z∗ = 2.576

For a given C, the approximate margin of error is $z^* \sqrt{\hat{p}(1-\hat{p})/n}$.

Conditions: The interval $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$ is approximately correct under two conditions:
- The population is at least ten times the sample size. This guarantees approximate independence of selections; if selections were dependent, the standard deviation would change.
- $n\hat{p}$ and $n(1-\hat{p})$ are both at least 10. This simply requires a check that at least ten each of “successes” and “failures” have been observed.

In general:
1. A 90% confidence interval for p is p̂ ± 1.645 × s.e.(p̂)
2. A 95% confidence interval for p is p̂ ± 1.960 × s.e.(p̂)
3. A 98% confidence interval for p is p̂ ± 2.326 × s.e.(p̂)
4. A 99% confidence interval for p is p̂ ± 2.576 × s.e.(p̂)

Example
An article reported that in a random sample of 244 doctors, 184 said they would object to the sale of human organs for transplants. Obtain a 90% confidence interval for the proportion p of all doctors objecting to such sales.

First we find $\hat{p} = 184/244 = .754$. For C = .90, z∗ = 1.645, and $\text{s.e.}(\hat{p}) = \sqrt{.754(.246)/244} = .0276$. A 90% confidence interval for p is $.754 \pm 1.645(.0276) = .754 \pm .045 \approx (.71, .80)$. We are 90% confident that between 71% and 80% of all doctors object to the sale of human organs for transplants.
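The interval recipe above can be sketched in Python as a check on the arithmetic. This sketch is not part of the original notes and rests on a few assumptions:
- It uses Python's standard-library `statistics.NormalDist` to compute z∗ in place of Table A.2.
- The helper names (`z_star`, `standard_error`, `conditions_met`, `proportion_ci`) are hypothetical and introduced only for this illustration.

The sketch reproduces the four table multipliers and the doctors' interval.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional, Tuple


def z_star(confidence: float) -> float:
    """Multiplier z* that leaves area (1 - C)/2 to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return sqrt(p_hat * (1 - p_hat) / n)


def conditions_met(successes: int, n: int, population_size: Optional[int] = None) -> bool:
    """Check for at least 10 successes and 10 failures, and (if known) population >= 10 * n."""
    enough_counts = successes >= 10 and (n - successes) >= 10
    if population_size is None:
        return enough_counts
    return enough_counts and population_size >= 10 * n


def proportion_ci(successes: int, n: int, confidence: float) -> Tuple[float, float]:
    """Approximate level-C interval p_hat +/- z* * s.e.(p_hat)."""
    p_hat = successes / n
    margin = z_star(confidence) * standard_error(p_hat, n)
    return p_hat - margin, p_hat + margin


if __name__ == "__main__":
    for c in (0.90, 0.95, 0.98, 0.99):
        print(f"C = {c:.2f}: z* = {z_star(c):.3f}")
    # Doctors example: 184 of 244 sampled doctors object to the sale of organs.
    print("Conditions met:", conditions_met(184, 244))
    low, high = proportion_ci(184, 244, 0.90)
    print(f"90% interval: ({low:.2f}, {high:.2f})")
```

Running the sketch should list the multipliers 1.645, 1.960, 2.326 and 2.576, which agree with the table to three decimals. It should then print the doctors' interval (0.71, 0.80). Like the formula itself, the code measures random sampling error only; it cannot detect bias in how the doctors were selected or questioned.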
Caution: the margin of error accounts for random sampling error only. It does not include bias that may result from the selection process, the wording of questions, etc.

Choosing a Sample Size

Sometimes, before the sample has been taken, we have in mind a particular margin of error that we would like to report in our confidence interval. It is easy enough to take our expression for a conservative margin of error, $m = 1/\sqrt{n}$, and turn it around to solve for n in terms of m:

$n = \frac{1}{m^2}$

Thus, if we desired a margin of error equal to .03, we would take $n = 1/.03^2 \approx 1111$. Polling organizations often sample roughly 1000 people and report a margin of error close to 3%. If we desired a margin of error equal to .02, we would take $n = 1/.02^2 = 2500$. Note that as sample size goes up, margin of error goes down.

Example
A New York Times article entitled Lawsuits Cast Attention on Passengers’ Blood Clots on Long Flights describes a study published in the New England Journal of Medicine in September 2001. One detail of the study concerns passengers arriving at Charles de Gaulle Airport near Paris. Among 2 million passengers who traveled more than 5000 miles, there were 3 cases of pulmonary embolism.

We should not use this information to set up a confidence interval for the proportion of all passengers traveling more than 5000 miles who would suffer from pulmonary embolism. The number of “successes” (3) is far below 10. With so few successes, the distribution of the sample proportion would not be close enough to normal to justify a confidence interval based on normal critical values.

Exercise: Here is an excerpt from a Pittsburgh Post-Gazette article entitled Criminal pasts cited for many city school bus drivers:

State auditors checking the records of a random sample of 100 city bus drivers have found that more than a quarter of them had criminal histories.
The audit also found that 26 of the drivers were never checked for child abuse histories—in Pennsylvania schools, a mandate for all employees and even some volunteers. In all, the auditors discovered 80 convictions for various offenses among the 100 sampled. Thirty-four of those incidents occurred more than ten years ago, including one rape and four drug offenses. In Pennsylvania, it’s perfectly legal for school officials to hire a bus driver with certain convictions that are more than five years old—but that doesn’t mean they should, state Auditor General Robert P. Casey Jr. said yesterday in releasing the report. “No one convicted of rape should be driving a school bus full of children,” said Casey, who also said he was disappointed with the school district’s initial response to the audit. “The General Assembly needs to look at this law,” he said. A series of problems last year with school bus drivers—including a February accident that was nearly fatal to an 8-year-old Elliott girl—prompted Casey to take a closer look at Pittsburgh’s staff of 750 drivers, he said. When his office presented their results to school officials about eight months ago, Casey said, “they were very reluctant to do anything about it,” and sent him only a brief response outlining what steps were being taken to remedy the problems...

Note that the article states that about 25% of a sample of 100 Pittsburgh school bus drivers had criminal records. Report a 98% confidence interval for the proportion of all Pittsburgh school bus drivers with criminal records. One of the conditions for our approximation is not quite met. Which one is it?

Lecture 22

Interpreting Confidence Intervals

Example
Suppose the proportion p of M&Ms that are blue is unknown. To estimate p, I take a sample of 75 M&Ms and find that $\hat{p} = 9/75 = .12$ of them are blue. A 95% confidence interval for p is $.12 \pm 2\sqrt{.12(.88)/75} = .12 \pm .075 = (.045, .195)$. Tell whether each of the following is a correct interpretation of this interval:

1.
The probability is 95% that the proportion of all M&Ms that are blue is between .045 and .195.
No: this is the most common misinterpretation of the interval, and the word “probability” is the problem. The population proportion p may be unknown, but it is a fixed parameter, not subject to the laws of probability. Remember that probability is the study of random behavior; it applies to random variables, not to parameters.

2. The probability is 95% that the sample proportion of blue M&Ms is between .045 and .195.
No: in fact, the probability is 100% that our sample proportion p̂ is in the confidence interval, because we built the interval around p̂! Setting up a confidence interval is a form of statistical inference, a process whereby we use statistics to draw conclusions about parameters. So we need to be making a statement about p, not p̂.

3. We are 95% confident that the proportion of all M&Ms that are blue is between .045 and .195.
Yes.

4. The probability is 95% that an interval produced by this method contains p.
Yes: the sample proportion p̂ varies from sample to sample, so the interval built around p̂ varies randomly, as long as the sample was random. Thus, the word “probability” does apply to the method that produces the interval. (Once a particular interval such as (.045, .195) has been computed, it either contains p or it does not.)

Picture 100 students, each selecting a random sample of 75 M&Ms from a large bowlful and setting up a 95% confidence interval for the proportion p of all M&Ms that are blue. Roughly 95 of those 100 intervals should contain p.

Now imagine the students each randomly selected 75 M&Ms from a huge barrelful instead of a bowlful. Would their confidence intervals be any more or less accurate? No: population size does not enter into our calculations. It is irrelevant as long as the population is at least ten times the sample size and the samples are selected at random.
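To picture the “100 students” thought experiment, here is a small simulation sketch in Python. It is an illustration only and rests on these assumptions:
- The true proportion of blue M&Ms (0.2 here) is a hypothetical value chosen so the simulation has something to cover.
- Each of the 75 draws is treated as an independent blue/not-blue trial. This corresponds to a bowl or barrel much larger than the sample.
- The helper names `ci_95` and `coverage` are made up for this example.

The interval uses the lecture's ± 2 s.e. rule.

```python
import random
from math import sqrt
from typing import Tuple


def ci_95(successes: int, n: int) -> Tuple[float, float]:
    """Approximate 95% interval from the lecture: p_hat +/- 2 * s.e.(p_hat)."""
    p_hat = successes / n
    margin = 2 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p: float, n: int, trials: int, seed: int = 1) -> float:
    """Fraction of `trials` simulated intervals that contain `true_p`.

    Each trial plays the role of one student drawing n candies; every draw is
    blue with probability true_p, independently of the others.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = ci_95(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    low, high = ci_95(9, 75)
    print(f"Interval from 9 blue out of 75: ({low:.3f}, {high:.3f})")
    # Hypothetical true proportion; in practice p is unknown.
    print("Coverage, 100 students:", coverage(0.2, 75, trials=100))
    print("Coverage, 10,000 students:", coverage(0.2, 75, trials=10_000))
```

`ci_95(9, 75)` gives about (0.045, 0.195), matching the interval above. The coverage fractions should land near .95, possibly a little below, since the ± 2 s.e. interval is only an approximation. Population size never appears in the code, only the sample size n.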
Using Confidence Intervals to Guide Decisions

Example
In a group of 371 college students, 196 wore some type of corrective lenses.
1. Give a 95% confidence interval for the proportion of all college students wearing corrective lenses.
Since 196/371 = .53, our interval is $.53 \pm 2\sqrt{.53(.47)/371} = .53 \pm .05 = (.48, .58)$.
2. Are you convinced that a majority of students wear corrective lenses?
No. The interval (.48, .58) contains values less than .5, suggesting that the population proportion p is not necessarily greater than .5.

Example
In a group of 371 college students, 128 wore contact lenses.
1. Give a 95% confidence interval for the proportion of all college students wearing contact lenses.
Since 128/371 = .35, our interval is $.35 \pm 2\sqrt{.35(.65)/371} = .35 \pm .05 = (.30, .40)$.
2. Are you convinced that a minority of students wear contact lenses?
Yes. The interval (.30, .40) does not even come close to containing proportions of .5 or more.

Example
Of the 233 females in a group of college students, 32/233 = .137 wore glasses, whereas 36/138 = .261 of the 138 males wore glasses. Compare the confidence intervals for the population proportions of females and of males wearing glasses in order to decide whether these population proportions could be equal.
For the females, a 95% confidence interval for p is $.137 \pm 2\sqrt{.137(.863)/233} = .137 \pm .045 = (.092, .182)$.