Inferring Population Proportions: Large Sample Confidence Intervals and Significance Tests

This document covers inference about population proportions using large sample confidence intervals and significance tests: the calculation of sample proportions, the sampling distribution of proportions, and the conditions for inference. It also includes an example of calculating a confidence interval for the proportion of arthritis patients experiencing adverse symptoms from a medication.


Uploaded on 08/31/2009

koofers-user-2j9

Inference about a population proportion
BPS chapter 20
© 2006 W.H. Freeman and Company

Objectives (BPS chapter 20)

Inference for a population proportion

- The sample proportion $\hat{p}$
- The sampling distribution of $\hat{p}$
- Large sample confidence interval for $p$
- Accurate confidence intervals for $p$
- Choosing the sample size
- Significance tests for a proportion

The sample proportion

We now study categorical data and draw inference on the proportion, or percentage, of the population with a specific characteristic.

If we call a given categorical characteristic in the population "success," then the sample proportion of successes, $\hat{p}$, is:

$$\hat{p} = \frac{\text{count of successes in the sample}}{\text{count of observations in the sample}}$$

We choose 50 people in an undergrad class, and find that 10 of them are Hispanic:

$$\hat{p} = 10/50 = 0.2 \quad \text{(proportion of Hispanics in sample)}$$

You treat a group of 120 Herpes patients given a new drug; 30 get better:

$$\hat{p} = 30/120 = 0.25 \quad \text{(proportion of patients improving in sample)}$$
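As a small illustration of the definition above, the sample proportion can be computed directly from the two counts. This is only a sketch; the function name `sample_proportion` is an illustrative choice and is not part of the slides.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return p-hat = (count of successes) / (count of observations)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


# The two examples from the text
p_hat_class = sample_proportion(10, 50)   # 10 Hispanic students out of 50
p_hat_drug = sample_proportion(30, 120)   # 30 improving patients out of 120
print(p_hat_class, p_hat_drug)  # 0.2 0.25
```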

Sampling distribution of $\hat{p}$

The sampling distribution of $\hat{p}$ is never exactly normal. But as the sample size increases, the sampling distribution of $\hat{p}$ becomes approximately normal.
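One way to see this behavior is to simulate many samples and look at the resulting sample proportions. The sketch below assumes a hypothetical population proportion of 0.3 and samples of size 200 (neither value comes from the slides) and compares the simulated spread with the standard theoretical value $\sqrt{p(1-p)/n}$.

```python
import random
import statistics
from math import sqrt


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size `n` and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Hypothetical values chosen only for illustration
p, n = 0.3, 200
props = simulate_sample_proportions(p, n, reps=2000)

sim_mean = statistics.mean(props)
sim_sd = statistics.stdev(props)
theory_sd = sqrt(p * (1 - p) / n)

print(f"simulated mean {sim_mean:.4f} vs p = {p}")
print(f"simulated sd   {sim_sd:.4f} vs sqrt(p(1-p)/n) = {theory_sd:.4f}")
```

A histogram of `props` would look roughly bell-shaped for these values; with a much smaller `n` or a `p` close to 0 or 1, the shape becomes visibly skewed.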

Conditions for inference on $p$

Assumptions:

1. We regard our data as a simple random sample (SRS) from the population. That is, as usual, the most important condition.
2. The sample size $n$ is large enough that the sampling distribution is approximately normal. How large a sample size is enough? Different inference procedures require different answers (we'll see what to do practically).

Large-sample confidence interval for $p$

Confidence intervals contain the population proportion $p$ in C% of samples. For an SRS of size $n$ drawn from a large population and with sample proportion $\hat{p}$ calculated from the data, an approximate level C confidence interval for $p$ is:

$$\hat{p} \pm m, \quad \text{where } m \text{ is the margin of error:} \quad m = z^* \, SE = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

C is the area under the standard normal curve between $-z^*$ and $z^*$.

Use this method when the number of successes and the number of failures are both at least 15.
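The large-sample interval can be sketched in a few lines of Python. The function names and the example counts (60 successes in 200 observations) are hypothetical and chosen only for illustration; $z^*$ is obtained from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_star(conf_level):
    """Critical value z* with area `conf_level` between -z* and z*."""
    return NormalDist().inv_cdf(0.5 + conf_level / 2)


def large_sample_ci(successes, n, conf_level=0.95):
    """Return (low, high, m) for the large-sample interval p-hat +/- m."""
    failures = n - successes
    # Condition from the text: at least 15 successes and 15 failures
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    m = z_star(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m, m


# Hypothetical data: 60 successes in 200 observations, 95% confidence
low, high, m = large_sample_ci(60, 200, 0.95)
print(f"{low:.3f} to {high:.3f} (margin of error {m:.3f})")
```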

Upper tail probability P   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01
z*                         0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%

Let's calculate a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion $\hat{p}$?

$$\hat{p} \approx 0.052$$

What is the sampling distribution for the proportion of arthritis patients with adverse symptoms for samples of 440?

$$\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right) \quad \text{(approximately)}$$

For a 90% confidence level, $z^* = 1.645$.

Using the large sample method, we calculate a margin of error $m$:

$$m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.645 \sqrt{\frac{0.052(1-0.052)}{440}} \approx 1.645 \times 0.0106 \approx 0.017$$

90% CI for $p$: $\hat{p} \pm m$, or $0.052 \pm 0.017$

With 90% confidence, between about 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.

Because we have to use an estimate of $p$ to compute the margin of error, confidence intervals for a population proportion are not very accurate.

$$m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Specifically, the actual confidence level is usually less than the confidence level you asked for in choosing $z^*$. But the shortfall is not a systematic amount (because it depends on $p$).

Use with caution!

We now use the "plus four" method to calculate the 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the value of the "plus four" estimate of $p$? The plus four estimate adds two successes and two failures to the data, so it divides (count of successes + 2) by $n + 4$:

$$\tilde{p} \approx 0.056, \quad \text{with } n + 4 = 444$$

An approximate 90% confidence interval for $p$ using the "plus four" method is:

$$m = z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = 1.645 \sqrt{\frac{0.056(1-0.056)}{444}} \approx 1.645 \times 0.011 \approx 0.018$$

90% CI for $p$: $\tilde{p} \pm m$, or $0.056 \pm 0.018$

Upper tail probability P   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
z*                         0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.576  2.807  3.091  3.291
Confidence level C         50%    60%    70%    80%    90%    95%    96%    98%    99%    99.5%  99.8%  99.9%

With 90% confidence, between 3.8% and 7.4% of arthritis patients taking this pain medication experience some adverse symptoms.
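The plus four calculation can be checked with a short sketch. The count of 23 patients with adverse symptoms out of 440 is an assumption: it is not stated in this text and is back-inferred from the plus four estimate of about 0.056 with $n + 4 = 444$. The function name `plus_four_ci` is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(successes, n, conf_level=0.90):
    """Plus four interval: add 2 successes and 2 failures, then p~ +/- m."""
    n_tilde = n + 4
    p_tilde = (successes + 2) / n_tilde
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    m = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde, m


# ASSUMPTION: 23 of 440 patients had adverse symptoms (inferred, not stated)
p_tilde, m = plus_four_ci(23, 440, 0.90)
print(round(p_tilde, 3), round(m, 3))  # 0.056 0.018
print(f"{p_tilde - m:.3f} to {p_tilde + m:.3f}")  # 0.038 to 0.074
```

Under that assumed count, the result agrees with the interval of 3.8% to 7.4% quoted in the text.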

Choosing the sample size

You may need to choose a sample size large enough to achieve a specified margin of error. However, because the sampling distribution of $\hat{p}$ is a function of the population proportion $p$, this process requires that you guess a likely value for $p$, called $p^*$:

$$\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right) \quad \Rightarrow \quad n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$

The margin of error will be less than or equal to $m$ if $p^*$ is chosen to be 0.5.

Remember, though, that sample size is not always stretchable at will. There are typically costs and constraints associated with large samples.
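The sample size formula is easy to turn into a small helper. In this sketch the function name, the target margin of 0.03, and the alternative guess of 0.2 are all hypothetical; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_margin(m, conf_level=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p*(1-p*)/n) <= m for the guessed p*."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return ceil((z / m) ** 2 * p_guess * (1 - p_guess))


# Conservative choice p* = 0.5: margin of error 0.03 at 95% confidence
print(sample_size_for_margin(0.03))               # 1068
# A guess further from 0.5 needs fewer observations
print(sample_size_for_margin(0.03, p_guess=0.2))  # 683
```

Because $p^*(1-p^*)$ is largest at $p^* = 0.5$, that choice gives the largest required $n$, which is why it guarantees the margin of error whatever the true $p$ turns out to be.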

Significance test for $p$

The sampling distribution for $\hat{p}$ is approximately normal for large sample sizes, and its shape depends solely on $p$ and $n$.

Thus, we can easily test the null hypothesis $H_0: p = p_0$ (a given value we are testing).

If $H_0$ is true, the sampling distribution is known: it is approximately normal with mean $p_0$ and standard deviation $\sqrt{p_0(1-p_0)/n}$.

The likelihood of our sample proportion given the null hypothesis depends on how far from $p_0$ our $\hat{p}$ is in units of standard deviation:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$

This is valid when both expected counts, expected successes $np_0$ and expected failures $n(1-p_0)$, are each 10 or larger.
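The test statistic and its P-value can be sketched as follows. The function name, the `alternative` argument, and the example data (60 successes in 100 observations tested against $p_0 = 0.5$) are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, P-value) for H0: p = p0 using the normal approximation."""
    # Condition from the text: expected successes and failures both >= 10
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected counts n*p0 and n*(1-p0) must be >= 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 successes in 100 observations, H0: p = 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

Note that the standard deviation in the denominator uses $p_0$, not $\hat{p}$, because the calculation is carried out assuming the null hypothesis is true.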

P-values and one- or two-sided hypotheses: reminder

And as always, if the P-value is smaller than the chosen significance level $\alpha$, then the difference is statistically significant and we reject $H_0$.

From Table A we find the area to the left of $z = 1.62$ is 0.9474. Thus $P(Z \ge 1.62) = 1 - 0.9474$, or 0.0526. Since the alternative hypothesis is two-sided, the P-value is the area in both tails, and $P = 2 \times 0.0526 = 0.1052$.

The chain restaurant data are compatible with the national survey results ($z = 1.62$, $P \approx 0.11$).
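The table lookup for the chain restaurant example can be reproduced with the standard library's normal distribution in place of Table A. Only the value $z = 1.62$ is taken from the text; the variable names are illustrative.

```python
from statistics import NormalDist

z = 1.62  # test statistic quoted in the text

# Area to the right of z under the standard normal curve
upper_tail = 1 - NormalDist().cdf(z)

# Two-sided alternative: count the area in both tails
p_two_sided = 2 * upper_tail

print(round(upper_tail, 4), round(p_two_sided, 4))  # 0.0526 0.1052
```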

Interpretation: magnitude versus reliability of effects

The reliability of an interpretation is related to the strength of the evidence. The smaller the P-value, the stronger the evidence against the null hypothesis and the more confident you can be about your interpretation.

The magnitude or size of an effect relates to the real-life relevance of the phenomenon uncovered. The P-value does NOT assess the relevance of the effect, nor its magnitude.

A confidence interval will assess the magnitude of the effect. However, magnitude is not necessarily equivalent to how theoretically or practically relevant an effect is.