Confidence Interval for a Proportion - Lecture Slides | STAT 541, Study notes of Statistics

Material Type: Notes; Professor: Davenport; Class: APPLIED STAT FOR ENGINR & SCI; Subject: Statistics; University: Virginia Commonwealth University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

Lecture 12 - VCU's Stat 541, James M. Davenport, Copyright 2008

Virginia Commonwealth University

STAT 541

APPLIED STATISTICS FOR ENGINEERS & SCIENTISTS

Instructor: Dr. James M. Davenport

Lecture # 12

Today’s Lecture

Information in today’s lecture corresponds to the following sections in your textbook (in addition to my notes):

  • 5.4, 8.1 & 10.1: Large Sample Confidence Intervals for Proportions, Confidence Intervals on Means with Unknown Variance, Intro. to the Chi-Square Distribution & Student’s T Distribution, & Confidence Intervals for the Mean Using Small Samples

Central Limit Theorem Application

We can now use the Central Limit Theorem to construct a large sample confidence interval for an unknown proportion.

Confidence Interval for a Proportion, p

$$Z = \frac{Y - np}{\sqrt{npq}} = \frac{\frac{Y}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

is approximately distributed as N(0, 1).

Confidence Interval for a Proportion, p

The resulting confidence interval for p is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

Note that this is a function of the unknown proportion, p.

Confidence Interval for a Proportion, p

If we solve this inequality for the unknown proportion, p, we obtain the following for the confidence limits.

$$LCL = \frac{2n\hat{p} + z_{\alpha/2}^2 - z_{\alpha/2}\sqrt{z_{\alpha/2}^2 + 4n\hat{p}\hat{q}}}{2\left(n + z_{\alpha/2}^2\right)}$$
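
The algebra behind these limits is not shown on the slide. One way to fill it in (a sketch only, writing $z$ for $z_{\alpha/2}$ and $\hat{q} = 1 - \hat{p}$) is to square the large-sample inequality and solve the resulting quadratic in p:

```latex
\begin{align*}
|\hat{p} - p| &\le z\sqrt{\frac{p(1-p)}{n}} \\
n(\hat{p} - p)^2 &\le z^2\,p(1-p) \\
(n + z^2)\,p^2 - (2n\hat{p} + z^2)\,p + n\hat{p}^2 &\le 0 \\
p &= \frac{2n\hat{p} + z^2 \pm z\sqrt{z^2 + 4n\hat{p}\hat{q}}}{2(n + z^2)}
\end{align*}
```

The last line gives the two roots of the quadratic: the minus sign yields the lower limit and the plus sign the upper limit.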


Confidence Interval for a Proportion, p

The upper confidence limit is as follows:

$$UCL = \frac{2n\hat{p} + z_{\alpha/2}^2 + z_{\alpha/2}\sqrt{z_{\alpha/2}^2 + 4n\hat{p}\hat{q}}}{2\left(n + z_{\alpha/2}^2\right)}$$

These limits agree with those given in several texts and should be used for small n; they are sometimes called the Wilson Estimators.
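
As a rough illustration of these limits, the sketch below computes the lower and upper confidence limits in Python using only the standard library. The function name `wilson_limits` and the example counts (8 successes in 20 trials) are made up for illustration and are not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(y, n, conf=0.95):
    """Return (LCL, UCL) for a proportion from y successes in n trials.

    Hypothetical helper: implements the small-n limits
    (2n*p_hat + z^2 -/+ z*sqrt(z^2 + 4n*p_hat*q_hat)) / (2(n + z^2)).
    """
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, upper-tail point
    p_hat = y / n
    q_hat = 1.0 - p_hat
    centre = 2.0 * n * p_hat + z * z
    half = z * sqrt(z * z + 4.0 * n * p_hat * q_hat)
    denom = 2.0 * (n + z * z)
    return (centre - half) / denom, (centre + half) / denom


# Illustrative data only: 8 successes out of 20 trials.
lcl, ucl = wilson_limits(8, 20)
print(f"95% limits for p: ({lcl:.4f}, {ucl:.4f})")
```

Unlike the simple plus-or-minus form, these limits always stay inside [0, 1], which is one reason they are preferred when n is small.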

Sample Size for a Confidence Interval for a Proportion, p

The sample size formula is given by:

$$n = \frac{2z_{\alpha/2}^2\hat{p}\hat{q} - z_{\alpha/2}^2L^2 \pm \sqrt{4z_{\alpha/2}^4\hat{p}\hat{q}\left(\hat{p}\hat{q} - L^2\right) + L^2z_{\alpha/2}^4}}{L^2}$$

where L is the width of the interval.

Confidence Interval for a Proportion, p

If the sample size n is large, these limits reduce to the following:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{q} = 1 - \hat{p}$. This is an approximate (100)(1 - α)% confidence interval for p.

Sample Size for a Confidence Interval for a Proportion, p

And the sample size formula reduces to:

$$n = \frac{4z_{\alpha/2}^2\hat{p}\hat{q}}{L^2}$$

where L is the width of the interval.
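
To make the two sample size formulas concrete, here is a minimal Python sketch using only the standard library. The function names, the planning value 0.5 for the proportion, and the width L = 0.1 are assumptions chosen for illustration; the small-n version takes the plus sign in the ± formula, which is the root that yields a positive n when L < 1.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_half(conf):
    """z_{alpha/2} for a (100)(conf)% interval."""
    return NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)


def n_large_sample(p_hat, width, conf=0.95):
    """Large-sample formula: n = 4 z^2 p_hat q_hat / L^2."""
    z = z_half(conf)
    return 4.0 * z * z * p_hat * (1.0 - p_hat) / width ** 2


def n_small_sample(p_hat, width, conf=0.95):
    """Small-n formula, taking the '+' root of the quadratic in n."""
    z = z_half(conf)
    pq = p_hat * (1.0 - p_hat)
    l2 = width ** 2
    root = sqrt(4.0 * z ** 4 * pq * (pq - l2) + l2 * z ** 4)
    return (2.0 * z * z * pq - z * z * l2 + root) / l2


def small_sample_width(p_hat, n, conf=0.95):
    """Width UCL - LCL of the small-n interval, used as a consistency check."""
    z = z_half(conf)
    pq = p_hat * (1.0 - p_hat)
    return z * sqrt(z * z + 4.0 * n * pq) / (n + z * z)


# Illustrative planning values only: p_hat = 0.5, total width L = 0.1.
print(ceil(n_large_sample(0.5, 0.1)), ceil(n_small_sample(0.5, 0.1)))
```

In practice the result is rounded up to the next whole number. Plugging the small-n sample size back into `small_sample_width` returns the requested width, which is a quick check that the quadratic was solved correctly.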

Confidence Intervals Using Estimates of σ^2: Large Sample Results

  • What we discussed in the previous lecture requires that σ^2 be known.
  • What if σ^2 is unknown?

Confidence Intervals Using Estimates of σ^2: Large Sample Results

At this point, the only thing we can do is simply substitute an estimate of σ^2, namely s^2, and hence use

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Confidence Intervals Using Estimates of σ^2: Large Sample Results

  • Also note that if we use s in place of σ, then the length of the confidence interval will also be a random quantity. Hence, if we determine the sample size from a “known” or “historical value” for σ, and compute the interval using the sample standard deviation s, then the confidence interval will most likely not be of length L.

Confidence Intervals for μ using s

What we really must investigate is the sampling distribution of this statistic:

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

as opposed to

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

First done by W. S. Gosset.

The Sampling Distribution of S^2

But before we can adequately do that, we need to introduce the sampling distribution of S^2 (the sample variance), computed from random samples X_1, X_2, ..., X_n that arise from NORMAL DISTRIBUTIONS N(μ, σ^2).

The Sampling Distribution of S^2

That is, the sampling distribution of the random variable defined by

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

The Sampling Distribution of S^2

We can prove that W has a very special sampling distribution called the Chi-square distribution with (n – 1) degrees of freedom, where

$$W = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2}$$

The Sampling Distribution of S^2

The density function of W is given by

$$f(w) = \begin{cases} \dfrac{1}{\Gamma\left(\frac{r}{2}\right)2^{r/2}}\,w^{\frac{r}{2}-1}e^{-w/2}, & 0 < w < \infty \\ 0, & \text{elsewhere} \end{cases}$$

where r = the degrees of freedom.

The Chi-square Distribution

  • The parameter r denotes the degrees of freedom.
  • For brevity, we write "W has $\chi^2_r$".

[Figure: The chi-square curve. Plot of a chi-square p.d.f.; horizontal axis: w, a variable with r degrees of freedom; vertical axis: density.]

Percentage Points of the Chi-square Distribution

The percentage points $\chi^2_{\alpha,r}$ for selected probabilities α such that

$$P\left[W \ge \chi^2_{\alpha,r}\right] = \alpha \quad \text{for } 0 < \alpha < 1$$

are given in Table II in your textbook.

[Figure: The chi-square curve. Plot of a chi-square p.d.f. against w, a variable with r degrees of freedom; the area α lies to the right of $\chi^2_{\alpha,r}$ and the area 1 − α lies to its left.]

Percentage Points of the Chi-square Distribution

These percentage points can be found using NCSS’s Probability Calculator, along with the cumulative probabilities of the Chi-square distribution.
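
For readers without access to NCSS, the same percentage points can be approximated with a short script. The sketch below is not the NCSS calculator; it is a standard-library Python illustration with hypothetical function names. It evaluates the chi-square cumulative probability through the series for the regularized incomplete gamma function and then finds the upper-tail point by bisection.

```python
from math import exp, lgamma, log


def chi2_cdf(w, r):
    """P[W <= w] for a chi-square variable with r degrees of freedom."""
    if w <= 0.0:
        return 0.0
    a, x = r / 2.0, w / 2.0
    # Series: P(a, x) = x^a e^{-x} / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-16 * abs(total):
        k += 1
        term *= x / (a + k)
        total += term
    return total * exp(a * log(x) - x - lgamma(a))


def chi2_upper_point(alpha, r):
    """Value c with P[W >= c] = alpha, found by bisection on the c.d.f."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, float(r))
    while chi2_cdf(hi, r) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, r) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative lookup: upper 5% point with r = 10 degrees of freedom.
print(round(chi2_upper_point(0.05, 10), 3))
```

The series converges for every w but slows down for very large values, so this is meant for table-style lookups rather than extreme tail probabilities.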

S is Unbiased for σ?

  • So S^2 is unbiased for σ^2.
  • Is S unbiased for σ?
  • The answer is NO!
  • S is a biased estimator of σ. But the bias is not great, and we use S as THE ESTIMATOR of σ.

Confidence Interval for

with small sample sizes

  • If the sample size is small to moderate in size and the variance is unknown, what do we do?
  • Simply substituting S for is fine provided the sample size is large enough for S to be a reasonably good estimator of.

μ

σ^2

Small Sample Conf. Int.

This was the starting place for deriving the confidence interval for μ, assuming σ is known.

$$P\left[-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right] = 1 - \alpha$$

Small Sample Conf. Int.

But if we substitute s for σ and the sample size is small to moderate in size, then the percentiles $z_{\alpha/2}$ are not correct.

Small Sample Conf. Int.

These percentiles are not correct, since for small sample sizes, the sampling distribution of $\bar{X}$ is no longer adequately described by the normal distribution, $N\left(\mu, \frac{\sigma^2}{n}\right)$.

Small Sample Conf. Int.

If we substitute s for σ, and we wish to maintain a valid probability statement with probability 1 - α, then we must use a different percentage point, not $z_{\alpha/2}$.

The T Statistic

What we really must investigate is the sampling distribution of the following statistic:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

T Ratio Can Be Written As Follows

$$T = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \times \frac{\sigma/\sqrt{n}}{S/\sqrt{n}}$$

$$T = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{Z}{\sqrt{W/(n-1)}} = \frac{N(0,1)}{\sqrt{\chi^2_{(n-1)}/(n-1)}}$$


Student’s T Distribution

The random variable T has a very special sampling distribution which is called the “Student’s T” distribution with r = (n – 1) degrees of freedom.

Percentage Points of the Student’s T

n = 2, r = n – 1 = 1, α = 0.025. Find $t_{0.025,1}$ such that

$$P\left[T \ge t_{0.025,1}\right] = 0.025$$

From Table IV, $t_{0.025,1}$ = 12.706.

By contrast, $z_{0.025}$ = 1.96.

[Figure: The Student's t curve. Plot of a Student's t p.d.f.; horizontal axis: t variable; vertical axis: density; the area α = 0.025 lies to the right of $t_{0.025,1}$ = 12.706.]

Percentage Points of the Student’s T

And, of course, these percentage points can be found for any value of the cumulative or upper tail probability and for any degree of freedom (including non-integer degrees of freedom) using NCSS’s Probability Calculator.
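
As an alternative to NCSS or a printed table, the percentage points can also be approximated numerically. The following is a rough standard-library Python sketch, not the NCSS algorithm: it integrates the Student's t density with Simpson's rule and inverts the cumulative probability by bisection. The function names are hypothetical and the routine assumes an upper tail probability below 0.5.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, r):
    """Density of Student's t with r degrees of freedom (r may be non-integer)."""
    log_c = lgamma((r + 1) / 2.0) - lgamma(r / 2.0) - 0.5 * log(r * pi)
    return exp(log_c - (r + 1) / 2.0 * log(1.0 + t * t / r))


def t_cdf(t, r, steps_per_unit=400):
    """P[T <= t], using Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    n = 2 * (int(a * steps_per_unit / 2) + 1)  # even number of panels
    h = a / n
    s = t_pdf(0.0, r) + t_pdf(a, r)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, r)
    area = s * h / 3.0
    return 0.5 + area if t >= 0 else 0.5 - area


def t_upper_point(alpha, r):
    """Value c with P[T >= c] = alpha, assuming 0 < alpha < 0.5."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while t_cdf(hi, r) < target:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, r) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative lookups for a few degrees of freedom at alpha = 0.025.
for r in (1, 4, 11):
    print(r, round(t_upper_point(0.025, r), 3))
```

Because the density is written with `lgamma`, the same code accepts non-integer degrees of freedom, in line with the remark above.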

THE STUDENT's T CURVE Plot of Student's t p.d.f.

Density

t Variable

t 0.025,4 = − 2.776 t 0.025,4 (^) = 2.

0.025 0.

n = 5 ; r = 4

Moments of Student’s T

  • E[ T ] = 0 (for r > 1)
  • Var[ T ] = r / ( r – 2 ) (for r > 2)

Exercise

Find d_1 such that P[ T > d_1 ] = 0.05 (or equivalently, P[ T < d_1 ] = 0.95) for degrees of freedom r = 11. From NCSS’s Probability Calculator, we find d_1 = 1.796.

[Figure: The Student's t curve. Plot of a Student's t p.d.f. for n = 12, r = 11; horizontal axis: t variable; vertical axis: density; the area 0.95 lies to the left of d_1 = 1.796 and the area 0.05 to its right.]

Exercise

Let r = 11. Find d_2 such that P[ - d_2 < T < d_2 ] = 0.95.

1 - α = 0.95, α = 0.05, α / 2 = 0.025

This implies that d_2 = $t_{0.025,11}$ = 2.201.

[Figure: The Student's t curve. Plot of a Student's t p.d.f. for n = 12, r = 11; horizontal axis: t variable; vertical axis: density; an area of 0.025 lies in each tail, below $-d_2 = -t_{0.025,11}$ = −2.201 and above $d_2 = t_{0.025,11}$ = 2.201.]

Confidence Interval for the Mean μ using S^2

$$\bar{X} \pm t_{\alpha/2,\,r}\frac{S}{\sqrt{n}}$$

is a (100)(1 - α)% confidence interval for the mean μ. This, of course, assumes that we are sampling from a N(μ, σ^2) population.
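
One way to see where this interval comes from (a sketch that mirrors the known-σ starting point, with r = n – 1 degrees of freedom as stated for the Student's T distribution) is to begin with the probability statement for T and solve the inequalities for μ:

```latex
\begin{align*}
P\left[-t_{\alpha/2,r} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2,r}\right] &= 1 - \alpha \\
P\left[\bar{X} - t_{\alpha/2,r}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,r}\frac{S}{\sqrt{n}}\right] &= 1 - \alpha
\end{align*}
```

The only change from the known-σ case is that S replaces σ and the percentage point of the Student's T distribution replaces the normal percentage point.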

Example

The mean tearing strength of a certain brand of paper is under investigation by a manufacturer of laser printers. We assume that this measurement is normally distributed. A random sample of n = 22 sheets of paper was tested and the sample mean tearing strength was 2.4 pounds.

Example: known σ^2

X = tearing strength of paper (pounds)

X has N( μ , σ^2 = 0.04 ), n = 22, $\bar{x}$ = 2.4

1 - α = 0.95, α = 0.05, α / 2 = 0.025

$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 2.4 \pm 1.96\left(\frac{0.2}{\sqrt{22}}\right) = 2.4 \pm 0.084$$
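
The arithmetic for this known-variance example can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the function name `z_interval` is made up here, and the inputs are the values stated in the example (sample mean 2.4 pounds, σ^2 = 0.04, n = 22, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Known-sigma interval: xbar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Values from the tearing-strength example; sigma is the square root of 0.04.
lcl, ucl = z_interval(2.4, sqrt(0.04), 22)
print(f"95% interval for the mean: ({lcl:.3f}, {ucl:.3f}) pounds")
```

The same function applies to any known-σ problem; only the percentage point changes when σ has to be estimated by s from a small sample.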

When do you use the Student’s T percentage points?

Hence, if that population is relatively mound-shaped and somewhat symmetric, then use the percentage points from the Student’s T distribution.

When do you use the Student’s T percentage points?

If you find yourself in situations where the original population is decidedly non-normally distributed and the sample size is small, then you must use other methods that we will not discuss here (exact sampling distributions and nonparametric methods).

Using NCSS’s Probability Calculator to find the cumulative probability and p-value for a Chi-square distribution with 2 degrees of freedom.

[Screenshot of the NCSS Probability Calculator, with the following annotations.]

  • Enter the degrees of freedom of the Chi-square distribution.
  • This is the non-centrality parameter.
  • Enter the value of the Chi-square distribution with degrees of freedom given above at left. When computing p-values, this is the observed value of the test statistic computed from the data.
  • This output window provides the cumulative probability associated with the value entered at the lower left; i.e. P[ x < value ]. For a lower tailed test, this will be the p-value.
  • This is the complement of the probability given above; i.e. 1 – P[ x < value ] = P[ x > value ]. For an upper tailed test, this will be the p-value.
  • Note: these two probabilities sum to one.
  • For a two tailed test, you must double the appropriate value given above at right; i.e. p-value = 2 (0.001526) = 0.003052.

Using NCSS's Probability Calculator to find the cumulative probability and p-value for a Student's t distribution with 18 degrees of freedom.

[Screenshot of the NCSS Probability Calculator, with the following annotations.]

  • Enter the degrees of freedom of the Student's t distribution.
  • This is the non-centrality parameter, which for most of the applications in an introductory course in statistics will be equal to zero.
  • Enter the value, say T, of the Student's t distribution with degrees of freedom given above at left. When computing p-values, this is the observed value of the test statistic computed from the data.
  • This output window provides the cumulative probability associated with the value, T, entered at the lower left; i.e. P[ t < T ]. For a lower tailed test, this will be the p-value.
  • This is the complement of the probability given above; i.e. 1 - P[ t < T ] = P[ t > T ]. For an upper tailed test, this will be the p-value.
  • Note: these two probabilities sum to one.
  • For a two tailed test, you must double the appropriate value given above at right; i.e. p-value = 2 (0.0304) = 0.0608.

Using NCSS’s Probability Calculator to find the 95th percentile from a Student’s t-distribution with 18 degrees of freedom.

[Screenshot of the NCSS Probability Calculator, with the following annotations.]

  • Enter the degrees of freedom for the Student’s t-distribution.
  • This window is for the non-centrality parameter.
  • Enter the cumulative probability corresponding to the percentage point desired.
  • Output is the percentage point from the Student’s t-distribution with the degrees of freedom at upper left and the cumulative probability given at lower left.

symmetric. 2) As the sample size grows, assumption (b) is even less important; the Central Limit Theorem tells us that the sampling distribution of the mean approaches the Normal regardless of the underlying population distribution. With a large sample, the assumption of underlying Normality is not important at all.

  1. No amount of data will make up for a failure of assumption (a). A biased sample or experiment without randomization cannot be fixed by getting still more biased data.
  2. As the df grow, the difference between using t and using the Normal becomes indiscernible. In fact, for well-behaved data the difference in p-values and confidence interval width is negligible around 15 df or above.
  3. In practice, statisticians and researchers using statistics find p-values from the t distribution using computer statistics packages and never (well, hardly ever) refer to the normal distribution or tables of any sort.

-- Paul Velleman

P.S. In ActivStats, our multimedia materials for the introductory statistics course, we do not state any arbitrary rule for df and instead advise students to follow the advice I give here. It is what statisticians do in practice.