Statistics: Simple Random Sampling and Estimation of Population Mean and Proportion, Exams of Design history

These notes introduce simple random sampling and provide formulas for estimating the population mean and proportion. They cover topics such as variance estimation, confidence intervals, and sample size determination.

Typology: Exams

2021/2022

Uploaded on 08/01/2022

hal_s95

2.3 Simple Random Sampling

• Simple random sampling without replacement (srswor) of size n is the probability sampling design for which a fixed number of n units are selected from a population of N units without replacement such that every possible sample of n units has equal probability of being selected. A resulting sample is called a simple random sample or srs.

• Note: I will use SRS to denote a simple random sample and SR as an abbreviation of ‘simple random’.

• Some necessary combinatorial notation:

– (n factorial) $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$. This is the number of unique arrangements or orderings (or permutations) of n distinct items. For example: $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.

– (N choose n) $\binom{N}{n} = \dfrac{N(N-1)\cdots(N-n+1)}{n!} = \dfrac{N!}{n!\,(N-n)!}$. This is the number of combinations of n items selected from N distinct items (and the order of selection doesn’t matter). For example, $\binom{6}{2} = \dfrac{6!}{2!\,4!} = \dfrac{(6)(5)(4!)}{2!\,4!} = \dfrac{(6)(5)}{(2)(1)} = 15$.

• There are $\binom{N}{n}$ possible SRSs of size n selected from a population of size N.

• For any SRS of size n from a population of size N, we have $P(S) = 1/\binom{N}{n}$.

• Unless otherwise specified, we will assume sampling is without replacement.
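The counting rules above can be checked numerically. The following is a minimal Python sketch (the notes themselves use R and SAS); the sizes N = 6 and n = 2 are chosen only to match the 6-choose-2 example, and the variable names are illustrative.

```python
from itertools import combinations
from math import comb, factorial

N, n = 6, 2  # illustrative sizes matching the 6-choose-2 example

# Closed form: N! / (n! (N - n)!)
n_samples = factorial(N) // (factorial(n) * factorial(N - n))

# Brute force: list every unordered sample of n unit labels
all_samples = list(combinations(range(1, N + 1), n))

# Under SRS without replacement every sample is equally likely
p_sample = 1 / comb(N, n)

print(n_samples, len(all_samples), p_sample)
```

Enumerating the samples is only practical for very small N, but it makes the "every possible sample has equal probability" definition concrete.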

2.3.1 Estimation of $\bar{y}_U$ and $t$

• A natural estimator for the population mean $\bar{y}_U$ is the sample mean $\bar{y}$. Because $\bar{y}$ is an estimate of an individual unit’s y-value, multiplication by the population size N will give us an estimate $\hat{t}$ of the population total t. That is:

$$\hat{\bar{y}}_U = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad\qquad \hat{t} = \frac{N}{n}\sum_{i=1}^{n} y_i = N\bar{y} \qquad (10)$$

• $\hat{\bar{y}}_U$ and $\hat{t}$ are design unbiased. That is, the average values of $\bar{y}$ and $N\bar{y}$ taken over all possible SRSs equal $\bar{y}_U$ and $t$, respectively.

Demonstration of Unbiasedness: Suppose we have a population consisting of five y-values:

Unit i   1  2  3  4  5
y_i      0  2  3  4  7

which has the following parameters:

$N = 5$, $t = 16$, $\bar{y}_U = 3.2$, $S^2 = 6.7$, $S \approx 2.5884$

Suppose a SRS of size n = 2 is selected. Then $P(S) = 1/\binom{5}{2} = 1/10$ for each of the 10 possible SRSs.

All Possible Samples and Statistics from Example Population

Sample  Units  y-values  $\sum y_i$  $\hat{\bar{y}}_U = \bar{y}$  $\hat{t} = N\bar{y}$  $\hat{S}^2 = s^2$  $\hat{S} = s$
S1      1,2    0,2        2          1.0                          5.0                   2.0                1.4142
S2      1,3    0,3        3          1.5                          7.5                   4.5                2.1213
S3      1,4    0,4        4          2.0                         10.0                   8.0                2.8284
S4      1,5    0,7        7          3.5                         17.5                  24.5                4.9497
S5      2,3    2,3        5          2.5                         12.5                   0.5                0.7071
S6      2,4    2,4        6          3.0                         15.0                   2.0                1.4142
S7      2,5    2,7        9          4.5                         22.5                  12.5                3.5355
S8      3,4    3,4        7          3.5                         17.5                   0.5                0.7071
S9      3,5    3,7       10          5.0                         25.0                   8.0                2.8284
S10     4,5    4,7       11          5.5                         27.5                   4.5                2.1213

Column sum                           32                          160                   67                 22.6274
Expected value                       32/10 = 3.2                 160/10 = 16           67/10 = 6.7        22.6274/10 = 2.26274
E(estimator)                         $= \bar{y}_U$               $= t$                 $= S^2$            $\neq S$

The averages for estimators $\hat{\bar{y}}_U = \bar{y}$, $\hat{t} = N\bar{y}$, and $\hat{S}^2 = s^2$ equal the parameters that they are estimating. This implies that $\bar{y}$, $N\bar{y}$, and $s^2$ are unbiased estimators of $\bar{y}_U$, $t$, and $S^2$.

Notation: $E(\hat{\bar{y}}_U) = \bar{y}_U$, $E(\hat{t}) = t$, $E(\hat{S}^2) = S^2$, or $E(\bar{y}) = \bar{y}_U$, $E(N\bar{y}) = t$, $E(s^2) = S^2$.

The average for estimator $\hat{S} = s$ does not equal the parameter S. This implies that s is a biased estimator of S. Notation: $E(\hat{S}) \neq S$ or $E(s) \neq S$.
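The demonstration can be reproduced by enumerating all ten samples. The sketch below is written in Python using only the standard library; the variable names are illustrative and not part of the notes.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

population = [0, 2, 3, 4, 7]      # y-values of the five units
N, n = len(population), 2

# Population parameters (variance uses divisor N - 1)
t = sum(population)
ybar_U = t / N
S2 = variance(population)
S = sqrt(S2)

# Statistics for each of the 10 equally likely samples
samples = list(combinations(population, n))
ybars = [mean(s) for s in samples]
t_hats = [N * yb for yb in ybars]
s2s = [variance(s) for s in samples]
ss = [sqrt(v) for v in s2s]

# Every sample has probability 1/10, so an expectation is a plain average
E_ybar, E_that, E_s2, E_s = mean(ybars), mean(t_hats), mean(s2s), mean(ss)

print("E(ybar) =", E_ybar, " ybar_U =", ybar_U)
print("E(t_hat) =", E_that, " t =", t)
print("E(s^2) =", E_s2, " S^2 =", S2)
print("E(s) =", E_s, " S =", S)
```

The first three pairs agree, while the average of s falls below S, which is the bias noted above.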

• The next problem is to study the variances of $\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$.

• Warning: In an introductory statistics course, you were told that the variance of the sample mean is $V(\bar{Y}) = S^2/n$ $(= \sigma^2/n)$ and its standard deviation is $S/\sqrt{n}$ $(= \sigma/\sqrt{n})$. This is appropriate if a sample is taken from an infinite or extremely large population.

• However, we are dealing with finite populations that often are not considered extremely large. In such cases, we have to adjust our variance formulas by $\dfrac{N-n}{N}$, which is known as the finite population correction (f.p.c.).

• Texts may rewrite the f.p.c. $\dfrac{N-n}{N}$ as either $1 - \dfrac{n}{N}$ or $1 - f$, where $f = n/N$ is the fraction of the population that was sampled. By definition:

$$V(\hat{\bar{y}}_U) = V(\bar{y}) = \frac{N-n}{N}\,\frac{S^2}{n} \qquad\qquad V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\,\frac{S^2}{n} \qquad (11)$$

• Because $S^2$ is unknown, we use $s^2$ to get unbiased estimators of the variances in (11):

$$\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \frac{N-n}{N}\,\frac{s^2}{n} \qquad\qquad \hat{V}(\hat{t}) = N^2 \hat{V}(\bar{y}) = N(N-n)\,\frac{s^2}{n} \qquad (12)$$

• Taking a square root of a variance in (11) yields the standard deviation of the estimator.

• Taking a square root of an estimated variance in (12) yields the standard error of the estimate.
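As a rough illustration of the estimated variances in (12), here is a small Python sketch. The function name `srs_estimates` and its return layout are hypothetical choices for this example, not something defined in the notes.

```python
from math import sqrt
from statistics import mean, variance

def srs_estimates(sample, N):
    """Point estimates, estimated variances, and standard errors for an SRS
    drawn without replacement from a population of N units."""
    n = len(sample)
    ybar = mean(sample)
    s2 = variance(sample)                  # sample variance, divisor n - 1
    var_ybar = (N - n) / N * s2 / n        # f.p.c. times s^2 / n
    var_t_hat = N * (N - n) * s2 / n       # equals N^2 times var_ybar
    return {
        "ybar": ybar,
        "t_hat": N * ybar,
        "var_ybar": var_ybar,
        "var_t_hat": var_t_hat,
        "se_ybar": sqrt(var_ybar),
        "se_t_hat": sqrt(var_t_hat),
    }

# Sample S1 = {0, 2} from the five-unit example population
est = srs_estimates([0, 2], N=5)
print(est)
```

For sample S1 this gives an estimated variance of 0.6 for the mean and 15 for the total, the same values that appear for that sample in the confidence interval table of Section 2.4.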

2.3.2 SRS With Replacement

• Consider a sampling procedure in which a sampling unit is randomly selected from the population, its y-value recorded, and is then returned to the population. This process of randomly selecting units with replacement after each stage is repeated n times. Thus, a sampling unit may be sampled multiple times. A sample of n units selected by such a procedure is called a simple random sample with replacement.

• The estimators for SRS with replacement are: $\hat{\bar{y}}_U = \bar{y}$ and $\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \dfrac{s^2}{n}$.

• Suppose we have two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of some parameter $\theta$.

– $\hat{\theta}_1$ is less efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) > V(\hat{\theta}_2)$.

– $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ for estimating $\theta$ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$.

• For most situations, the estimator for a SRS with replacement is less efficient than the estimator for a SRS without replacement.

• There will be circumstances (such as sampling proportional to size) where we will consider sampling with replacement. Unless otherwise stated, we assume that sampling is done without replacement.
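The efficiency comparison can be made concrete for the five-unit example population by enumerating both designs. This Python sketch assumes n = 2 and treats each of the 25 ordered with-replacement draws as equally likely; the variable names are illustrative.

```python
from itertools import combinations, product
from statistics import mean, pvariance

population = [0, 2, 3, 4, 7]
n = 2

# Without replacement: 10 unordered samples, each with probability 1/10
ybars_wor = [mean(s) for s in combinations(population, n)]

# With replacement: 25 ordered draws, each with probability 1/25
ybars_wr = [mean(s) for s in product(population, repeat=n)]

# pvariance divides by the number of equally likely samples,
# so it returns the design variance of the sample mean
var_wor = pvariance(ybars_wor)
var_wr = pvariance(ybars_wr)

print("V(ybar) without replacement:", var_wor)
print("V(ybar) with replacement:   ", var_wr)
```

The without-replacement variance matches $\frac{N-n}{N}\frac{S^2}{n} = \frac{3}{5}\cdot\frac{6.7}{2} = 2.01$, and it is smaller than the with-replacement variance, so the sample mean is more efficient under sampling without replacement here.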

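2.4 Two-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• In an introductory statistics course, you were given the confidence interval formulas

$$\bar{y} \pm z^* \frac{s}{\sqrt{n}} \qquad \text{and} \qquad \bar{y} \pm t^* \frac{s}{\sqrt{n}}$$

These formulas are applicable if a sample is taken from an infinite or extremely large population. But when we are dealing with finite populations, we adjust our variance formulas by the finite population correction.

• In the finite population version of the Central Limit Theorem, we assume the estimators $\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$ have sampling distributions that are approximately normal. That is,

$$\hat{\bar{y}}_U \;\dot\sim\; N\!\left(\bar{y}_U,\ \frac{N-n}{N}\,\frac{S^2}{n}\right) \qquad \text{and} \qquad \hat{t} \;\dot\sim\; N\!\left(t,\ N(N-n)\,\frac{S^2}{n}\right)$$

• For large samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ (μ) and $t$ (τ) are

For $\bar{y}_U$: $\ \bar{y} \pm z^* \sqrt{\dfrac{N-n}{N}\,\dfrac{s^2}{n}}$ $\qquad$ For $t$: $\ N\bar{y} \pm z^* \sqrt{N(N-n)\,\dfrac{s^2}{n}}$ $\qquad$ (14)

or, equivalently,

For $\bar{y}_U$: $\ \bar{y} \pm z^* s \sqrt{\dfrac{N-n}{Nn}}$ $\qquad$ For $t$: $\ N\bar{y} \pm z^* s \sqrt{\dfrac{N(N-n)}{n}}$ $\qquad$ (15)

where $z^*$ is the upper α/2 critical value from the standard normal distribution. Or, in standard error (s.e.) notation,

$$\hat{\bar{y}}_U \pm z^*\,\text{s.e.}(\hat{\bar{y}}_U) \qquad\qquad \hat{t} \pm z^*\,\text{s.e.}(\hat{t})$$

For 90%, 95%, and 99% confidence levels, $z^*$ = 1.645, 1.96, and 2.576, respectively.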
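The critical values quoted above, and the large-sample interval for the mean, can be sketched in Python with the standard library's `statistics.NormalDist`. The helper names `z_star` and `z_interval_for_mean` are hypothetical, introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def z_star(level):
    """Upper alpha/2 critical value of the standard normal distribution."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval_for_mean(sample, N, level=0.95):
    """Large-sample interval: ybar +/- z* sqrt((N - n)/N * s^2/n)."""
    n = len(sample)
    ybar = mean(sample)
    se = sqrt((N - n) / N * variance(sample) / n)
    margin = z_star(level) * se
    return ybar - margin, ybar + margin

for level in (0.90, 0.95, 0.99):
    print(level, round(z_star(level), 3))
```

Multiplying both endpoints returned by `z_interval_for_mean` by N gives the corresponding interval for the total t, since $\hat{t} = N\bar{y}$ and its standard error is N times that of $\bar{y}$.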

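• For smaller samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ and $t$ are

For $\bar{y}_U$: $\ \bar{y} \pm t^* \sqrt{\dfrac{N-n}{N}\,\dfrac{s^2}{n}}$ $\qquad$ For $t$: $\ N\bar{y} \pm t^* \sqrt{N(N-n)\,\dfrac{s^2}{n}}$ $\qquad$ (16)

or, equivalently,

For $\bar{y}_U$: $\ \bar{y} \pm t^* s \sqrt{\dfrac{N-n}{Nn}}$ $\qquad$ For $t$: $\ N\bar{y} \pm t^* s \sqrt{\dfrac{N(N-n)}{n}}$ $\qquad$ (17)

where $t^*$ is the upper α/2 critical value from the t(n − 1) distribution.

• The quantity being added to and subtracted from $\hat{\bar{y}}_U = \bar{y}$ or $\hat{t} = N\bar{y}$ in the confidence interval is known as the margin of error.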

Example: Use the small population data again. For n = 2, $t^* \approx 6.314$ for a nominal 90% confidence level.

All Possible Samples and Confidence Intervals from Example Population

Sample  y-values  $\sum y_i$  $\bar{y}$  $\hat{t} = N\bar{y}$  $s^2$   $s$      $\hat{V}(\hat{\bar{y}}_U)$  $\hat{V}(\hat{t})$  90% ci for t
 1      0,2        2          1.0         5.0                   2.0    1.4142   0.60                        15.00              (-19.45, 29.45)
 2      0,3        3          1.5         7.5                   4.5    2.1213   1.35                        33.75              (-29.18, 44.18)
 3      0,4        4          2.0        10.0                   8.0    2.8284   2.40                        60.00              (-38.91, 58.91)
 4      0,7        7          3.5        17.5                  24.5    4.9497   7.35                       183.75              (-68.09, 103.09)
 5      2,3        5          2.5        12.5                   0.5    0.7071   0.15                         3.75              (0.27, 24.73)
 6      2,4        6          3.0        15.0                   2.0    1.4142   0.60                        15.00              (-9.45, 39.45)
 7      2,7        9          4.5        22.5                  12.5    3.5355   3.75                        93.75              (-38.63, 83.63)
 8      3,4        7          3.5        17.5                   0.5    0.7071   0.15                         3.75              (5.27, 29.73)
 9      3,7       10          5.0        25.0                   8.0    2.8284   2.40                        60.00              (-23.91, 73.91)
10      4,7       11          5.5        27.5                   4.5    2.1213   1.35                        33.75              (-9.18, 64.18)
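The interval column of this table can be regenerated with a short Python sketch. The critical value 6.314 is the one quoted in the example; the function name `ci_for_total` is a hypothetical helper, not part of the notes.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

population = [0, 2, 3, 4, 7]
N, n = len(population), 2
t = sum(population)
T_STAR = 6.314   # t(1) critical value quoted in the example (nominal 90%)

def ci_for_total(sample, N, crit):
    """Interval N*ybar +/- crit * sqrt(N (N - n) s^2 / n) for the total t."""
    n = len(sample)
    t_hat = N * mean(sample)
    se = sqrt(N * (N - n) * variance(sample) / n)
    return t_hat - crit * se, t_hat + crit * se

intervals = [ci_for_total(s, N, T_STAR) for s in combinations(population, n)]
for sample, (lo, hi) in zip(combinations(population, n), intervals):
    print(sample, f"({lo:.2f}, {hi:.2f})")

# How many of the ten equally likely intervals contain the true total?
covered = sum(lo <= t <= hi for lo, hi in intervals)
print("intervals covering t =", covered, "of", len(intervals))
```

For this tiny population every one of the ten intervals contains t = 16, so the realized coverage differs from the nominal 90% level; with n = 2 the intervals are also very wide.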

2.4.1 One-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• Occasionally, a researcher may want a one-sided confidence interval. There are two types of one-sided confidence intervals: upper and lower.

• Approximate upper and lower 100(1 − α)% confidence intervals for $\bar{y}_U$ and $t$ are:

upper:  For $\bar{y}_U$: $\left(\bar{y} - t^* s\sqrt{\dfrac{N-n}{Nn}},\ \infty\right)$ $\qquad$ For $t$: $\left(N\bar{y} - t^* s\sqrt{\dfrac{N(N-n)}{n}},\ \infty\right)$

lower:  For $\bar{y}_U$: $\left(-\infty,\ \bar{y} + t^* s\sqrt{\dfrac{N-n}{Nn}}\right)$ $\qquad$ For $t$: $\left(-\infty,\ N\bar{y} + t^* s\sqrt{\dfrac{N(N-n)}{n}}\right)$

where $t^*$ is the upper α critical value from the t(n − 1) distribution.

• If the y-values cannot be negative, replace −∞ with 0 in the lower confidence interval formulas. If the y-values cannot be positive, replace ∞ with 0 in the upper confidence interval formulas.
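A minimal Python sketch of the one-sided bounds for the mean follows. It uses the ten counts of the Figure 1 sample as they are listed in the SAS DATALINES in Section 2.4.3, with N = 400. The critical value 1.833113 is an assumption supplied by hand (the upper 0.05 point of the t(9) distribution), because the Python standard library has no t quantile function, and the helper name `one_sided_bounds` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def one_sided_bounds(sample, N, crit):
    """Return (ybar - crit*se, ybar + crit*se). Each value is meant to be used
    on its own, paired with an infinite endpoint, not as a two-sided interval."""
    n = len(sample)
    ybar = mean(sample)
    se = sqrt((N - n) / N * variance(sample) / n)
    return ybar - crit * se, ybar + crit * se

sample = [33, 33, 30, 34, 39, 27, 32, 36, 35, 42]   # Figure 1 SRS, n = 10
N = 400
T_STAR = 1.833113   # assumed: upper 0.05 critical value of t(9)

bound_below, bound_above = one_sided_bounds(sample, N, T_STAR)
print(f"({bound_below:.5f}, infinity)")
print(f"(-infinity, {bound_above:.5f})")

# Bounds for the total are N times the bounds for the mean
print(f"({N * bound_below:.3f}, infinity)")
```

The first bound agrees with the value 31.64992 shown in the R output for the mean in Section 2.4.2, and N times it agrees with the value 12659.96724 shown there for the total.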

SRS taken from Figure 1 (n = 10, t = 13354, $\bar{y}_U$ = 33.385, $\bar{y}$ = 34.1, $s^2$ = 18.32)

SRS Example using Rathbun and Cressie (1994) Data

• To illustrate the application of simple random sampling to estimation of the population total t, consider the abundance data in Figure 2. The abundance counts correspond to the census data studied by Rathbun and Cressie (1994).

• This 200 × 200 m study region is located in an old-growth forest in Thomas County, Georgia. The data represent the number of longleaf pine trees located in each quadrat. The coordinates of the 584 tree locations are given in Cressie (1991).

• I have gridded the region into a 20 × 20 grid of 10 × 10 m quadrats. The total abundance is t = 584 and the mean abundance per quadrat is $\bar{y}_U$ = 584/400 = 1.46. The population variance is $S^2$ = 3.853.

• There is only a weak spatial correlation of tree counts within the study region.

• The longleaf pine census data will be used to compare estimation properties of various sampling designs.

• Note the two relatively large boldfaced values (14 and 16).

Figure 2

Longleaf Pine Data (Rathbun and Cressie 1994)

 1  1  1  1  1  2  1  0  0  0  4  5  0  1  0  1  2  1  0  1
 3  2  1  0  1  0  0  0  1  2  2  2  0  2  2  2  0  2  0  1
 7  4  1  1  1  1  0  0  0  2  2  0  4  3  2  4  2  1  2  2
 0  1  2  0  0  0  0  0  4  6  5  1  5  0  0  0  2  1  2  0
 1  1  0  2  3  2  0  0  2  1  3  1  4  1  1  1  2  2  1  1
 2  0  0  0  4  3  3  0  1 16  5  0  1  3  8  0  0  1  3  3
 0  0  1 14  3  3  1  2  0  8  0  2  0  3  9  0  4  2  1  0
 0  0  5  1  8  7  6  6  6  1  0  4  0  0  1  2  2  0  1  2
 0  0  2  2  3  2  2  3  1  1  1  3  0  0  2  2  0  3  4  0
 0  0  0  0  1  0  3  1  1  1  2  0  2  0  2  0  2  1  1  0
 1  8  7  7  8  0  5  0  1  0  1  2  0  0  2  4  2  2  2  4
 0  9  1  0  0  1  1  1  0  0  0  1  2  4  0  2  1  3  3  1
 0  0  0  1  0  2  4  3  1  2  2  0  0  1  1  2  2  0  2  4
 0  1  0  0  1  2  0  2  3  5  2  0  0  2  1  1  2  0  1  3
 1  0  0  1  1  0  0  0  2  2  2  1  1  1  0  0  2  0  0  0
 0  2  0  2  2  0  1  1  0  2  0  0  1  0  0  1  1  1  5  3
 0  0  0  3  2  1  0  0  0  0  0  2  1  0  1  1  1  3  1  2
 1  0  0  1  0  3  0  1  0  0  2  1  2  0  0  0  1  1  1  0
 0  0  0  0  0  0  0  1  1  1  0  1  0  3  0  2  0  1  1  0
 2  0  0  0  0  0  0  0  1  2  0  1  3  0  0  1  0  1  2  4
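As a rough check of the stated population parameters, and to show what drawing an SRS from this grid might look like, here is a Python sketch. The sample size n = 20 and the random seed are arbitrary assumptions for illustration; the resulting sample is not the one analyzed in these notes.

```python
import random
from math import sqrt
from statistics import mean, variance

GRID = """
1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1
3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1
7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2
0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0
1 1 0 2 3 2 0 0 2 1 3 1 4 1 1 1 2 2 1 1
2 0 0 0 4 3 3 0 1 16 5 0 1 3 8 0 0 1 3 3
0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0
0 0 5 1 8 7 6 6 6 1 0 4 0 0 1 2 2 0 1 2
0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0
0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 0 2 1 1 0
1 8 7 7 8 0 5 0 1 0 1 2 0 0 2 4 2 2 2 4
0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1
0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4
0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3
1 0 0 1 1 0 0 0 2 2 2 1 1 1 0 0 2 0 0 0
0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3
0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2
1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0
2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4
"""
counts = [int(v) for v in GRID.split()]   # one count per 10 x 10 m quadrat

# Population parameters
N = len(counts)
t = sum(counts)
ybar_U = t / N
S2 = variance(counts)                     # divisor N - 1
print(N, t, ybar_U, round(S2, 3))

# Draw one SRS without replacement (seed and n chosen arbitrarily)
n = 20
rng = random.Random(446)
sample = rng.sample(counts, n)            # selects n distinct quadrats

t_hat = N * mean(sample)
se_t_hat = sqrt(N * (N - n) * variance(sample) / n)
print("t_hat =", t_hat, " s.e. =", round(se_t_hat, 2))
```

Changing the seed gives a different sample and therefore a different estimate, which is the sampling variability the standard error is meant to describe.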

REFERENCES (for Figure 2 data)

Cressie, N. (1991) Statistics for Spatial Data. Wiley, New York.

Rathbun, S.L. and Cressie, N. (1994) A space-time survival point process for a longleaf pine forest in southern Georgia. Journal of the American Statistical Association, 89, 1164-1174.

2.4.2 Using the R Survey Package for a SRS

R Code and Output for Figure 1 SRS Analysis

"count" "fpc"   <- These are the contents of the data file fig1.txt
33 400          <- The first column contains the recorded responses
33 400          <- The second column is the population size N

R Code

# confint.t() is not a survey package function; it is presumably defined by this sourced script
source("c:/courses/st446/rcode/confintt.r")

# t-based confidence intervals for SRS in Figure 1
library(survey)
srsdat <- read.table("c:/courses/st446/rcode/fig1.txt", header=TRUE)
srsdat

# id=~1 declares no clustering; fpc=~fpc supplies the population size N
srs_design <- svydesign(id=~1, fpc=~fpc, data=srsdat)
srs_design

# Estimate the population total with two-sided and one-sided intervals
esttotal <- svytotal(~count,srs_design)
print(esttotal,digits=15)
confint.t(esttotal,degf(srs_design),level=.95)
confint.t(esttotal,degf(srs_design),level=.95,tails='lower')
confint.t(esttotal,degf(srs_design),level=.95,tails='upper')

# Estimate the population mean with two-sided and one-sided intervals
estmean <- svymean(~count,srs_design)
print(estmean,digits=15)
confint.t(estmean,degf(srs_design),level=.95)
confint.t(estmean,degf(srs_design),level=.95,tails='lower')
confint.t(estmean,degf(srs_design),level=.95,tails='upper')

R output for t-based confidence interval for SRS

> srsdat
count fpc

Independent Sampling design

      total SE
count 13640 534.

mean( count ) = 13640.
SE( count ) = 534.
Two-Tailed CI for count where alpha = 0.05 with 9 df

mean( count ) = 13640.
SE( count ) = 534.
One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df
5 % upper
12659.96724 infinity

mean( count ) = 13640.
SE( count ) = 534.
One-Tailed (upper) CI for count where alpha = 0.05 with 9 df
lower 95 %
-infinity 14620.

      mean SE
count 34.1 1.

mean( count ) = 34.
SE( count ) = 1.
Two-Tailed CI for count where alpha = 0.05 with 9 df

mean( count ) = 34.
SE( count ) = 1.
One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df
5 % upper
31.64992 infinity

mean( count ) = 34.
SE( count ) = 1.
One-Tailed (upper) CI for count where alpha = 0.05 with 9 df
lower 95 %
-infinity 36.

mean( count ) = 620.
SE( count ) = 289.
Two-Tailed CI for count where alpha = 0.05 with 19 df

mean( count ) = 620.
SE( count ) = 289.
One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df
5 % upper
120.10415 infinity

mean( count ) = 620.
SE( count ) = 289.
One-Tailed (upper) CI for count where alpha = 0.05 with 19 df
lower 95 %
-infinity 1119.

      mean SE
count 1.55 0.

mean( count ) = 1.
SE( count ) = 0.
Two-Tailed CI for count where alpha = 0.05 with 19 df

mean( count ) = 1.
SE( count ) = 0.
One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df
5 % upper
0.30026 infinity

mean( count ) = 1.
SE( count ) = 0.
One-Tailed (upper) CI for count where alpha = 0.05 with 19 df
lower 95 %
-infinity 2.

2.4.3 Using SAS PROC Surveymeans for a SRS

DM 'LOG;CLEAR;OUT;CLEAR';    *** I recommend putting these two lines of code;
OPTIONS NODATE NONUMBER;     *** at the beginning of every SAS program ;

data SRS_Fig1;
  wgt = 400/10;    * wgt = N/n ;
  input count @@;
datalines;
33 33 30 34 39 27 32 36 35 42
;
proc surveymeans data=SRS_Fig1 total=400 mean clm sum clsum;
  var count;
  weight wgt;
  title1 'Simple Random Sample -- Example 1';
  title2 'Estimating the population mean and total from the data in Figure 1';
run;
===========================================================================

Simple Random Sample -- Example 1
Estimating the population mean and total from the data in Figure 1

The SURVEYMEANS Procedure

Data Summary

Number of Observations     10
Sum of Weights            400

Statistics

                         Std Error
Variable        Mean       of Mean        95% CL for Mean
count      34.100000      1.336569     31.0764709  37.

Variable         Sum       Std Dev        95% CL for Sum
count          13640    534.627596     12430.5884  14849.

2.5 Attribute Proportion Estimation

• Suppose we are interested in an attribute (characteristic) associated with the sampling units. The population proportion p is the proportion of population units having that attribute.

• Statistically, the goal is to estimate the proportion p.

• Examples: the proportion of females (or males) in an animal population, the proportion of consumers who own motorcycles, the proportion of married couples with at least 1 child...

• Statistically, we use an indicator function that assigns a $y_i$ value to unit i as follows:

$$y_i = \begin{cases} 1 & \text{if unit } i \text{ possesses the attribute} \\ 0 & \text{otherwise} \end{cases}$$

Then $t = \sum_{i=1}^{N} y_i$ and $\bar{y}_U = \dfrac{1}{N}\sum_{i=1}^{N} y_i = p$. The population proportion p can be expressed as a population mean $\bar{y}_U$. Therefore, we will, under certain conditions, be able to apply the SRS methods for estimating $\bar{y}_U$.

• By taking a SRS of size n, we can estimate p with the sample proportion $\hat{p}$ of units that possess that attribute: $\hat{p} = \dfrac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$. The sample proportion $\hat{p}$ is unbiased for p.

• For a finite population of 0 and 1 values, $y_i^2 = y_i$, so $\sum_{i=1}^{N}(y_i - p)^2 = Np - Np^2 = Np(1-p)$, and the population variance is

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - p)^2 = \frac{N}{N-1}\,p(1-p)$$

• Therefore, the variance of $\hat{p}$ is

$$V(\hat{p}) = \frac{N-n}{N}\,\frac{S^2}{n} = \frac{N-n}{N}\,\frac{N}{N-1}\,\frac{p(1-p)}{n} \qquad (18)$$

• Because $S^2$ is unknown, we estimate it with $s^2 = \dfrac{n}{n-1}\,\hat{p}(1-\hat{p})$. Substitution provides the unbiased estimator of $V(\hat{p})$:

$$\hat{V}(\hat{p}) = \frac{N-n}{N}\,\frac{s^2}{n} = \frac{N-n}{N}\,\frac{\hat{p}(1-\hat{p})}{n-1} \qquad (19)$$

• The square root of $V(\hat{p})$ in (18) is the standard deviation of the estimator $\hat{p}$.

• The square root of $\hat{V}(\hat{p})$ in (19) is the standard error of $\hat{p}$.

• The effects of omitting the finite population correction (f.p.c.) from the formulas for large and small samples apply here as they did earlier.
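Formula (19) can be sketched in a few lines of Python. The inputs below (18 units with the attribute out of n = 25, with N = 400) are the counts reported in the SAS output for the Figure 3 example later in this section; the function name `srs_proportion` is a hypothetical helper.

```python
from math import sqrt

def srs_proportion(successes, n, N):
    """Sample proportion and its standard error under SRS without replacement."""
    p_hat = successes / n
    s2 = n / (n - 1) * p_hat * (1 - p_hat)   # sample variance of the 0/1 values
    var_hat = (N - n) / N * s2 / n           # estimated variance from (19)
    return p_hat, sqrt(var_hat)

p_hat, se = srs_proportion(successes=18, n=25, N=400)
print(p_hat, round(se, 6))
```

The standard error rounds to 0.088741, the "Std Error of Mean" value that PROC SURVEYMEANS reports for this sample.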

Figure 3: The Presence/Absence of Longleaf Pine

Rathbun/Cressie data (t = 249, N = 400, p = 0.6225)

1 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1
1 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 0 1
1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1
0 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 0
1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1
0 0 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1
0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0
0 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 1 0
1 1 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 1 1
0 1 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1
0 0 0 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1
0 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1
1 0 0 1 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0
0 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
0 0 0 1 1 1 0 0 0 0 0 1 1 0 1 1 1 1 1 1
1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0
1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1

A simple random sample of size n = 25

R Code and Output for Figure 3 Example

# confint.t() is not a survey package function; it is presumably defined by this sourced script
source("c:/courses/st446/rcode/confintt.r")

# t-based confidence intervals for SRS in Figure 3
library(survey)
srsdat <- read.table("c:/courses/st446/rcode/fig3.txt", header=TRUE)
srsdat

srs_design <- svydesign(id=~1, fpc=~fpc, data=srsdat)

# The mean of the 0/1 presence indicator is the sample proportion
estmean <- svymean(~presence,srs_design)
print(estmean,digits=15)
confint.t(estmean,degf(srs_design),level=.90)
confint.t(estmean,degf(srs_design),level=.90,tails='lower')
confint.t(estmean,degf(srs_design),level=.90,tails='upper')

R output for t-based confidence interval for SRS

> srsdat
presence fpc

         mean SE
presence 0.72 0.

mean( presence ) = 0.
SE( presence ) = 0.
Two-Tailed CI for presence where alpha = 0.1 with 24 df

mean( presence ) = 0.
SE( presence ) = 0.
One-Tailed (Lower) CI for presence where alpha = 0.1 with 24 df
10 % upper
0.60305 infinity

mean( presence ) = 0.
SE( presence ) = 0.
One-Tailed (upper) CI for presence where alpha = 0.1 with 24 df
lower 90 %
-infinity 0.

SAS Code and Output for Figure 3 Example

DM 'LOG;CLEAR;OUT;CLEAR';
OPTIONS NODATE NONUMBER LS=72 PS=54;

DATA SRS_Fig3;
  INPUT ind @@;
DATALINES;

DATA SRS_Fig3; set SRS_Fig3;
  IF ind = 0 then pa = 'absent ';
  IF ind = 1 then pa = 'present';

PROC SURVEYMEANS DATA=SRS_Fig3 TOTAL = 400 ALPHA = .10;
  VAR pa;
  TITLE 'Simple Random Sample -- Figure 3';
  TITLE2 'Estimating population proportion p';
RUN;

Simple Random Sample -- Figure 3
Estimating population proportion p

The SURVEYMEANS Procedure

Data Summary

Number of Observations     25

Class Level Information

Class
Variable    Levels    Values
pa               2    absent present

Statistics

                                        Std Error
Variable    Level       N        Mean     of Mean        90% CL for Mean
pa          absent      7    0.280000    0.088741    0.12817428  0.
            present    18    0.720000    0.088741    0.56817428  0.