
Statistics: Simple Random Sampling and Estimation of Population Mean and Proportion, Exams of Design history

Saint Paul University Philippines (SPUP), Design history

Covers the concept of simple random sampling and provides formulas for estimating a population mean and proportion, including variance estimation, confidence intervals, and sample size determination.

Typology: Exams · 2021/2022 · Uploaded on 08/01/2022 by hal_s95


2.3 Simple Random Sampling

• Simple random sampling without replacement (srswor) of size n is the probability sampling design for which a fixed number of n units are selected from a population of N units without replacement such that every possible sample of n units has equal probability of being selected. A resulting sample is called a simple random sample or srs.

• Note: I will use SRS to denote a simple random sample and SR as an abbreviation of ‘simple random’.

• Some necessary combinatorial notation:

– (n factorial) $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$. This is the number of unique arrangements (orderings, or permutations) of n distinct items. For example, $6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.

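– (N choose n) $\binom{N}{n} = \frac{N(N-1)\cdots(N-n+1)}{n!} = \frac{N!}{n!\,(N-n)!}$. This is the number of combinations of n items selected from N distinct items (the order of selection does not matter). For example, $\binom{6}{2} = \frac{6!}{2!\,4!} = \frac{(6)(5)(4!)}{2!\,4!} = \frac{(6)(5)}{(2)(1)} = 15$.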

• There are $\binom{N}{n}$ possible SRSs of size n selected from a population of size N.

• For any SRS S of size n from a population of size N, we have $P(S) = 1/\binom{N}{n}$.

• Unless otherwise specified, we will assume sampling is without replacement.
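The counting rules above can be checked numerically. The following is a minimal base-R sketch (an illustration added here, not part of the original notes) that uses `factorial()`, `choose()`, and `combn()` to list every SRS of size n = 2 from a population of N = 5 units and to confirm that each one has selection probability $1/\binom{N}{n}$.

```r
# Counting rules behind simple random sampling (base R only)
factorial(6)                 # 6! = 720 orderings of 6 distinct items
choose(6, 2)                 # (6 choose 2) = 15 unordered pairs

N <- 5                       # population size
n <- 2                       # sample size
samples   <- combn(N, n)     # each column lists the unit labels of one SRS
n_samples <- ncol(samples)   # number of possible SRSs
p_each    <- 1 / choose(N, n)  # P(S), identical for every SRS

print(t(samples))            # one row per possible sample
cat("Number of SRSs:", n_samples, "  P(S) =", p_each, "\n")
```

The ten rows printed by `t(samples)` are the same unit pairs used in the example population that follows.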

2.3.1 Estimation of $\bar{y}_U$ and $t$

• A natural estimator for the population mean $\bar{y}_U$ is the sample mean $\bar{y}$. Because $\bar{y}$ estimates the average y-value per unit, multiplying it by the population size N gives an estimate $\hat{t}$ of the population total t. That is:

$$\hat{\bar{y}}_U = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat{t} = \frac{N}{n}\sum_{i=1}^{n} y_i = N\bar{y} \qquad (10)$$

• $\hat{\bar{y}}_U$ and $\hat{t}$ are design unbiased. That is, the average values of $\bar{y}$ and $N\bar{y}$ taken over all possible SRSs equal $\bar{y}_U$ and t, respectively.

Demonstration of Unbiasedness: Suppose we have a population consisting of five y-values:

Unit i 1 2 3 4 5

yi 0 2 3 4 7

which has the following parameters:

$N = 5$, $t = 16$, $\bar{y}_U = 3.2$, $S^2 = 6.7$, $S \approx 2.588$

Suppose a SRS of size n = 2 is selected. Then $P(S) = 1/\binom{5}{2} = 1/10$ for each of the 10 possible SRSs.

All Possible Samples and Statistics from Example Population

Sample | Units | y-values | Σ y_i | ŷ_U = ȳ | t̂ = Nȳ | Ŝ² = s² | Ŝ = s
S1 | 1,2 | 0,2 | 2 | 1 | 5 | 2 | 1.4142
S2 | 1,3 | 0,3 | 3 | 1.5 | 7.5 | 4.5 | 2.1213
S3 | 1,4 | 0,4 | 4 | 2 | 10 | 8 | 2.8284
S4 | 1,5 | 0,7 | 7 | 3.5 | 17.5 | 24.5 | 4.9497
S5 | 2,3 | 2,3 | 5 | 2.5 | 12.5 | 0.5 | 0.7071
S6 | 2,4 | 2,4 | 6 | 3 | 15 | 2 | 1.4142
S7 | 2,5 | 2,7 | 9 | 4.5 | 22.5 | 12.5 | 3.5355
S8 | 3,4 | 3,4 | 7 | 3.5 | 17.5 | 0.5 | 0.7071
S9 | 3,5 | 3,7 | 10 | 5 | 25 | 8 | 2.8284
S10 | 4,5 | 4,7 | 11 | 5.5 | 27.5 | 4.5 | 2.1213
Column sum | | | | 32 | 160 | 67 | 22.6274
Expected value | | | | 32/10 = 3.2 | 160/10 = 16 | 67/10 = 6.7 | 22.6274/10 = 2.26274
E(estimator) vs. parameter | | | | = ȳ_U | = t | = S² | ≠ S

The averages of the estimators $\hat{\bar{y}}_U = \bar{y}$, $\hat{t} = N\bar{y}$, and $\hat{S}^2 = s^2$ equal the parameters they are estimating. Therefore $\bar{y}$, $N\bar{y}$, and $s^2$ are unbiased estimators of $\bar{y}_U$, t, and $S^2$.

Notation: $E(\hat{\bar{y}}_U) = \bar{y}_U$, $E(\hat{t}) = t$, $E(\hat{S}^2) = S^2$, or equivalently $E(\bar{y}) = \bar{y}_U$, $E(N\bar{y}) = t$, $E(s^2) = S^2$.

The average of the estimator $\hat{S} = s$ (2.26274) does not equal the parameter S (≈ 2.588). Therefore s is a biased estimator of S. Notation: $E(\hat{S}) \neq S$ or $E(s) \neq S$.
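As a minimal sketch in base R (an illustration added here, not the author's code), the table above can be regenerated by enumerating all $\binom{5}{2} = 10$ samples with `combn()` and averaging each estimator. Because every SRS is equally likely, each expectation is a plain mean over the samples. R's `var()` uses divisor n − 1, which matches the definitions of $S^2$ and $s^2$ used in these notes.

```r
# Enumerate every SRS of size n = 2 from the five-unit example population
# and average each estimator over the equally likely samples.
y <- c(0, 2, 3, 4, 7)          # y-values of units 1..5
N <- length(y)
n <- 2

# Population parameters
t_pop  <- sum(y)               # population total t
ybar_U <- mean(y)              # population mean
S2     <- var(y)               # S^2 with divisor N - 1
S      <- sqrt(S2)

idx   <- combn(N, n)                              # columns = units in each SRS
ybar  <- apply(idx, 2, function(i) mean(y[i]))    # sample means
t_hat <- N * ybar                                 # estimated totals
s2    <- apply(idx, 2, function(i) var(y[i]))     # sample variances
s     <- sqrt(s2)                                 # sample standard deviations

# Each SRS has probability 1/choose(N, n), so E(.) is a simple average
expected <- c(ybar = mean(ybar), t_hat = mean(t_hat), s2 = mean(s2), s = mean(s))
print(round(expected, 5))
print(c(ybar_U = ybar_U, t = t_pop, S2 = S2, S = S))
```

The first three averages match their parameters exactly, while the average of `s` falls below `S`, which is the bias noted above.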

The next problem is to study the variances of $\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$.
Warning: In an introductory statistics course, you were told that the variance of the sample mean is $V(\bar{Y}) = S^2/n$ $(= \sigma^2/n)$ and its standard deviation is $S/\sqrt{n}$ $(= \sigma/\sqrt{n})$. This is appropriate when a sample is taken from an infinite or extremely large population. However, we are dealing with finite populations that often are not extremely large. In such cases, we must multiply our variance formulas by $\frac{N-n}{N}$, which is known as the finite population correction (f.p.c.).

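Texts may rewrite the f.p.c. $\frac{N-n}{N}$ as either $1 - \frac{n}{N}$ or $1 - f$, where $f = n/N$ is the fraction of the population that was sampled. The variances of the two estimators are:

$$V(\hat{\bar{y}}_U) = V(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n}, \qquad V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\frac{S^2}{n} \qquad (11)$$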

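Because $S^2$ is unknown, we substitute $s^2$ to get unbiased estimators of the variances in (11):

$$\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n}, \qquad \hat{V}(\hat{t}) = N^2 \hat{V}(\bar{y}) = N(N-n)\frac{s^2}{n} \qquad (12)$$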

Taking the square root of a variance in (11) yields the standard deviation of the estimator. Taking the square root of an estimated variance in (12) yields the standard error of the estimate.
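Formulas (11) and (12) can be checked on the five-unit example population. The sketch below, assuming base R only, computes $V(\bar{y})$ from (11), compares it with the exact variance of $\bar{y}$ over all 10 equally likely samples, and averages the estimated variance (12) over the samples to illustrate that it is unbiased.

```r
# Check formula (11) and the unbiasedness of estimator (12) on the
# five-unit example population, enumerating all SRSs of size 2.
y <- c(0, 2, 3, 4, 7)
N <- length(y); n <- 2
S2  <- var(y)                           # population variance, divisor N - 1
fpc <- (N - n) / N                      # finite population correction

V_ybar_formula <- fpc * S2 / n          # V(ybar) from (11)
V_that_formula <- N^2 * V_ybar_formula  # V(t-hat) = N(N - n) S^2 / n

idx  <- combn(N, n)
ybar <- apply(idx, 2, function(i) mean(y[i]))
# Exact variance over the equally likely samples (divide by number of samples)
V_ybar_exact <- mean((ybar - mean(y))^2)

# Estimated variance (12) in each sample, then its average over all samples
s2 <- apply(idx, 2, function(i) var(y[i]))
Vhat_ybar <- fpc * s2 / n

cat("V(ybar): formula", V_ybar_formula, "  enumeration", V_ybar_exact, "\n")
cat("Average of Vhat(ybar) over all SRSs:", mean(Vhat_ybar), "\n")
```

For this population both routes give $V(\bar{y}) = 0.6 \times 6.7/2 = 2.01$, and the average of the per-sample estimates is also 2.01.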

2.3.2 SRS With Replacement

• Consider a sampling procedure in which a sampling unit is randomly selected from the population, its y-value is recorded, and the unit is then returned to the population. This process of randomly selecting units with replacement is repeated n times, so a sampling unit may be sampled multiple times. A sample of n units selected by such a procedure is called a simple random sample with replacement.

• The estimators for a SRS with replacement are $\hat{\bar{y}}_U = \bar{y}$ and $\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \frac{s^2}{n}$.

• Suppose we have two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of some parameter θ.

– $\hat{\theta}_1$ is less efficient than $\hat{\theta}_2$ for estimating θ if $V(\hat{\theta}_1) > V(\hat{\theta}_2)$.

– $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ for estimating θ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$.

• In most situations, the estimator for a SRS with replacement is less efficient than the estimator for a SRS without replacement.

• There will be circumstances (such as sampling with probability proportional to size) where we will consider sampling with replacement. Unless otherwise stated, we assume that sampling is done without replacement.
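To make the efficiency comparison concrete, the following base-R sketch (an illustration, not from the original notes) enumerates all samples of size 2 from the five-unit example population under both designs. With replacement there are $5^2 = 25$ equally likely ordered draws, and $V(\bar{y}) = \sigma^2/n$ with $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y}_U)^2$ (divisor N); that identity is a standard result added here, not stated in the notes.

```r
# Compare V(ybar) under SRS with and without replacement for the
# five-unit example population (n = 2), by full enumeration.
y <- c(0, 2, 3, 4, 7)
N <- length(y); n <- 2

# Without replacement: choose(5, 2) = 10 equally likely unordered samples
wor <- combn(N, n)
ybar_wor <- apply(wor, 2, function(i) mean(y[i]))

# With replacement: N^n = 25 equally likely ordered draws (units may repeat)
wr <- expand.grid(first = 1:N, second = 1:N)
ybar_wr <- (y[wr$first] + y[wr$second]) / 2

V_wor <- mean((ybar_wor - mean(y))^2)
V_wr  <- mean((ybar_wr  - mean(y))^2)

# Under replacement sampling V(ybar) = sigma^2 / n, with divisor N in sigma^2
sigma2 <- mean((y - mean(y))^2)
cat("V(ybar) without replacement:", V_wor, "  with replacement:", V_wr, "\n")
```

For this population the variance is 2.01 without replacement versus 2.68 with replacement, so here the without-replacement estimator is the more efficient one.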

2.4 Two-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• In an introductory statistics course, you were given the confidence interval formulas

$$\bar{y} \pm z^*\frac{s}{\sqrt{n}} \quad\text{and}\quad \bar{y} \pm t^*\frac{s}{\sqrt{n}}$$

These formulas are applicable when a sample is taken from an infinite or extremely large population. When we are dealing with finite populations, however, we adjust our variance formulas by the finite population correction.

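• In the finite population version of the Central Limit Theorem, we assume the estimators $\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$ have sampling distributions that are approximately normal. That is,

$$\hat{\bar{y}}_U \;\dot{\sim}\; N\!\left(\bar{y}_U,\ \left(\frac{N-n}{N}\right)\frac{S^2}{n}\right) \quad\text{and}\quad \hat{t} \;\dot{\sim}\; N\!\left(t,\ N(N-n)\frac{S^2}{n}\right)$$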

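• For large samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ (μ) and t (τ) are

For $\bar{y}_U$: $\bar{y} \pm z^*\sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$    For t: $N\bar{y} \pm z^*\sqrt{N(N-n)\frac{s^2}{n}}$    (14)

or, equivalently,

For $\bar{y}_U$: $\bar{y} \pm z^* s\sqrt{\frac{N-n}{Nn}}$    For t: $N\bar{y} \pm z^* s\sqrt{\frac{N(N-n)}{n}}$    (15)

where $z^*$ is the upper α/2 critical value from the standard normal distribution. Or, in standard error (s.e.) notation, $\hat{\bar{y}}_U \pm z^*\,\mathrm{s.e.}(\hat{\bar{y}}_U)$ and $\hat{t} \pm z^*\,\mathrm{s.e.}(\hat{t})$, where each s.e. is the square root of the corresponding estimated variance in (12).

For 90%, 95%, and 99% confidence, $z^*$ = 1.645, 1.96, and 2.576, respectively.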

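For smaller samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ and t are

For $\bar{y}_U$: $\bar{y} \pm t^*\sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$    For t: $N\bar{y} \pm t^*\sqrt{N(N-n)\frac{s^2}{n}}$    (16)

or, equivalently,

For $\bar{y}_U$: $\bar{y} \pm t^* s\sqrt{\frac{N-n}{Nn}}$    For t: $N\bar{y} \pm t^* s\sqrt{\frac{N(N-n)}{n}}$    (17)

where $t^*$ is the upper α/2 critical value from the t(n − 1) distribution.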

The quantity added to and subtracted from $\hat{\bar{y}}_U = \bar{y}$ or $\hat{t} = N\bar{y}$ in the confidence interval is known as the margin of error.
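The critical values quoted above can be obtained in base R with `qnorm()` and `qt()`. The helper `srs_margin()` below is a hypothetical name introduced only for this sketch; it returns the margins of error for $\bar{y}$ and $N\bar{y}$ from (14) to (17), given $s^2$, n, N, and whichever critical value applies.

```r
# Critical values used in intervals (14)-(17)
level  <- c(0.90, 0.95, 0.99)
z_star <- qnorm(1 - (1 - level) / 2)   # upper alpha/2 standard normal quantiles
round(z_star, 3)                        # 1.645 1.960 2.576

# Hypothetical helper: margins of error for ybar and N*ybar under SRS
# without replacement. crit is z* (large samples) or
# t* = qt(1 - alpha/2, n - 1) (smaller samples).
srs_margin <- function(s2, n, N, crit) {
  se_mean  <- sqrt((N - n) / N * s2 / n)   # s.e. of ybar, from (12)
  se_total <- N * se_mean                   # s.e. of N * ybar
  c(mean = crit * se_mean, total = crit * se_total)
}

# t* for a nominal 90% two-sided interval with n = 2 (1 df)
t_star <- qt(0.95, df = 1)               # approximately 6.314
srs_margin(s2 = 2, n = 2, N = 5, crit = t_star)
```

The last call uses the first sample (y-values 0 and 2) of the five-unit example, whose margin for t is $t^*\sqrt{15}$.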

Example: Use the small population data again. For n = 2 (1 degree of freedom), $t^* \approx 6.314$ for a nominal 90% confidence level.

All Possible Samples and Confidence Intervals from Example Population

Sample | y-values | Σ y_i | ŷ_U = ȳ | t̂ = Nȳ | Ŝ² = s² | Ŝ = s | V̂(ŷ_U) | V̂(t̂) | 90% CI for t
1 | 0,2 | 2 | 1 | 5 | 2 | 1.4142 | 0.6 | 15 | (-19.45, 29.45)
2 | 0,3 | 3 | 1.5 | 7.5 | 4.5 | 2.1213 | 1.35 | 33.75 | (-29.18, 44.18)
3 | 0,4 | 4 | 2 | 10 | 8 | 2.8284 | 2.4 | 60 | (-38.91, 58.91)
4 | 0,7 | 7 | 3.5 | 17.5 | 24.5 | 4.9497 | 7.35 | 183.75 | (-68.09, 103.09)
5 | 2,3 | 5 | 2.5 | 12.5 | 0.5 | 0.7071 | 0.15 | 3.75 | (0.27, 24.73)
6 | 2,4 | 6 | 3 | 15 | 2 | 1.4142 | 0.6 | 15 | (-9.45, 39.45)
7 | 2,7 | 9 | 4.5 | 22.5 | 12.5 | 3.5355 | 3.75 | 93.75 | (-38.63, 83.63)
8 | 3,4 | 7 | 3.5 | 17.5 | 0.5 | 0.7071 | 0.15 | 3.75 | (5.27, 29.73)
9 | 3,7 | 10 | 5 | 25 | 8 | 2.8284 | 2.4 | 60 | (-23.91, 73.91)
10 | 4,7 | 11 | 5.5 | 27.5 | 4.5 | 2.1213 | 1.35 | 33.75 | (-9.18, 64.18)
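A minimal base-R sketch (an illustration, not the author's code) that regenerates the 90% intervals for t in the table above, using $t^*$ = `qt(0.95, 1)` and the estimated variance $\hat{V}(\hat{t}) = N(N-n)s^2/n$ from (12):

```r
# Reproduce the ten nominal 90% confidence intervals for t in the table above
y <- c(0, 2, 3, 4, 7)
N <- length(y); n <- 2
t_star <- qt(0.95, df = n - 1)           # approximately 6.314 with 1 df

idx <- combn(N, n)
ci <- t(apply(idx, 2, function(i) {
  ys    <- y[i]
  t_hat <- N * mean(ys)                  # estimated total N * ybar
  v_hat <- N * (N - n) * var(ys) / n     # estimated V(t-hat), from (12)
  moe   <- t_star * sqrt(v_hat)          # margin of error
  c(t_hat = t_hat, v_hat = v_hat, lower = t_hat - moe, upper = t_hat + moe)
}))
print(round(ci, 2))

# Which intervals contain the true total t = sum(y) = 16?
covers <- ci[, "lower"] <= sum(y) & sum(y) <= ci[, "upper"]
print(covers)
```

In this tiny population every one of the ten intervals happens to contain t = 16, a reminder that the stated 90% level is only approximate for such a small sample.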

2.4.1 One-Sided Confidence Intervals for $\bar{y}_U$ and $t$

Occasionally, a researcher may want a one-sided confidence interval. There are two types of one-sided confidence intervals: upper and lower.

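Approximate upper and lower 100(1 − α)% confidence intervals for $\bar{y}_U$ and t are:

Upper (extending to +∞):
For $\bar{y}_U$: $\left(\bar{y} - t^* s\sqrt{\frac{N-n}{Nn}},\ \infty\right)$    For t: $\left(N\bar{y} - t^* s\sqrt{\frac{N(N-n)}{n}},\ \infty\right)$

Lower (extending to −∞):
For $\bar{y}_U$: $\left(-\infty,\ \bar{y} + t^* s\sqrt{\frac{N-n}{Nn}}\right)$    For t: $\left(-\infty,\ N\bar{y} + t^* s\sqrt{\frac{N(N-n)}{n}}\right)$

where $t^*$ is the upper α critical value from the t(n − 1) distribution.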

If the y-values cannot be negative, replace −∞ with 0 in the lower confidence interval formulas. If the y-values cannot be positive, replace ∞ with 0 in the upper confidence interval formulas.
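The one-sided formulas can be sketched in base R as follows. As an assumption for illustration, the data are the ten Figure 1 counts listed in the SAS data step of Section 2.4.3 (N = 400). The critical value is `qt(1 - alpha, n - 1)`, the upper α point rather than α/2. The resulting bounds can be compared with the one-tailed R output in Section 2.4.2 (31.64992 for the mean and 12659.96724 for the total).

```r
# One-sided t-based intervals for ybar_U and t under SRS without replacement
counts <- c(33, 33, 30, 34, 39, 27, 32, 36, 35, 42)  # Figure 1 SRS (SAS data step)
N <- 400                                   # population size
n <- length(counts)
alpha <- 0.05

ybar     <- mean(counts)
se_mean  <- sqrt((N - n) / N * var(counts) / n)   # s.e. of ybar, from (12)
se_total <- N * se_mean                            # s.e. of N * ybar
t_star   <- qt(1 - alpha, df = n - 1)    # upper alpha (not alpha/2) critical value

# "Upper" intervals in the notation above: (estimate - t* s.e., Inf)
upper_ci <- rbind(mean  = c(ybar - t_star * se_mean, Inf),
                  total = c(N * ybar - t_star * se_total, Inf))

# "Lower" intervals: (-Inf, estimate + t* s.e.). Counts cannot be negative,
# so -Inf is replaced by 0, as the rule above allows.
lower_ci <- rbind(mean  = c(0, ybar + t_star * se_mean),
                  total = c(0, N * ybar + t_star * se_total))
print(upper_ci)
print(lower_ci)
```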

SRS taken from Figure 1 (n = 10, t = 13354, $\bar{y}_U$ = 33.385, $\bar{y}$ = 34.1, $s^2$ = 18.32)

SRS Example using Rathbun and Cressie (1994) Data

To illustrate the application of simple random sampling to estimating a population total t, consider the abundance data in Figure 2. The abundance counts correspond to the census data studied by Rathbun and Cressie (1994).
This 200 × 200 m study region is located in an old-growth forest in Thomas County, Georgia. The data represent the number of longleaf pine trees located in each quadrat. The coordinates of the 584 tree locations are given in Cressie (1991).
I have gridded the region into a 20 × 20 grid of 10 × 10 m quadrats. The total abundance is t = 584 and the mean abundance per quadrat is $\bar{y}_U$ = 584/400 = 1.46. The population variance is $S^2$ = 3.853.
There is only weak spatial correlation of tree counts within the study region.
The longleaf pine census data will be used to compare the estimation properties of various sampling designs.
Note the two relatively large values (14 and 16).

Figure 2

Longleaf Pine Data (Rathbun and Cressie 1994)

1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1 3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1 7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2 0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0 1 1 0 2 3 2 0 0 2 1 3 1 4 1 1 1 2 2 1 1 2 0 0 0 4 3 3 0 1 16 5 0 1 3 8 0 0 1 3 3 0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0 0 0 5 1 8 7 6 6 6 1 0 4 0 0 1 2 2 0 1 2 0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0 0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 0 2 1 1 0 1 8 7 7 8 0 5 0 1 0 1 2 0 0 2 4 2 2 2 4 0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1 0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4 0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3 1 0 0 1 1 0 0 0 2 2 2 1 1 1 0 0 2 0 0 0 0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3 0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2 1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0 2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4

REFERENCES (for Figure 2 data)

Cressie, N. (1991) Statistics for Spatial Data. Wiley, New York.

Rathbun, S.L. and Cressie, N. (1994) A space-time survival point process for a longleaf pine forest in southern Georgia. Journal of the American Statistical Association, 89, 1164-1174.
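The population parameters quoted above can be recomputed from the Figure 2 listing. The base-R sketch below (an illustration added here) reads the 400 counts in the order printed; the grid layout is not needed for these summaries. It also counts the occupied quadrats, which gives the presence/absence total used in Figure 3 (Section 2.5).

```r
# Population parameters of the Figure 2 longleaf pine counts (400 quadrats),
# entered in the order they are listed in Figure 2.
y <- scan(text = "
1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1 3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1
7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2 0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0
1 1 0 2 3 2 0 0 2 1 3 1 4 1 1 1 2 2 1 1 2 0 0 0 4 3 3 0 1 16 5 0 1 3 8 0 0 1 3 3
0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0 0 0 5 1 8 7 6 6 6 1 0 4 0 0 1 2 2 0 1 2
0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0 0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 0 2 1 1 0
1 8 7 7 8 0 5 0 1 0 1 2 0 0 2 4 2 2 2 4 0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1
0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4 0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3
1 0 0 1 1 0 0 0 2 2 2 1 1 1 0 0 2 0 0 0 0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3
0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2 1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0 2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4
", quiet = TRUE)

N      <- length(y)        # number of quadrats
t_pop  <- sum(y)           # population total t
ybar_U <- mean(y)          # mean abundance per quadrat
S2     <- var(y)           # population variance S^2 (divisor N - 1)
p      <- mean(y > 0)      # proportion of occupied quadrats (Figure 3)

cat("N =", N, " t =", t_pop, " ybar_U =", ybar_U,
    " S^2 =", round(S2, 3), " p =", p, "\n")
```

The computed mean is 584/400 = 1.46, and the variance rounds to the quoted 3.853.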

2.4.2 Using the R Survey Package for a SRS

R Code and Output for Figure 1 SRS Analysis

Contents of the data file fig1.txt (first rows shown):

"count" "fpc"
33 400
33 400

The first column (count) holds the recorded responses; the second column (fpc) holds the population size N.

R Code

source("c:/courses/st446/rcode/confintt.r")   # course file that defines confint.t()

# t-based confidence intervals for the SRS in Figure 1
library(survey)

srsdat <- read.table("c:/courses/st446/rcode/fig1.txt", header = TRUE)
srsdat

# id = ~1: no clustering; fpc = ~fpc: column holding the population size N
srs_design <- svydesign(id = ~1, fpc = ~fpc, data = srsdat)
srs_design

# Estimated population total, with two-sided and one-sided t intervals
esttotal <- svytotal(~count, srs_design)
print(esttotal, digits = 15)
confint.t(esttotal, degf(srs_design), level = .95)
confint.t(esttotal, degf(srs_design), level = .95, tails = 'lower')
confint.t(esttotal, degf(srs_design), level = .95, tails = 'upper')

# Estimated population mean, with the same intervals
estmean <- svymean(~count, srs_design)
print(estmean, digits = 15)
confint.t(estmean, degf(srs_design), level = .95)
confint.t(estmean, degf(srs_design), level = .95, tails = 'lower')
confint.t(estmean, degf(srs_design), level = .95, tails = 'upper')

R output for t-based confidence interval for SRS

> srsdat

count fpc

Independent Sampling design

total SE

count 13640 534.

mean( count ) = 13640.

SE( count ) = 534.

Two-Tailed CI for count where alpha = 0.05 with 9 df

mean( count ) = 13640.

SE( count ) = 534.

One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df

5 % upper

12659.96724 infinity

mean( count ) = 13640.

SE( count ) = 534.

One-Tailed (upper) CI for count where alpha = 0.05 with 9 df

lower 95 %

-infinity 14620.

mean SE

count 34.1 1.

mean( count ) = 34.

SE( count ) = 1.

Two-Tailed CI for count where alpha = 0.05 with 9 df

mean( count ) = 34.

SE( count ) = 1.

One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df

5 % upper

31.64992 infinity

mean( count ) = 34.

SE( count ) = 1.

One-Tailed (upper) CI for count where alpha = 0.05 with 9 df

lower 95 %

-infinity 36.

mean( count ) = 620.

SE( count ) = 289.

Two-Tailed CI for count where alpha = 0.05 with 19 df

mean( count ) = 620.

SE( count ) = 289.

One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df

5 % upper

120.10415 infinity

mean( count ) = 620.

SE( count ) = 289.

One-Tailed (upper) CI for count where alpha = 0.05 with 19 df

lower 95 %

-infinity 1119.

mean SE

count 1.55 0.

mean( count ) = 1.

SE( count ) = 0.

Two-Tailed CI for count where alpha = 0.05 with 19 df

mean( count ) = 1.

SE( count ) = 0.

One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df

5 % upper

0.30026 infinity

mean( count ) = 1.

SE( count ) = 0.

One-Tailed (upper) CI for count where alpha = 0.05 with 19 df

lower 95 %

-infinity 2.

2.4.3 Using SAS PROC Surveymeans for a SRS

DM 'LOG;CLEAR;OUT;CLEAR';   *** I recommend putting these two lines of code ;
OPTIONS NODATE NONUMBER;    *** at the beginning of every SAS program ;

data SRS_Fig1;
  wgt = 400/10;             * wgt = N/n ;
  input count @@;
  datalines;
33 33 30 34 39 27 32 36 35 42
;
proc surveymeans data=SRS_Fig1 total=400 mean clm sum clsum;
  var count;
  weight wgt;
  title1 'Simple Random Sample -- Example 1';
  title2 'Estimating the population mean and total from the data in Figure 1';
run;

Simple Random Sample -- Example 1
Estimating the population mean and total from the data in Figure 1

The SURVEYMEANS Procedure

Data Summary
Number of Observations: 10
Sum of Weights: 400

Statistics

Variable | Mean | Std Error of Mean | 95% CL for Mean
count | 34.100000 | 1.336569 | 31.0764709 to 37.1235291

Variable | Sum | Std Dev | 95% CL for Sum
count | 13640 | 534.627596 | 12430.5884 to 14849.4116
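For readers without the survey package or SAS, the standard errors and 95% limits above can be reproduced from formulas (12) and (17) in base R. This is a minimal sketch, not a replacement for `svydesign()` or PROC SURVEYMEANS; it uses the same ten counts as the SAS data step and the weight N/n = 40.

```r
# Base-R check of the PROC SURVEYMEANS (and survey package) results for the
# Figure 1 SRS: n = 10 counts, N = 400, using formulas (12) and (17).
counts <- c(33, 33, 30, 34, 39, 27, 32, 36, 35, 42)
N <- 400
n <- length(counts)
w <- N / n                          # sampling weight N/n, as in the SAS data step

ybar  <- mean(counts)
t_hat <- sum(w * counts)            # weighted sum, equal to N * ybar
se_mean  <- sqrt((N - n) / N * var(counts) / n)
se_total <- N * se_mean

t_star   <- qt(0.975, df = n - 1)   # 95% two-sided, 9 df
ci_mean  <- ybar  + c(-1, 1) * t_star * se_mean
ci_total <- t_hat + c(-1, 1) * t_star * se_total

print(c(mean = ybar, se_mean = se_mean, total = t_hat, se_total = se_total))
print(rbind(ci_mean, ci_total))
```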

2.5 Attribute Proportion Estimation

• Suppose we are interested in an attribute (characteristic) associated with the sampling units. The population proportion p is the proportion of population units having that attribute.

• Statistically, the goal is to estimate proportion p.

• Examples: the proportion of females (or males) in an animal population, the proportion of consumers who own motorcycles, the proportion of married couples with at least one child.

• Statistically, we use an indicator function that assigns a value $y_i$ to unit i as follows:

$$y_i = \begin{cases} 1 & \text{if unit } i \text{ possesses the attribute} \\ 0 & \text{otherwise} \end{cases}$$

Then $t = \sum_{i=1}^{N} y_i$ and $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i = p$. Because the population proportion p can be expressed as a population mean $\bar{y}_U$, we will, under certain conditions, be able to apply the SRS methods for estimating $\bar{y}_U$.

• By taking a SRS of size n, we can estimate p with the sample proportion $\hat{p}$ of units that possess the attribute: $\hat{p} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$. The sample proportion $\hat{p}$ is unbiased for p.

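• For a finite population of 0 and 1 values, $y_i^2 = y_i$, so the population variance is

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - p)^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} y_i - Np^2\right) = \frac{N}{N-1}\,p(1-p)$$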

• Therefore, the variance of $\hat{p}$ is

$$V(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n} = \left(\frac{N-n}{N-1}\right)\frac{p(1-p)}{n} \qquad (18)$$

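• Because $S^2$ is unknown, we estimate it with $s^2 = \frac{n}{n-1}\,\hat{p}(1-\hat{p})$. Substitution provides the unbiased estimator of $V(\hat{p})$:

$$\hat{V}(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} = \left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1} \qquad (19)$$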

• The square root of $V(\hat{p})$ in (18) is the standard deviation of the estimator $\hat{p}$.

• The square root of $\hat{V}(\hat{p})$ in (19) is the standard error of $\hat{p}$.

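• As with $\bar{y}$ and $\hat{t}$, omitting the finite population correction (f.p.c.) from these formulas is appropriate only when the population is extremely large relative to the sample; this applies to both the large-sample ($z^*$) and small-sample ($t^*$) confidence intervals.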
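The proportion formulas can be checked on a small hypothetical 0/1 population (six units, three possessing the attribute, so p = 0.5; these values are invented for illustration). The base-R sketch below enumerates all $\binom{6}{3} = 20$ samples of size 3, compares the exact variance of $\hat{p}$ with (18), and confirms that the estimator (19) averages to $V(\hat{p})$. The function `vhat_phat()` is a hypothetical helper defined here.

```r
# Estimating a proportion under SRS without replacement, formulas (18)-(19),
# checked on a small hypothetical 0/1 population.
y <- c(1, 1, 1, 0, 0, 0)     # hypothetical population, p = 0.5
N <- length(y); n <- 3
p <- mean(y)

# Population variance of 0/1 values: S^2 = N p (1 - p) / (N - 1)
S2 <- var(y)

# Exact V(p-hat) by enumerating every SRS, versus formula (18)
phat_all <- apply(combn(N, n), 2, function(i) mean(y[i]))
V_exact  <- mean((phat_all - p)^2)
V_18     <- (N - n) / (N - 1) * p * (1 - p) / n

# Hypothetical helper: estimated variance (19) from one 0/1 sample
vhat_phat <- function(sample, N) {
  n    <- length(sample)
  phat <- mean(sample)
  s2   <- n / (n - 1) * phat * (1 - phat)   # equals var(sample) for 0/1 data
  (N - n) / N * s2 / n
}
Vhat_all <- apply(combn(N, n), 2, function(i) vhat_phat(y[i], N))

cat("V(p-hat): exact", V_exact, "  formula (18)", V_18,
    "  mean of (19)", mean(Vhat_all), "\n")
```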

Figure 3: The Presence/Absence of Longleaf Pine

Rathbun/Cressie data (t = 249, N = 400, p = 0.6225)

1 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 0 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 0 0 1 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 1 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1
