Sampling Proportions, Lecture notes of Statistics

These notes cover the basic concepts of sampling proportions: the notation for the population and the sample, the variances and standard errors of estimates of the population proportion, the estimation of the standard error from a sample, the calculation of confidence limits, the relative error, and sample size determination for a proportion. They state the main theorems and corollaries on these topics and illustrate them with examples. Estimating population proportions from sample data is a fundamental part of survey research and data analysis.

Typology: Lecture notes

2023/2024

Uploaded on 05/24/2024

milkiyas-aboma-1

Chapter 3: Sampling Proportions

3.1 Basic Concepts

In some cases the nature of the survey may require recording attributes, which are expressed qualitatively. Such qualitative information can be quantified by counting the units that possess each attribute. The attributes can take various forms, such as living in an urban or rural area, being male or female, married or unmarried, literate or illiterate, an adult between 18 and 45 years or an adult over 45 years, etc.

The main interest for such attributes is usually to estimate the total number of units and the proportion of units in the population possessing some characteristic. Attributes can be changed into quantifiable information by allocating the score "1" or "0", while measurable variables can also be changed into attributes by categorizing the population into different groups.

It is worth presenting the special simple form that the variance of a proportion takes when the design is simple random sampling. The following discussion considers a population classified into two categories, in which each member of the population is classified as either having or not having a specified characteristic of interest.

Notation for Population:

N = total number of units in the population
A = total number of units in the specified category C
P = A/N, the population proportion in C, i.e., the proportion (percentage) of the entire population that falls in category C
Q = 1 - P, the proportion of units not in C

Notation for Sample:

n = total number of members in the sample
a = total number of sampled members that have the specified attribute
p = a/n, the sample proportion, i.e., the proportion (percentage) of the sample that has the specified attribute
q = 1 - p, the proportion of sample members not in C

3.2 Variances and Standard Errors of the Estimates of the Population Proportion

For any unit in the population or in the sample, we define an observation (variable) $y_i$ as follows to facilitate counting:

$y_i = 1$, if the unit is in C
$y_i = 0$, if the unit is not in C

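For the population, $Y = \sum_{i=1}^{N} y_i = A$, $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{A}{N} = P$, and $S^2 = \frac{NPQ}{N-1}$,

and for the sample, $y = \sum_{i=1}^{n} y_i = a$, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{a}{n} = p$, and $s^2 = \frac{npq}{n-1}$ (verify).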
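The "(verify)" can be checked numerically. The sketch below is only an illustration on made-up 0/1 data (the lists `population` and `sample` and both helper functions are assumptions, not part of the notes): it computes the mean square directly from the definition, with divisor one less than the number of units, and compares it with the closed forms NPQ/(N-1) and npq/(n-1).

```python
# Hypothetical 0/1 data: 1 means the unit is in category C.
population = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # N = 10, A = 6
sample = [1, 0, 1, 1, 0]                      # n = 5, a = 3 (illustrative values)


def proportion_and_mean_square(y):
    """Return the proportion of ones and the mean square with divisor len(y) - 1."""
    m = len(y)
    prop = sum(y) / m
    mean_square = sum((yi - prop) ** 2 for yi in y) / (m - 1)
    return prop, mean_square


def closed_form(y):
    """Closed form m * prop * (1 - prop) / (m - 1) for 0/1 data."""
    m = len(y)
    prop = sum(y) / m
    return m * prop * (1 - prop) / (m - 1)


P, S2 = proportion_and_mean_square(population)
p, s2 = proportion_and_mean_square(sample)

# Each pair of values should agree up to floating-point rounding.
print(P, S2, closed_form(population))
print(p, s2, closed_form(sample))
```

The agreement rests on the fact that, for 0/1 data, the sum of squared deviations about the proportion equals the number of units times the product of the two proportions.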

As in the continuous case, a sample proportion, p, can be used to make inferences about a population proportion P. Just like the sample mean $\bar{y}$, the sample proportion p is also a random variable that depends on which members of the population are included in the sample.

Theorem 5:

The sample proportion p = a/n is an unbiased estimate of the population proportion P = A/N, i.e., $E(p) = P$. Prove this theorem.

Theorem 6:

The variance of the sample proportion or percentage (p) is given by

$\mathrm{Var}(p) = E(p - P)^2 = \frac{PQ}{n}\left(\frac{N-n}{N-1}\right)$. Prove this theorem.
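A numerical check is not a proof, but it can make both theorems concrete. The following sketch enumerates every simple random sample (without replacement) of size n from a small made-up population and compares the exact mean and variance of p with P and with PQ/n times (N-n)/(N-1). The population values and n = 3 are assumptions chosen only for illustration; exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 8 units, A = 4 of them in category C.
population = [1, 1, 1, 0, 0, 0, 0, 1]
n = 3

N = len(population)
A = sum(population)
P = Fraction(A, N)
Q = 1 - P

# Under simple random sampling every subset of n units is equally likely,
# so averaging over all subsets gives the exact expectation.
proportions = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_p = sum(proportions) / len(proportions)
var_p = sum((p - P) ** 2 for p in proportions) / len(proportions)
formula = P * Q / n * Fraction(N - n, N - 1)

print(mean_p, P)        # both equal P
print(var_p, formula)   # both equal the Theorem 6 expression
```

`combinations` treats units in different positions as distinct even when their 0/1 values coincide, which is what the sampling model requires.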

Corollary: i) The estimated total number of units in class C, $\hat{A} = Np$, is an unbiased estimate of A.

ii) The variance of $\hat{A} = Np$, the estimated total number of units in class C, is

$\mathrm{Var}(\hat{A}) = \frac{N^2 PQ}{n}\left(\frac{N-n}{N-1}\right)$
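Both parts of the corollary follow from Theorems 5 and 6 because N is a known constant. A short derivation, offered as a sketch rather than a full proof:

```latex
\begin{aligned}
E(\hat{A}) &= E(Np) = N\,E(p) = NP = A, \\
\operatorname{Var}(\hat{A}) &= \operatorname{Var}(Np) = N^{2}\operatorname{Var}(p)
  = \frac{N^{2}PQ}{n}\cdot\frac{N-n}{N-1}.
\end{aligned}
```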

3.3 Estimation of the standard error from a sample

Theorem 7:

An unbiased estimate of the variance of p, derived from the sample, is

$\mathrm{var}(p) = \frac{N-n}{N}\cdot\frac{pq}{n-1} = (1-f)\,\frac{pq}{n-1}$, where $f = n/N$ is the sampling fraction.

If N is large relative to n, the finite population correction (1-f) is negligible and the estimated variance of p is $\mathrm{var}(p) = \frac{pq}{n-1}$. (Prove this theorem)

Corollary: The sample variance of the estimated total number of members in the specified category, $\hat{A} = Np$, is given by $\mathrm{var}(\hat{A}) = \frac{N(N-n)}{n-1}\,pq$. In each case we can get the standard error by taking the square root of the variance.
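As a minimal sketch of how these estimates might be computed, the function below returns p, its estimated standard error, the estimated total, and the standard error of that total. The function name `estimate_proportion` and the input values are hypothetical; the divisor n - 1 and the correction 1 - n/N follow the unbiased form of Theorem 7.

```python
import math


def estimate_proportion(a, n, N):
    """Estimate P and A from a simple random sample drawn without replacement.

    a: number of sampled units in category C
    n: sample size
    N: population size
    """
    p = a / n
    q = 1 - p
    f = n / N                                # sampling fraction
    var_p = (1 - f) * p * q / (n - 1)        # unbiased estimate of Var(p)
    se_p = math.sqrt(var_p)
    A_hat = N * p                            # estimated total in category C
    se_A = N * se_p                          # since var(A_hat) = N^2 * var(p)
    return p, se_p, A_hat, se_A


# Illustrative (made-up) survey: 30 of 100 sampled units are in C, N = 2000.
p, se_p, A_hat, se_A = estimate_proportion(a=30, n=100, N=2000)
print(p, se_p, A_hat, se_A)
```

Squaring `se_A` reproduces N(N-n)pq/(n-1), the expression in the corollary.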

Example: See Cochran, 3rd edition, page 52.

population size $N$ we have the sample size $n = \frac{n_0}{1 + n_0/N}$, and we can approximate $n$ by $n_0$ as we have done for the mean.

Using the relative error ($\varepsilon$) and the relation $d = \varepsilon P$, we set

$n_0 = \frac{Z^2 PQ}{d^2} = \frac{Z^2 Q}{\varepsilon^2 P}$
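The two forms of the first approximation and the finite-population adjustment can be sketched as small functions. The names `n0_absolute`, `n0_relative`, and `adjust_for_population`, and the input values, are assumptions made for illustration; the adjustment is taken to be n = n0 / (1 + n0/N).

```python
import math


def n0_absolute(z, P, d):
    """First approximation for a margin of error d on the proportion itself."""
    return z ** 2 * P * (1 - P) / d ** 2


def n0_relative(z, P, eps):
    """First approximation for a relative error eps, i.e. d = eps * P."""
    return z ** 2 * (1 - P) / (eps ** 2 * P)


def adjust_for_population(n0, N):
    """Finite-population adjustment, assumed here to be n0 / (1 + n0 / N)."""
    return n0 / (1 + n0 / N)


# Illustrative values only: 95% confidence (z = 1.96), P guessed as 0.5,
# relative error 10%, population of 5000 units.
z, P, eps, N = 1.96, 0.5, 0.10, 5000

n0 = n0_relative(z, P, eps)
n = math.ceil(adjust_for_population(n0, N))
print(n0, n)
```

Setting d = eps * P in `n0_absolute` gives the same value as `n0_relative`, which is the identity stated in the formula.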

3.7 Estimates of the population parameters in sample size determination (continuous and proportion)

In practice, the population parameters ($S_y^2$ or $P$) must be estimated, while the other factors, Z and $\varepsilon$, are usually set by the investigator (researcher). The relation shows the following summary points.

- The smaller we make $\varepsilon$, the greater will be the sample size n.
- If the degree of confidence $(1-\alpha)$ increases, then both the certainty and the required sample size increase.
- Since the population parameters are unknown, calculate $n_0$ by using the sample estimates. That is, $n_0 = \frac{Z^2 pq}{d^2}$ or $n_0 = \frac{Z^2 s^2}{\varepsilon^2 \bar{y}^2} = \frac{Z^2 (cv)^2}{\varepsilon^2}$.
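The first two summary points can be illustrated numerically. In this sketch the critical value Z is obtained from the standard normal distribution in Python's standard library; the helper names `z_value` and `n0`, the guess p = 0.5, and the grids of margins and confidence levels are assumptions chosen only to show the direction of the effect.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value for a level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def n0(confidence, p, d):
    """First approximation Z^2 * p * q / d^2 using a sample estimate p."""
    z = z_value(confidence)
    return z ** 2 * p * (1 - p) / d ** 2


# A smaller margin of error d requires a larger sample.
for d in (0.10, 0.05, 0.02):
    print("d =", d, "n0 =", round(n0(0.95, 0.5, d)))

# A higher degree of confidence also requires a larger sample.
for confidence in (0.90, 0.95, 0.99):
    print("confidence =", confidence, "n0 =", round(n0(confidence, 0.5, 0.05)))
```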

How do we get estimates of the population parameters to use in sample size determination? In practice, there are four possible ways of estimating the parameters.

- By taking a small preliminary simple random sample of size $n_1$, from which estimates $s_1^2$ or $p_1$ of $S^2$ or $P$, and hence the required n, are obtained. This method gives the most reliable estimates, but it slows the completion of the survey and is therefore not often used.
- By using the results of a pilot survey: to design a large sample efficiently in an unfamiliar field, a pilot study may be conducted before the survey to gain information for designing it; the pilot study also serves many other purposes.
- By using the results of previous surveys: we should search for data from past surveys of similar variables and use them after adjusting for changes over time.
- By guesswork about the nature of the population: this requires educated guesses or the services of experts such as survey statisticians, supported by specialists in the subject matter concerned, who may construct a model of the population distribution, its shape, and its probable limits, and deduce $S^2$ or $P$ from it.

Reading Assignment: Read Cochran, 3rd ed., chapter 4, section 4.7, pages 78-81.

Examples:

1. A teacher training institute is interested in estimating the proportion (P) of teachers who consider the semester system more suitable than the 3-term system of education. A SRS of n = 120 teachers is taken without replacement from a total of N = 1200 teachers. Some of the teachers are in favor of two semesters while others are not, and it is found that 72 teachers are in favor of the semester system.

i) Estimate the proportion P along with the standard error of your estimate.

ii) Calculate the 95% confidence interval for P.

iii) Do you think the sample size of 120 is sufficient if the tolerable error is 0.08? If not, how many more units should be included in the sample?

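Solution: n = 120, a = 72, N = 1200.

i) p = a/n = 72/120 = 0.6, and

$se(p) = \sqrt{\mathrm{var}(p)} = \sqrt{\left(1 - \frac{n}{N}\right)\frac{pq}{n-1}} = \sqrt{\left(1 - \frac{120}{1200}\right) \times \frac{0.6 \times 0.4}{119}} \approx 0.0426$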

ii) 95% confidence limits:

$P = p \pm z_{\alpha/2}\, se(p) = 0.6 \pm 1.96 \times 0.0426$, i.e., $0.5165 \le P \le 0.6835$.

Therefore the proportion of teachers in the institute favoring the semester system is likely to be between about 52% and 68%. If an estimate of the total number of teachers who are in favor of the two-semester system is required, it can be computed as:

$\hat{A} = Np = 1200 \times 0.6 = 720$.

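iii) $n_0 = \frac{Z^2 pq}{d^2} = \frac{(1.96)^2 \times 0.6 \times 0.4}{(0.08)^2} \approx 144.06$, and $n = \frac{n_0}{1 + n_0/N} = \frac{144.06}{1 + 144.06/1200} \approx 128.6$, which rounds up to 129.

Therefore 120 is not sufficient to achieve the given precision, meaning 9 more teachers need to be selected.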
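The three parts of the solution can be reproduced with a few lines of code. This is a sketch under stated assumptions: z = 1.96 for 95% confidence, the divisor n - 1 in the estimated variance, and the adjustment n = n0 / (1 + n0/N); the variable names are illustrative.

```python
import math

# Data from the example.
N, n, a = 1200, 120, 72
z, d = 1.96, 0.08          # z = 1.96 assumed for 95% confidence

# i) Point estimate and standard error.
p = a / n
q = 1 - p
se = math.sqrt((1 - n / N) * p * q / (n - 1))

# ii) 95% confidence limits and the estimated total.
lower, upper = p - z * se, p + z * se
A_hat = N * p

# iii) Required sample size for a tolerable error d.
n0 = z ** 2 * p * q / d ** 2
n_required = math.ceil(n0 / (1 + n0 / N))
extra = n_required - n

print(round(p, 2), round(se, 4))
print(round(lower, 4), round(upper, 4), round(A_hat))
print(n_required, extra)   # 129 required in total, 9 more than the 120 sampled
```

The current margin of error, z times se, is about 0.0835, which exceeds the tolerable error of 0.08 and is why the original sample falls short.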