



These lecture notes cover the basic concepts of sampling proportions: the notation for population and sample, the variances and standard errors of estimates of the population proportion, estimation of the standard error from a sample, confidence limits, relative error, and sample size determination for proportions. They state the main theorems and corollaries on these topics and illustrate them with examples. Estimating population proportions from sample data is a fundamental part of survey research and data analysis.
Typology: Lecture notes




Chapter 3: Sampling Proportions
3.1 Basic Concepts
In some cases the nature of the survey may require recording of the attributes, which can
be expressed qualitatively. The qualitative information can be quantified by counting the
attribute characteristics. These characteristics could be of various forms, such as living in
urban or rural area, being a male or a female, married or unmarried, literate or illiterate,
adults between 18 and 45 years or adults over 45 years, etc.
The main interest for such attributes could be to estimate the total number of units and the
proportion of units in the population possessing some characteristics. Attributes can be
changed into quantifiable information by allocating the score “1” or “0”, while
measurable variables can also be changed into attributes by categorizing the population
into different groups.
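As a minimal sketch of the 0/1 scoring described above, the snippet below turns a measurable variable (age) into an attribute and counts it. The ages are hypothetical illustration data, and the 18 to 45 age band is taken from the list of example attributes given earlier.

```python
# Hypothetical illustration data: ages of eight sampled adults
ages = [19, 52, 33, 47, 61, 28, 45, 70]

# Score 1 if the adult is aged 18 to 45 (the category of interest), else 0
in_category = [1 if 18 <= age <= 45 else 0 for age in ages]

count = sum(in_category)          # number of units possessing the attribute
proportion = count / len(ages)    # proportion of units possessing the attribute

print(in_category, count, proportion)
```

Summing the 0/1 scores gives the count directly, which is why this coding is convenient for the results that follow.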
It is worth presenting the special simple form that the variance of a proportion takes when
the design is simple random sampling. The following discussion considers a
population classified into two categories, in which each member of the population is
classified as either having or not having a specified characteristic of interest.
Notation for Population:
N = total number of units in the population
A = total number of units in the specified category C
P = A/N, the population proportion in C, i.e., the proportion (percentage) of the entire
population that has the specified attribute
Q = 1 - P, the proportion of units not in C
Notation for Sample:
n = total number of members in the sample
a = total number of sampled members that have the specified attribute
p = a/n, the sample proportion, i.e., the proportion (percentage) of the sample from the
population that has the specified attribute
q = 1 - p, the proportion of sample members not in C
3.2 Variances and Standard Errors of the Estimates of the Population Proportion
For any unit in the population or in the sample, we define an observation (variable) $y_i$
as follows to facilitate counting:
$$y_i = \begin{cases} 1, & \text{if the unit is in } C \\ 0, & \text{if the unit is not in } C \end{cases}$$
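For the population,
$$Y = \sum_{i=1}^{N} y_i = A, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{A}{N} = P, \qquad \text{and} \qquad S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1} = \frac{NPQ}{N-1},$$
and for the sample,
$$y = \sum_{i=1}^{n} y_i = a, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{a}{n} = p, \qquad s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{npq}{n-1} \quad \text{(verify)}.$$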
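The "verify" step can be checked numerically. Since $y_i^2 = y_i$ for a 0/1 variable, the sum of squared deviations reduces to $npq$. The sketch below compares the directly computed variance (divisor $n-1$) with $npq/(n-1)$ using exact fractions. The 0/1 data are hypothetical, and the function name is only illustrative.

```python
from fractions import Fraction

def variance_two_ways(y):
    """For a list of 0/1 values, return (direct variance, n*p*q/(n-1))."""
    n = len(y)
    p = Fraction(sum(y), n)      # proportion of ones
    q = 1 - p
    # Direct definition: sum of squared deviations over (n - 1)
    direct = sum((v - p) ** 2 for v in y) / (n - 1)
    # Closed form for a 0/1 variable
    formula = n * p * q / (n - 1)
    return direct, formula

# Hypothetical sample: n = 10 units, a = 6 of them in category C
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
direct, formula = variance_two_ways(y)
print(direct, formula)
```

The same function applied to a whole population list checks $S^2 = NPQ/(N-1)$, since the algebra is identical with $N$, $P$, $Q$ in place of $n$, $p$, $q$.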
As in the continuous case, the sample proportion p can be used to make inferences
about the population proportion P. Like the sample mean $\bar{y}$, the sample proportion p is
a random variable whose value depends on which members of the population are included in
the sample.
Theorem 5:
The sample proportion p = a/n is an unbiased estimate of the population proportion P = A/N.
Theorem 6:
The variance of the sample proportion or percentage (p) is given by
$$V(p) = E(p - P)^2 = \frac{S^2}{n}\left(\frac{N-n}{N}\right) = \frac{PQ}{n}\left(\frac{N-n}{N-1}\right).$$
Prove this theorem.
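Before attempting the proofs, Theorems 5 and 6 can be checked by brute force on a small population. The sketch below lists every simple random sample (without replacement) of size n, computes p for each, and compares the exact mean and variance of p with P and with $\frac{PQ}{n}\cdot\frac{N-n}{N-1}$. The population of N = 8 units with A = 3 is a hypothetical example, and this enumeration is an illustration, not a proof.

```python
from fractions import Fraction
from itertools import combinations

def exact_moments(population, n):
    """Exact mean and variance of p over all equally likely samples of size n."""
    props = [Fraction(sum(s), n) for s in combinations(population, n)]
    mean = sum(props) / len(props)
    var = sum((p - mean) ** 2 for p in props) / len(props)
    return mean, var

# Hypothetical population: N = 8 units, A = 3 of them in category C
population = [1, 1, 1, 0, 0, 0, 0, 0]
N, n = len(population), 3
P = Fraction(sum(population), N)
Q = 1 - P

mean_p, var_p = exact_moments(population, n)
theorem6 = P * Q / n * Fraction(N - n, N - 1)

print(mean_p == P)         # Theorem 5: E(p) = P
print(var_p == theorem6)   # Theorem 6: V(p) = (PQ/n)(N-n)/(N-1)
```

Exact fractions are used so that the comparison is an equality rather than a floating-point approximation.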
Corollary: i) The estimated total number of units in class C, $\hat{A} = Np$, is an unbiased
estimate of A.
ii) The variance of $\hat{A} = Np$, the estimated total number of units in class C, is
$$\mathrm{Var}(\hat{A}) = \frac{N^2 PQ}{n}\left(\frac{N-n}{N-1}\right).$$
3.3 Estimation of the standard error from a sample
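Theorem 7:
An unbiased estimate of the variance of p, derived from the sample, is
$$\mathrm{var}(p) = s_p^2 = \frac{N-n}{N}\cdot\frac{pq}{n-1} = (1-f)\,\frac{pq}{n-1}, \qquad f = \frac{n}{N}.$$
If N is large relative to n, the finite population correction (1-f) is negligible and
$$\mathrm{var}(p) \approx \frac{pq}{n-1}.$$
(Prove this theorem.)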
Corollary: The sample variance of the estimated total number of members in the specified
category, $\hat{A} = Np$, is given by
$$\mathrm{var}(\hat{A}) = \frac{N(N-n)}{n-1}\,pq.$$
In each case we can get the standard error by taking the square root of the variance.
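The sample-based formulas can be collected into one small helper. This is a minimal sketch, assuming the unbiased form with $n-1$ in the denominator (as in Cochran). The function name and the returned keys are illustrative and not part of the notes. The figures n = 120, a = 72, N = 1200 are those of the teachers example in this chapter.

```python
from math import sqrt

def proportion_estimates(a, n, N):
    """Estimate p, its variance and standard error, and the total A = N*p."""
    p = a / n
    q = 1 - p
    fpc = (N - n) / N                          # finite population correction, 1 - f
    var_p = fpc * p * q / (n - 1)              # unbiased estimate of V(p)
    var_A = N * (N - n) * p * q / (n - 1)      # estimated variance of the total
    return {
        "p": p,
        "var_p": var_p,
        "se_p": sqrt(var_p),
        "A_hat": N * p,
        "var_A": var_A,
        "se_A": sqrt(var_A),
    }

result = proportion_estimates(a=72, n=120, N=1200)
print(result)
```

Note that `var_A` equals `N**2 * var_p`, which is the relation between the corollary and Theorem 7.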
Example: See Cochran, 3rd edition, page 52.
$$n = \frac{n_0}{1 + n_0/N},$$
and we can approximate $n$ by $n_0$ when $n_0/N$ is negligible, as we have done for the mean.
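Using the relative error ($\varepsilon$) and the relation $d = \varepsilon P$, we set
$$n_0 = \frac{Z^2 PQ}{d^2} = \frac{Z^2 PQ}{\varepsilon^2 P^2} = \frac{Z^2 Q}{\varepsilon^2 P}.$$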
3.7 Estimates of the population parameters in sample size determination
(continuous and proportion)
In practice, the population parameters (such as $S_y^2$ and $P$) are unknown, whereas
$Z$ and $\varepsilon$ are usually set by the investigator (researcher). The relation shows the following
summary points.
The smaller we make $\varepsilon$, the greater the sample size n will be.
If the degree of confidence ($1-\alpha$) increases, then both the certainty and the required
sample size increase.
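Since the population parameters are unknown, calculate $n_0$ by using the sample
estimates. That is,
$$n_0 = \frac{Z^2 pq}{d^2} \qquad \text{or} \qquad n_0 = \frac{Z^2 s_y^2}{(\varepsilon \bar{y})^2} = \frac{Z^2 (cv)^2}{\varepsilon^2},$$
where $cv = s_y/\bar{y}$ is the sample coefficient of variation.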
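The sample size relations can be sketched in a few lines. The function names below are illustrative only. The finite population adjustment $n = n_0/(1 + n_0/N)$ is assumed to have the same form as the one used for the mean, and the inputs (Z = 1.96, p = 0.6, d = 0.08, N = 1200) are those of the teachers example in this chapter.

```python
from math import ceil

def n0_proportion(z, p, d):
    """First approximation n0 = Z^2 * p * q / d^2 for a proportion."""
    return z ** 2 * p * (1 - p) / d ** 2

def n0_relative(z, cv, eps):
    """First approximation n0 = Z^2 * cv^2 / eps^2 for a mean with relative error eps."""
    return z ** 2 * cv ** 2 / eps ** 2

def adjust_for_fpc(n0, N):
    """Adjust n0 for a finite population of size N (assumed form n0 / (1 + n0/N))."""
    return n0 / (1 + n0 / N)

n0 = n0_proportion(z=1.96, p=0.6, d=0.08)
n = ceil(adjust_for_fpc(n0, N=1200))   # round up to a whole number of units
print(round(n0, 2), n)

# The two summary points: a smaller error or a higher confidence needs a larger sample
print(n0_proportion(1.96, 0.6, 0.04) > n0)    # halving d
print(n0_proportion(2.576, 0.6, 0.08) > n0)   # 99% instead of 95% confidence
```

Rounding up with `ceil` is a design choice: rounding down would give a sample slightly too small to meet the stated precision.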
How do we get estimates of the population parameters in order to use these estimates in
sample size determination? In actual practice, there are four possible ways of estimating
the parameters.
i) By taking a small preliminary simple random sample of size $n_1$, from which
estimates $s_1^2$ or $p_1$ of $S^2$ or $P$, and hence the required n, will be obtained. This method
gives the most reliable estimates, but it slows the completion of the survey and
for this reason is not often used.
ii) By using the results of a pilot survey: To design a large sample efficiently in an
unknown field, a pilot study may be conducted prior to the survey to gain
information for designing the survey. A pilot study also serves many other purposes.
iii) By using the results of previous surveys: We should search for data from past surveys of
similar variables and make use of them after adjusting for changes over time.
iv) By guesswork about the nature of the population: This requires educated guesses
or the services of experts such as survey statisticians, supported by specialists in
the subject matter concerned, who may construct a model of the population
distribution, its shape, and its probable limits, and deduce $\sigma^2$ or $P$ from it.
Reading Assignment: Read Cochran, 3rd ed., chapter 4, section 4.7, pages 78-81.
Examples:
Suppose we wish to estimate the proportion P of teachers
who consider the semester system to be more suitable than the 3-term system of
education. An SRS of n = 120 teachers is taken without replacement from a total of
N = 1200 teachers. Some of the teachers are in favor of two semesters while others are not,
and it is found that 72 of the sampled teachers are in favor of the semester system.
i) Estimate the proportion P along with the standard error of your estimate.
ii) Calculate the 95% confidence interval for P
iii) Do you think the sample size 120 is sufficient if the tolerable error could be 0.08? If
not, how many more units should be included in the sample?
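Solution: n = 120, a = 72, N = 1200.
i) p = a/n = 72/120 = 0.6, so q = 1 - p = 0.4, and
$$\mathrm{var}(p) = \frac{N-n}{N}\cdot\frac{pq}{n-1} = \frac{1200-120}{1200}\times\frac{0.6 \times 0.4}{119} \approx 0.001815, \qquad se(p) = \sqrt{\mathrm{var}(p)} \approx 0.0426.$$
ii) 95% confidence limits:
$$p \pm z_{\alpha/2}\, se(p) = 0.6 \pm 1.96 \times 0.0426 \approx 0.6 \pm 0.0835, \qquad \text{i.e.,} \quad 0.5165 \le P \le 0.6835.$$
Therefore the proportion of teachers in the institutes favoring the semester system is likely to
be between about 52% and 68%. If an estimate of the total number of teachers who are in favor of
the two-semester system is required, it can be computed as
$\hat{A} = Np = 1200 \times 0.6 = 720$.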
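iii) $$n_0 = \frac{Z^2 pq}{d^2} = \frac{(1.96)^2 \times 0.6 \times 0.4}{(0.08)^2} \approx 144.06, \qquad n = \frac{n_0}{1 + n_0/N} = \frac{144.06}{1 + 144.06/1200} \approx 128.6 \approx 129.$$
Therefore a sample of 120 is not sufficient to achieve the given precision: 129 - 120 = 9 more
teachers need to be selected.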