





















An excerpt from the International Program for Development Evaluation Training (2007). It discusses various sampling concepts, including populations, samples, sampling frames, sample designs, parameters, and statistics. The document also explains the differences between random and non-random samples and lists and defines the types of each. It emphasizes the importance of statistics in determining confidence and precision and provides guidelines for determining sample size.






















Module 9
Sampling
Introduction
In the previous three modules, you have learned about writing evaluation questions, and choosing an evaluation design and data collection approach. In this module, you will look at sampling strategies to help you determine how much data you need to collect, and where to collect it so that you can answer your evaluation questions. This module has five topics. They are:
− Intro
− Glossary
− Types
− Confidence/Precision?
− How Large?
Module 9
Page 442 International Program for Development Evaluation Training − 2007
By the end of the module, you should be able to:
Introduction to Sampling
When we begin planning our data collection strategy, we have to decide whether it is possible to collect data from the entire population we intend to study: every document, for example, or every farmer, or every mile of a road system. Can we review every file, observe every farmer, examine every road? If we can, we can then accurately report the qualifications of every teacher in our school system, the number of paved miles on all our roads, or the views of all the citizens of the country. If we collect all the data accurately and reliably, then there is little chance of error. The complete coverage of the population in question is called a census.

However, most often we are unable to collect data from every file, farmer, or person. It takes too much time and costs too much. Instead, we take a sample: a subset of the entire population. If we select a sample, we may be able to draw inferences about a population based on our sample results; that is, we can estimate what the population is like based on our sample results. We call this “generalizing to a population.”

We use samples all the time. For example, when we have a blood test to check on our health, the laboratory takes a sample rather than all our blood. Tests are run using that sample and it is assumed that what they find in the sample is an accurate reflection of what is in all our blood.

Sampling is not just something that applies to large, quantitative studies. Even when conducting a highly qualitative, one-week field visit to assess a program that is spread out over a large geographic region, for example, you still need to be thoughtful about which areas of that region to investigate. Consider, as another example, the biases that might be introduced if program officials select the participants to be studied and how those biases could be avoided with a randomly selected sample. A modest understanding of the basic concepts of systematic sampling can greatly enhance the extent to which your assessment reflects what is really going on in the real world.
Sampling Glossary
population: The total set of units. It could be all the citizens in a country, all farms in a region, or all children under the age of five living without running water in a particular area.
census: A count of (or collection of information from) the entire population.
sample: A subset of units selected from a larger set of the same units.
sampling frame: The list from which you can select your sample.
systematic sampling: A sample drawn from a list using a random start followed by a fixed sampling interval.
sample design: The method of sample selection.
random sample: A sample in which each unit in the population has an equal chance of being selected.
quota sample: A sample in which a specific number of different types of units are selected.
accidental sample: A sample in which the units are selected “by accident.”
snowball sample: A type of sampling strategy typically used in interviews, where you ask interviewees who else you should talk to.
judgmental sample: A sample in which selections are made based on pre-determined criteria.
convenience sample: A sample in which selections are based on the convenience to the evaluator (e.g., on easy geographic or organizational access).
parameter: Characteristic of the population.
statistic: Characteristic of a sample.
confidence level: How certain you are (or need to be) that the statistic obtained from your sample is an accurate estimate of the population as a whole.
confidence interval: The calculated range within which the true population value lies and for which we can express 95% confidence (the standard, but may vary).
You want to review records but the records are in boxes and there is no way you can go through and number them all to select a sample. A systematic selection with a random start is acceptable, as long as there is nothing about the original order of the documents that is systematic. For example, you would take a random start and then pick every 20th file until you get the total number of files you want to analyze.
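The systematic selection described above can be sketched in a few lines of Python. This is only an illustration: the function name, the file labels, and the figures (500 files, 25 wanted) are assumptions made for the example, not part of the module.

```python
import random

def systematic_sample(items, interval, sample_size, rng=None):
    """Take a random start within the first interval, then every interval-th item."""
    rng = rng or random.Random()
    start = rng.randrange(interval)      # random start: position 0 .. interval-1
    picked = items[start::interval]      # then every interval-th item after it
    return picked[:sample_size]          # stop once we have the number we want

# Hypothetical example: 500 boxed files, every 20th file, 25 files wanted.
files = [f"file-{i:03d}" for i in range(1, 501)]
sample = systematic_sample(files, interval=20, sample_size=25, rng=random.Random(7))
print(sample[:3])
```

The fixed seed is there only so the example is repeatable. As the text warns, this approach is sound only if the original order of the files has no pattern that lines up with the interval.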
Types of Random Samples
There are four types of random samples. They are:
A simple random sample is the simplest sample. We establish a sample size and then proceed to randomly select units until we reach that number. Let us say we want to select 100 files from a population of 500. All the files have been consecutively numbered from 001 to 500 and are filed in numerical order from 1-500. We could then use a random numbers table, mentally block it off into three-digit numbers, and then select the first 100 numbers that fall between 001 and 500. These are the files we select for our study.
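Where the file numbers are available electronically, a random number generator can stand in for the random numbers table. The sketch below assumes the same 500 numbered files; the seed value is arbitrary and is set only to make the example repeatable.

```python
import random

rng = random.Random(2007)            # arbitrary seed, used only for repeatability
file_numbers = range(1, 501)         # files numbered 001 to 500

# Draw 100 distinct file numbers; every file has the same chance of selection.
selected = sorted(rng.sample(file_numbers, k=100))

# Format as three-digit labels to match the numbering on the files.
labels = [f"{n:03d}" for n in selected]
print(labels[:5])
```

Because the draw is made without replacement, no file can be selected twice, which matches the intent of skipping repeated numbers in a random numbers table.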
Sometimes we want to make sure specific groups are included that might otherwise be missed by using a simple random sample; those groups are usually a small proportion of the population. In this case, we would divide the population into strata based on some meaningful characteristic. This kind of sample is called a stratified random sample. For example, you may want to make sure you have enough people from rural areas in your study. If selected by a simple random sample, you may not get enough people from rural areas if they are a small proportion of all the people in the area. This is especially important if you want to have sufficient numbers in each stratum so you can make meaningful comparisons. For example, we may want to take a stratified sample of farmers at various distances from a major city.
To do a stratified random sample, divide the population into non-overlapping groups (i.e., strata) $n_1, n_2, n_3, \ldots, n_i$, such that $n_1 + n_2 + n_3 + \cdots + n_i = n$. Then do a simple random sample in each stratum. Figure 9.1 illustrates this process.
Fig. 9.1: Stratified Random Sample.
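A minimal sketch of the stratified process follows. The strata names, their sizes, and the choice to draw 50 units from each are hypothetical; they illustrate how a small rural stratum can be guaranteed enough units for comparison.

```python
import random

def stratified_sample(strata, sizes, rng=None):
    """Draw a simple random sample of sizes[name] units from each stratum."""
    rng = rng or random.Random()
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population: rural people are only 10% of the total.
population = {
    "rural": [f"rural-{i}" for i in range(100)],
    "urban": [f"urban-{i}" for i in range(900)],
}

# Take 50 from each stratum so the two groups can be compared meaningfully.
sample = stratified_sample(population, {"rural": 50, "urban": 50}, random.Random(1))
print({name: len(units) for name, units in sample.items()})
```

A simple random sample of 100 from this population would be expected to include only about 10 rural people, which is the problem stratification addresses.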
Cluster sampling is another form of random sampling. A “cluster” is any naturally occurring aggregate of the units that are to be sampled. Thus households (or homes) are clusters of people, and towns are clusters of households. Cluster samples are most often used when:
For example, you may want to interview 200 AIDS patients, but these 200 may be selected from only four randomly sampled clinics because of resource constraints. It is possible that the clinics will serve populations that are too similar in terms of economic background or other characteristics, and therefore may not be representative of all AIDS patients. Likewise, it is possible that the selected clinics are too similar in their level of care to patients to accurately represent the total population of patients. When you want to interview people living on small, dispersed, and remote farms, it would be very time consuming to sample the people and then travel to all the farms in which they reside. In a cluster sample, you might sample 10 of the 50 farms and then interview all the people at each of those sampled farms.
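The farm example can be sketched as follows. The farm names and the four residents per farm are invented for illustration; the point is that the random draw is made over clusters (farms), and then everyone in each chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly pick whole clusters, then include every unit inside them."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [person for name in chosen for person in clusters[name]]
    return chosen, units

# Hypothetical data: 50 farms with 4 residents each.
farms = {
    f"farm-{i:02d}": [f"farm-{i:02d}/person-{j}" for j in range(1, 5)]
    for i in range(1, 51)
}

chosen_farms, interviewees = cluster_sample(farms, 10, random.Random(3))
print(len(chosen_farms), len(interviewees))
```

Only 10 farms need to be visited, which is the cost saving; the trade-off, as the clinic example shows, is that a few clusters may not represent the whole population well.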
Sometimes combinations of methods are used. The group may be divided into strata; all the people in one stratum might be selected (as a census) and a random sample selected from the other strata. The program in Ghana (see Case 9-1) gives an example of a complex combination of random sampling methods.
For example, you want to observe classroom activities to measure the amount of time spent doing hands-on learning activities. You can:
Table 9.1 gives you a summary of the random sampling process.
Table 9.1: Summary of Random Sampling Process Step Process
The intent of the program is to increase the nutritional status and food security of poor households in Ghana. The Credit with Education Program combines: (1) providing credit to participants with (2) education on the basics of health, nutrition, birth timing, and small business skills.

Evaluation Questions: Did the program have an impact on the nutritional status of children, women’s economic capacity, women’s knowledge of health issues, and ability to offer a healthy diet to their children?

Overall Design: A quasi-experimental design using two surveys. Nineteen communities that did not yet have the Credit with Education Program were the focus of this study. The communities were divided into groups (strata) based on set criteria. Within each of the strata, communities were assigned either to a treatment group (will receive the Credit with Education Program) or to a control group (will not receive the program). They were not randomly assigned; three were assigned for political reasons and three were assigned as matched controls.

Sampling Within the Communities: Three groups of women with children were surveyed: those who participated at least one year (all participants were selected); those who did not participate but were in the program communities (random sample); and those in control communities (random sample). In all, ten mother/child pairs with children aged 12-23 months were chosen from each of the small communities; 30 from the large communities.
When using a non-random sample, examining the issue of bias is important. Is there something about this particular sample that might be different from the population as a whole? You might want to gather demographic information so you can describe the characteristics of your sample. Ideally, there will be no obvious differences between the sample and the population. But, when you report the demographics of the sample, your audience can then make a judgment as to how similar the sample is to the population. When using a non-random sample, you need to report your results in terms of the respondents. For example, “Of the mothers interviewed, 70% are satisfied with the quality of the healthcare their children are receiving.” Without random sampling, you have to be careful about generalizing to a larger population. However, the data may be very useful and may be the best given your situation. Always make your sample selection criteria and procedures clear.
Combinations
Random and non-random methods can be combined. For example, you may be collecting data on schools. You can select two schools from the poorest communities and two from the wealthiest communities. Then from these four schools, you can randomly select students for your data collection.
How Confident and Precise Do You Need to Be?
Even when you use a random sample, there is some possibility of error. It is possible that your sample will be different from the population. This is where statistics come in (see Module 11, Data Analysis and Interpretation). The narrowest definition of statistics concerns the validity of data derived from random samples. More specifically, it is concerned with estimating the probability that the sample results are representative of the population as a whole. Statisticians have developed theories and formulas for making these estimates and selecting sample size. While we will present some statistics in the next module, we will not present or discuss statistical formulas here. Rather, we will focus on understanding the basic concepts of statistical analysis, and how to apply them to designing evaluations.
You have some options in deciding how accurate and precise you need to be in inferring results to the larger population.

The first thing you want to do is decide how confident you wish to be that your sample results are an accurate estimate of what is true for the entire population. The standard confidence level is 95%. This means you want to be 95% certain that your sample results are an accurate estimate of the population as a whole. If you are willing to be 90% certain, your sample size can be smaller. If you want to be 99% confident (only a 1% chance of having the sample be very different from the population as a whole), you will need a larger sample.

Your next choice is about how precise you need your estimates to be. This is sometimes called sampling error or margin of error. We often see this when results from polls are reported. For example, you might read in the paper that 48% favor raising taxes and 52% oppose raising taxes (+/- 3%). What this means is that if everyone in the population were asked, the actual proportions would be somewhere between 45% and 51% (48 +/- 3) favoring raising taxes, and between 49% and 55% (52 +/- 3) opposing. Most evaluations accept a sampling error of 5%. In the tax example, if we had a 5% margin of error, then the true picture of opinions would be between 43% and 53% favoring raising taxes and between 47% and 57% opposing raising taxes. As you can see, there is more variability (less precision) in our estimates with a +/- 5% margin of error as compared to a +/- 3% margin. The more precise you want to be, the larger your sample will need to be. In both examples, however, note that the ranges for the two positions overlap. This means that these results are too close to call.

When working with real numbers, such as age or income, precision is presented in terms of the confidence interval. (Note: this is not to be confused with the concept of confidence level explained earlier.) We use this when we want to estimate the mean of the population based on our sample results. For example, if the average per capita income of the rural poor in our sample is 2,000 South African Rand per year, the computer might calculate a 95% confidence interval as between R1,800 and R2,200. We can then say that we are 95% certain (this is the confidence level) that the true population's average income is between R1,800 and R2,200 (this is the confidence interval).
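The arithmetic behind both ideas can be sketched briefly. The first part reproduces the tax poll ranges from the text and checks whether they overlap. The second part shows one common way a computer might calculate a 95% confidence interval for a mean, using a normal approximation (multiplier 1.96); the income figures are made up, and for a sample this small statistical software would typically use a slightly wider t-based interval.

```python
import math
import statistics

def interval(estimate, margin):
    """Range implied by an estimate and its margin of error."""
    return (estimate - margin, estimate + margin)

def overlaps(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Tax poll from the text: 48% favor, 52% oppose, margin of error +/- 3.
favor = interval(48, 3)      # (45, 51)
oppose = interval(52, 3)     # (49, 55)
too_close_to_call = overlaps(favor, oppose)

def mean_ci_95(values):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - 1.96 * std_error, mean + 1.96 * std_error)

# Made-up annual incomes (in Rand) for a small sample; the mean is 2,000.
incomes = [1500, 1700, 1900, 2000, 2000, 2100, 2300, 2500]
low, high = mean_ci_95(incomes)
print(favor, oppose, too_close_to_call)
print(round(low), round(high))
```

The interval is centered on the sample mean, and it narrows as the sample grows or as the incomes become less spread out.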
However, low response rates always carry the threat of non-response bias. No amount of over-sampling can control for this bias if you have a low response rate (e.g., only 20%). Rather than over-sampling, the evaluator should put extra resources into doing everything possible to obtain a high response rate. Such efforts should include incentives and multiple follow-ups to non-respondents. While samples are used to keep costs of data collection down, go for as large a sample as you can manage. This will make your estimates of the population as accurate as possible. If you can do the entire population, opt for that, because there will then be no sampling error involved. However, keep in mind that censuses can also yield biased data if there are low response rates.
Table 9.3 gives you a summary of sample sizes for very large populations (those of 1 million or larger). Many national surveys use samples of about 1,100 because that makes the margin of error + or – 3 percentage points with a 95% confidence level.

Table 9.3: Sampling Sizes for Large Populations

Precision                    Confidence Level
(margin of error, +/-%)      99%       95%       90%
± 1%                         16,576    9,604     6,
± 2%                         4,144     2,401     1,
± 3%                         1,848     1,067     752
± 5%                         666       384       271
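The module does not present formulas, but readers who want to see where such figures can come from may find a sketch helpful. The code below uses the standard large-population formula for estimating a proportion, n = z² · p(1 − p) / e², with p = 0.5 (the most conservative case) and z = 1.96 for 95% confidence. That this is how Table 9.3 was built is an assumption; the formula does reproduce the table's 95% column.

```python
def sample_size(margin, z=1.96, p=0.5):
    """Sample size for a proportion in a very large population.

    margin: desired margin of error as a fraction (0.03 means +/- 3%).
    z:      multiplier for the confidence level (1.96 for 95%).
    p:      assumed proportion; 0.5 gives the largest (safest) sample size.
    """
    return round(z ** 2 * p * (1 - p) / margin ** 2)

for margin in (0.01, 0.02, 0.03, 0.05):
    print(f"+/- {margin:.0%}: {sample_size(margin)}")
```

Note how halving the margin of error roughly quadruples the sample size, which is why very precise estimates are expensive.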
Tables 9.4 and 9.5 (1) show the confidence intervals for two population sizes (100 and 50), a few sample sizes, and various proportions found in the sample. Table 9.4 shows confidence intervals for populations of 100.

Table 9.4: 95% Confidence Intervals for a Population of 100.

               Proportion Found in the Sample
Sample Size    .5       .4 or .6    .3 or .7    .2 or .8    .1 or.
75             ± .06    ± .06       ± .05       ± .05       ±.
50             ± .10    ± .10       ± .09       ± .08       ±.
30             ± .15    ± .15       ± .14       ± .12       ±.

Table 9.5: 95% Confidence Intervals for a Population of 50.

               Proportion Found in the Sample
Sample Size    .5       .4 or .6    .3 or .7    .2 or .8    .1 or.
30             ± .11    ± .11       ± .10       ± .09       ±.
20             ± .17    ± .17       ± .16       ± .14       ±.

(1) Gregg B. Jackson (2005). Sampling in Development Evaluations. Presentation at IPDET, July 5 and 6, 2005, p. 28.
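For small populations, the margin of error shrinks as the sample approaches the size of the population. One standard way to compute this is to multiply the usual margin for a proportion by a finite population correction, sqrt((N − n) / (N − 1)). The sketch below is an assumption about how such tables can be derived, not a method stated in the source; it does agree with the legible cells of Tables 9.4 and 9.5.

```python
import math

def margin_finite(p, n, N, z=1.96):
    """95% margin of error for a proportion p from a sample of n out of N units."""
    std_error = math.sqrt(p * (1 - p) / n)      # usual standard error
    correction = math.sqrt((N - n) / (N - 1))   # finite population correction
    return z * std_error * correction

# Population of 100, proportion .5 found in the sample.
for n in (75, 50, 30):
    print(n, round(margin_finite(0.5, n, 100), 2))

# Population of 50, proportion .5 found in the sample.
for n in (30, 20):
    print(n, round(margin_finite(0.5, n, 50), 2))
```

When the sample equals the whole population (n = N), the correction is zero, which matches the earlier point that a census has no sampling error.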
Summary
In this module, you learned about sampling. Review the following checklist. Check those items that you can complete and review those that you cannot.
− define the sampling concepts, including:
    population
    sample
    sampling frame
    sample design
    parameter
    statistic
    random sample
− discuss the differences between random and non-random samples
− list and define the types of random samples
    simple random samples
    stratified random samples
    multi-stage random samples
    cluster random samples
− list and define the types of non-random samples
    quota
    accidental
    snowball
    judgmental
    convenience
− describe the role of statistics to determine confidence and precision
− describe the guidelines to determine the size of sample that is needed.
− random samples
    simple, stratified, cluster or combination random samples
− non-random samples
    quota, accidental, snowball, judgmental, convenience and combination non-random samples
− confidence and precision
− sample size.