Understanding Bias in Sample Surveys: Selection and Non-Response, Study notes of Statistics

These notes cover sampling from a finite population, specifically in the context of surveys. They discuss the importance of avoiding systematic biases, focusing on selection bias and non-response bias. Real-life examples illustrate the consequences of biased polls. The notes also introduce different sampling methods, such as quota sampling and simple random sampling, and how they affect bias.

Typology: Study notes

Pre 2010

Uploaded on 08/19/2009

koofers-user-gc5
Sample Surveys
In this class we will consider the problem of sampling from a finite
population. This is usually referred to as a survey. The goal of the
survey is to learn about some parameters of a population, like
averages or proportions. A well-designed survey avoids
systematic biases. The two most typical sources of bias are
selection bias and non-response bias.
AMS-5: Statistics
Collecting data: Sample Surveys
A population is a class of individuals that an investigator is
interested in. Examples of populations are
1. All eligible voters in a presidential election.
2. All potential consumers of a given product.
3. The female elephant seals that mate at Año Nuevo State
Reserve during the winter.
4. The bottles of beer that are produced at a certain brewery.
A full examination of a population requires a census. This may be
impractical in many cases. If only one part of the population is
examined, then we are looking at a sample. The goal is to make
inferences from the sample to the whole population.
There are usually some numerical characteristics of the population
that we are interested in. These are called parameters. For example
1. The average age of eligible voters
2. The average income of potential customers.
3. The proportion of pups per female elephant seal.
4. The percentage of bottles that are not properly filled with
beer.
Parameters are unknown quantities which are estimated using
statistics, which are numbers that can be computed from the
sample. The validity of those estimates depends on how well the
sample represents the population.
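The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of voter ages is made up, and the sample size of 1,000 is an arbitrary assumption.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 100,000 eligible voters (made-up numbers).
population = [random.randint(18, 90) for _ in range(100_000)]

# Parameter: the population average, normally unknown without a census.
parameter = sum(population) / len(population)

# Statistic: the average of a simple random sample of 1,000 voters.
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

Because the sample is drawn at random, the statistic should land close to the parameter even though only 1% of the population is examined.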
A biased poll
Before the 1936 presidential election, The Literary Digest, a very
prestigious magazine, predicted that Roosevelt would lose the
election to Landon, obtaining only 43% of the votes.
The result of the election was that Roosevelt won by a landslide,
62% to 38%. Why was the Literary Digest so wrong?
Because their poll was badly designed.
The Digest based its prediction on a sample of 2.4 million people
who responded to a mailed questionnaire that was sent to 10
million people. The names and addresses of these people were
obtained from telephone books and club membership lists.


The sample had a strong bias against the poor, since they were unlikely to belong to clubs or have phones (in the '30s). The outcome of the election showed a split that followed a clear economic line: the poor voted for Roosevelt and the rich were with Landon.

Taking a large sample with a biased procedure does not improve the results.

Another source of bias in the Digest's poll is that there was a large number of non-respondents. This produces a non-response bias, since non-respondents can be very different from respondents.
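A small simulation can illustrate why a huge sample from a biased list does worse than a small random sample. All the numbers below (share of rich voters, phone ownership rates, voting preferences) are made-up assumptions chosen only to mimic the situation described above, not historical data.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def make_voter():
    """Return (has_phone, votes_roosevelt) for one hypothetical voter."""
    rich = random.random() < 0.30
    # Assumption: the rich are far more likely to appear in phone books.
    has_phone = random.random() < (0.90 if rich else 0.15)
    # Assumption: the poor lean towards Roosevelt, the rich towards Landon.
    votes_roosevelt = random.random() < (0.35 if rich else 0.74)
    return has_phone, votes_roosevelt

population = [make_voter() for _ in range(500_000)]
true_share = sum(vote for _, vote in population) / len(population)

# Biased procedure: a very large sample drawn only from phone owners.
frame = [vote for has_phone, vote in population if has_phone]
big_biased = random.sample(frame, 100_000)
biased_share = sum(big_biased) / len(big_biased)

# Simple random sample: much smaller, but drawn from everybody.
small_srs = random.sample(population, 1_000)
srs_share = sum(vote for _, vote in small_srs) / len(small_srs)

print(f"true Roosevelt share:       {true_share:.3f}")
print(f"biased sample of 100,000:   {biased_share:.3f}")
print(f"random sample of 1,000:     {srs_share:.3f}")
```

Under these assumptions the biased sample misses the true share by a wide margin no matter how large it is, while the small random sample stays close.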

Studies have shown that people from the middle class are more likely to respond than people from the upper or the lower classes. So in a survey with a high non-response rate, middle class people may be overrepresented.

When considering the quality of a survey keep in mind two possible sources of bias:
  • Selection bias
  • Non-response bias
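The overrepresentation of the middle class under non-response can be sketched as follows. The class shares and response rates are hypothetical values picked for illustration only.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical class mix of the people who receive the questionnaire.
CLASS_SHARE = {"lower": 0.30, "middle": 0.50, "upper": 0.20}
# Hypothetical probability that a person of each class answers.
RESPONSE_RATE = {"lower": 0.20, "middle": 0.60, "upper": 0.20}

classes = list(CLASS_SHARE)
mailed = random.choices(classes, weights=list(CLASS_SHARE.values()), k=50_000)
respondents = [c for c in mailed if random.random() < RESPONSE_RATE[c]]

mailed_counts = Counter(mailed)
respondent_counts = Counter(respondents)

middle_mailed = mailed_counts["middle"] / len(mailed)
middle_respondents = respondent_counts["middle"] / len(respondents)

for c in classes:
    print(f"{c:>6}: mailed {mailed_counts[c] / len(mailed):.2f}, "
          f"respondents {respondent_counts[c] / len(respondents):.2f}")
```

With these assumed rates, the middle class makes up about half of the people contacted but a clearly larger share of those who actually answer.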

Quota Sampling

Consider the following scheme to obtain a sample. You send an interviewer to the field and ask him or her to get a fixed number of interviews within certain categories. For example:
  • Interview 13 subjects
  • Exactly 6 from the suburbs, 7 from the central city.
  • Exactly 7 men and 6 women
  • Of the men, 3 have to be under forty, 4 above forty.
  • Of the men, 1 has to be black and 6 white.

The list of restrictions could go on. The goal is to achieve a sample that is fairly indicative of all demographic and social characteristics of the population to make it representative.

This is called a quota sampling scheme.

But, at the end, the interviewer has the freedom of deciding who gets interviewed, that is, the ultimate selection is left to human wisdom. Gallup polls were conducted using the quota system for more than a decade. These are the results regarding the Republican vote:

Year  Prediction  Results  Errors

The sample sizes are around 50,000. In the 1948 election, Gallup predicted the wrong winner. Gallup had a systematic bias in favor of the Republican candidate in all elections from '36 to '48.

The results

The following table presents the results of Gallup's predictions for some elections from 1952 to 1992.

Year  Sample size  Won  Prediction  Result  Error

Eisenhower
Kennedy
Nixon
Carter
Reagan
Clinton

We observe a much smaller error (except for the 1992 election), no bias in favor of the Republican candidate and much smaller sample sizes.

Problems

Investigators doing polls have to face several problems that can bias the results of the survey even after considering a probabilistic sample.

Non-voters: Usually between 30% and 50% of the eligible voters don't vote. But many of these are tempted to respond affirmatively when asked about their voting intentions. Interviewers ask indirect questions that make it possible to check whether the person is genuinely a voter or not.

Undecided: Polls ask questions that give information about the political attitudes of the interviewed person in order to forecast the vote of undecided voters.

Response bias: Questions can be posed in a way that biases the response. A useful tool is to have the interviewed person deposit a ballot in a box.

Non-response bias: As discussed before, this can create a bias since non-respondents are different from the rest. This is usually corrected by giving more weight to people who are difficult to get, since they, somehow, represent a subpopulation which is closer to the non-respondents.

Check data: Some groups are likely to get more subjects in the sample than others. This is usually corrected during the analysis of the sample using demographic data.

Control: Interviewers are controlled either by direct supervision or by the cross-validation provided by redundant information in the survey.
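One common way to correct a sample with demographic data is to reweight each group to its known population share (often called post-stratification). The notes do not say which correction was used, so the sketch below is only an assumed illustration, with hypothetical census shares and sample counts.

```python
# Hypothetical census shares of each age group in the population.
population_share = {"under_40": 0.45, "over_40": 0.55}

# Hypothetical sample: people interviewed and how many support a candidate.
# The over-40 group is overrepresented (800 of 1,000 interviews).
sample = {
    "under_40": {"n": 200, "yes": 120},
    "over_40": {"n": 800, "yes": 320},
}

# Raw estimate: ignores that the groups are out of proportion.
total_n = sum(group["n"] for group in sample.values())
raw_estimate = sum(group["yes"] for group in sample.values()) / total_n

# Weighted estimate: each group's support rate counts according to its
# share of the population rather than its share of the sample.
weighted_estimate = sum(
    population_share[name] * group["yes"] / group["n"]
    for name, group in sample.items()
)

print(f"raw estimate:      {raw_estimate:.3f}")       # 0.440
print(f"weighted estimate: {weighted_estimate:.3f}")  # 0.490
```

The raw figure is pulled towards the overrepresented group; the weighted figure restores each group to its population proportion.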

Telephone surveys

Conducting a survey by phone saves money. It can also be done in less time. How do you select the sample? Phone numbers look like this:

Area code  Exchange  Bank  Digits

The Gallup poll in '88 used a multistage cluster sample using area codes, exchanges, banks and digits as a hierarchy. The Gallup poll in '92 was simpler and worked like this:

  1. There are 4 time zones in the US. Each zone is divided into 3 types of areas: heavily, moderately and lightly populated areas. This produced 12 strata.
  2. They sampled numbers at random within each stratum.
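The two steps above can be sketched as a stratified random sample. The time zone names, the made-up number IDs, and the choice of 5 numbers per stratum are all assumptions for illustration; the notes do not give those details.

```python
import random
from itertools import product

random.seed(3)  # fixed seed so the sketch is reproducible

TIME_ZONES = ["Eastern", "Central", "Mountain", "Pacific"]  # assumed names
DENSITIES = ["heavy", "medium", "light"]

# Step 1: 4 time zones x 3 types of area = 12 strata.
strata = list(product(TIME_ZONES, DENSITIES))

# Hypothetical frame: 1,000 made-up phone-number IDs in each stratum.
frame = {
    (zone, density): [f"{zone}-{density}-{i:04d}" for i in range(1_000)]
    for zone, density in strata
}

# Step 2: sample numbers at random within each stratum.
PER_STRATUM = 5  # assumed; the real allocation is not given in the notes
sample = {
    stratum: random.sample(numbers, PER_STRATUM)
    for stratum, numbers in frame.items()
}

print(f"number of strata: {len(strata)}")
for stratum, numbers in sample.items():
    print(stratum, numbers)
```

Sampling separately inside each stratum guarantees that every combination of time zone and population density appears in the sample.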

Problems

Problem 1: A survey organization is planning to do an opinion survey of 2,500 people of voting age in the U.S. True or false, and explain: the organization will choose people to interview by taking a simple random sample.

This is false. Taking a simple random sample of a population of about 200 million voters is impractical. First, because a list of all the voters is not available. Second, because taking a simple random sample from such a list is a big problem in itself. Third, because interviewing 2,500 people all scattered around the map will be very costly.

Problem 2: A sample of Japanese-American residents in San Francisco is taken by considering the four most representative blocks in the Japanese area of the town and interviewing all the residents in those areas. However, a comparison with Census data shows that the sample did not include a high enough proportion of Japanese with college degrees. How can this be explained?

This was not a good way to draw the sample because you would expect that people living in the more traditional areas have very specific characteristics. In particular, it is likely that people with college degrees were living in more suburban neighborhoods.