







Sample Surveys
In this class we will consider the problem of sampling from a finite population. This is usually referred to as a survey. The goal of the survey is to learn about some parameters of a population, like averages or proportions. A well designed survey avoids incurring systematic biases. The two most typical sources of bias are the selection bias and the non-response bias.
Collecting data: Sample Surveys
A population is a class of individuals that an investigator is interested in. Examples include the elephant seals at Año Nuevo State Reserve during the winter, and the bottles of beer that are produced at a certain brewery.
A full examination of a population requires a census. This may be impractical in many cases. If only one part of the population is examined, then we are looking at a sample. The goal is to make inferences from the sample to the whole population.
AMS-5: Statistics
There are usually some numerical characteristics of the population that we are interested in. These are called parameters. For example:
1. The average age of eligible voters.
2. The average income of potential customers.
3. The proportion of pups per female elephant seal.
4. The percentage of bottles that are not properly filled with beer.
Parameters are unknown quantities which are estimated using statistics, which are numbers that can be computed from the sample. The validity of those values depends on how well the sample represents the population.
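To make the distinction between parameters and statistics concrete, here is a minimal Python sketch. The population of voter ages is simulated (a hypothetical uniform spread between 18 and 90), so the parameter can be computed exactly and compared with the statistic from a simple random sample. In a real survey only the statistic is available.

```python
import random
import statistics

def population_parameter(population):
    """The parameter: the mean over the whole population (requires a census)."""
    return statistics.mean(population)

def sample_statistic(population, n, rng):
    """The statistic: the mean of a simple random sample of size n."""
    sample = rng.sample(population, n)  # n distinct individuals, drawn without replacement
    return statistics.mean(sample)

rng = random.Random(0)
# Hypothetical population: ages of 10,000 eligible voters.
ages = [rng.randint(18, 90) for _ in range(10_000)]

mu = population_parameter(ages)            # unknown in practice
xbar = sample_statistic(ages, 1_000, rng)  # what a survey actually computes
print(f"parameter = {mu:.2f}, statistic = {xbar:.2f}")
```

With a well-drawn sample of 1,000 the statistic lands close to the parameter; sampling the entire population reproduces the parameter exactly.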
A biased poll
Before the 1936 presidential election, The Literary Digest, a very prestigious magazine, predicted that Roosevelt would lose the election to Landon, obtaining only 43% of the votes. The result of the election was that Roosevelt won by a landslide, 62% to 38%.
Why was the Literary Digest so wrong? Because their poll was badly designed.
The Digest based its prediction on a sample of 2.4 million people who responded to a mailed questionnaire that was sent to 10 million people. The names and addresses of these people were obtained from telephone books and club membership lists.
The sample had a strong bias against the poor, since they were unlikely to belong to clubs or have phones (in the '30s). The outcome of the election showed a split that followed a clear economic line: the poor voted for Roosevelt and the rich were with Landon.
Taking a very large sample with a biased procedure does not improve the results.
Another source of bias in the Digest's poll is that there was a large number of non-respondents. This produces a non-response bias, since non-respondents can be very different from respondents.
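The following Python simulation sketches why the Digest's 2.4 million responses did not help. Every number in it is an assumption chosen for illustration: 60% of voters are poor, the poor favor Roosevelt 80% of the time and the rich 30%, and the sampling frame (playing the role of phone books and club lists) lists every rich voter but only about 10% of the poor. A huge sample from the biased frame misses the true share badly, while a small simple random sample lands close to it.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate of 100,000 voters: (is_poor, votes_for_roosevelt).
voters = []
for _ in range(100_000):
    poor = rng.random() < 0.6
    votes_r = rng.random() < (0.8 if poor else 0.3)
    voters.append((poor, votes_r))

def share_for_roosevelt(group):
    """Fraction of the group voting for Roosevelt (True counts as 1)."""
    return sum(v for _, v in group) / len(group)

true_share = share_for_roosevelt(voters)

# Biased frame: every rich voter is listed, only about 10% of poor voters are.
frame = [v for v in voters if not v[0] or rng.random() < 0.1]

big_biased = rng.sample(frame, 20_000)    # huge sample from the biased frame
small_random = rng.sample(voters, 1_000)  # small simple random sample of everyone

print(f"truth:                    {true_share:.3f}")
print(f"20,000 from biased frame: {share_for_roosevelt(big_biased):.3f}")
print(f"1,000 simple random:      {share_for_roosevelt(small_random):.3f}")
```

Increasing the size of the biased sample only makes its estimate more precise around the wrong value; it never removes the bias.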
Quota Sampling
... representative. This is called a quota sampling scheme.
But, in the end, the interviewer has the freedom of deciding who gets interviewed; that is, the ultimate selection is left to human judgment. Gallup polls were conducted using the quota system for more than a decade. These are the results regarding the Republican vote:
[Table: Year, Prediction, Result, Error for the Republican vote]
The sample sizes are around 50,000. In the 1948 election, Gallup predicted the wrong winner. Gallup had a systematic bias in favor of the Republican candidate in all elections from '36 to '48.
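The weakness of quota sampling can be sketched in Python. This is an illustrative simulation, not Gallup's actual procedure: the income groups, the hidden "approachable" trait, and the assumption that approachable people lean Republican are all hypothetical. The quotas match the population's group proportions exactly, yet because the interviewer picks who fills each quota, the estimate is biased; a simple random sample of the same size is not.

```python
import random

rng = random.Random(48)

def make_person():
    """Hypothetical person: a quota category, a hidden trait, and a vote.
    Assumption: within each group, approachable people lean Republican."""
    group = "low" if rng.random() < 0.5 else "high"
    approachable = rng.random() < 0.5
    p_rep = {"low": 0.35, "high": 0.55}[group] + (0.10 if approachable else -0.10)
    return {"group": group, "approachable": approachable, "rep": rng.random() < p_rep}

population = [make_person() for _ in range(50_000)]

def quota_sample(pop, quotas):
    """Fill a fixed number of interviews per group. Inside each group the
    interviewer, not chance, chooses: approachable people are taken first."""
    sample = []
    for group, k in quotas.items():
        members = [p for p in pop if p["group"] == group]
        members.sort(key=lambda p: not p["approachable"])  # approachable first
        sample.extend(members[:k])
    return sample

def rep_share(people):
    """Fraction of people voting Republican."""
    return sum(p["rep"] for p in people) / len(people)

n = 2_000
low_frac = sum(p["group"] == "low" for p in population) / len(population)
quotas = {"low": round(n * low_frac), "high": n - round(n * low_frac)}

qs = quota_sample(population, quotas)
srs = rng.sample(population, n)
print(f"truth {rep_share(population):.3f}, quota {rep_share(qs):.3f}, random {rep_share(srs):.3f}")
```

The quota sample has exactly the right mix of income groups, but it overstates the Republican share because the choice within each group is not random.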
The results
The following table presents the results of Gallup's predictions for some elections from 1952 to 1992.
[Table: Year, Sample size, Winner, Prediction, Result, Error; winners listed: Eisenhower, Kennedy, Nixon, Carter, Reagan, Clinton]
We observe much smaller errors (except for the 1992 election), no bias in favor of the Republican candidate, and much smaller sample sizes.
Problems
Investigators doing polls have to face several problems that can bias the results of the survey even after drawing a probability sample.
Non-voters: Usually between 30% and 50% of the eligible voters don't vote. But many of these are tempted to respond affirmatively when asked about their voting intentions. Interviewers ask indirect questions that allow them to check whether the person is genuinely a voter or not.
Undecided: Polls ask questions that give information about the political attitudes of the interviewed person in order to forecast the vote of undecided voters.
Response bias: Questions can be posed in a way that biases the response. A useful tool is to have the interviewed person deposit a ballot in a box.
Non-response bias: As discussed before, this can create a bias since non-respondents are different from the rest. This is usually corrected by giving more weight to people who are difficult to reach, since they, somehow, represent a subpopulation which is closer to the non-respondents.
Check data: Some groups are likely to get more subjects in the sample than others. This is usually corrected during the analysis of the sample using demographic data.
Control: Interviewers are controlled either by direct supervision or by the cross-validation provided by redundant information in the survey.
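One common way to carry out the weighting described under non-response bias and check data is post-stratification: each respondent gets the weight (population share of their group) / (sample share of their group). The Python sketch below uses hypothetical counts and hypothetical census shares; groups the sample under-covers, such as hard-to-reach people, end up counting for more.

```python
def poststratified_share(sample_counts, yes_counts, population_shares):
    """Weighted share of 'yes' answers. A respondent in group g gets weight
    population_shares[g] / (sample_counts[g] / n), so under-covered groups
    count for more and over-covered groups count for less."""
    n = sum(sample_counts.values())
    weighted_yes = 0.0
    weighted_total = 0.0
    for g, count in sample_counts.items():
        weight = population_shares[g] / (count / n)
        weighted_yes += weight * yes_counts[g]
        weighted_total += weight * count
    return weighted_yes / weighted_total

# Hypothetical survey: younger people were harder to reach.
sample_counts = {"under 40": 200, "40 and over": 600}
yes_counts = {"under 40": 120, "40 and over": 240}
census_shares = {"under 40": 0.5, "40 and over": 0.5}  # assumed demographic data

unweighted = sum(yes_counts.values()) / sum(sample_counts.values())
weighted = poststratified_share(sample_counts, yes_counts, census_shares)
print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

Here the unweighted share is 0.45 while the weighted share is 0.5, because younger respondents, who answered yes more often, were under-represented in the sample.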
Telephone surveys
Conducting a survey by phone saves money. It can also be done in less time. How do you select the sample? Phone numbers look like this:
Area code | Exchange | Bank | Digits
... strata.
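The notes break a phone number into area code, exchange, bank, and digits, and mention strata. Below is a minimal Python sketch of random-digit dialing under stated assumptions: area codes act as strata, the exchanges listed in the hypothetical `STRATA` table are illustrative only, and the bank and final two digits are drawn at random so that unlisted numbers have the same chance of being called as listed ones. Practical designs often restrict banks to those known to contain working numbers; this sketch does not.

```python
import random

# Hypothetical strata: each area code maps to exchanges assumed to be in use.
STRATA = {
    "831": ["459", "425"],
    "408": ["555", "924"],
}

def random_phone_number(area_code, rng):
    """Random-digit dialing inside one stratum: choose an exchange, then a
    two-digit bank, then two random final digits."""
    exchange = rng.choice(STRATA[area_code])
    bank = f"{rng.randrange(100):02d}"
    digits = f"{rng.randrange(100):02d}"
    return f"{area_code}-{exchange}-{bank}{digits}"

def stratified_phone_sample(per_stratum, rng):
    """Draw a fixed number of phone numbers from every stratum."""
    return {ac: [random_phone_number(ac, rng) for _ in range(k)]
            for ac, k in per_stratum.items()}

rng = random.Random(0)
sample = stratified_phone_sample({"831": 3, "408": 2}, rng)
print(sample)
```

Fixing the number of calls per stratum guarantees that every region is represented, while the random final digits keep the choice within each region out of the interviewer's hands.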
Problems
Problem 1: A survey organization is planning to do an opinion survey of 2,500 people of voting age in the U.S. True or false, and explain: the organization will choose people to interview by taking a simple random sample.
This is false. Taking a simple random sample of a population of about 200 million voters is impractical. First, because a list of all the voters is not available. Second, because taking a simple random sample of such a list is a big problem in itself. Third, because interviewing 2,500 people scattered all over the map would be very costly.
Problem 2: A sample of Japanese-American residents in San Francisco is taken by considering the four most representative blocks in the Japanese area of the town and interviewing all the residents in those blocks. However, a comparison with Census data shows that the sample did not include a high enough proportion of Japanese-Americans with college degrees. How can this be explained?
This was not a good way to draw the sample, because you would expect people living in the more traditional areas to have very specific characteristics. In particular, it is likely that people with college degrees were living in more suburban neighborhoods.