Understanding Representative Samples: Sampling & Proportions in Sciences - Prof. Nelleke B, Study notes of Introduction to Philosophy

These notes cover the importance of sampling and estimating proportions in various fields, including medical trials, animal studies, marketing, and politics. They explain the concept of representativeness and the need for proper sampling techniques to ensure accurate estimates. They also introduce correlations between secondary properties and primary properties, and the importance of controlling for these correlations when obtaining a representative sample. Examples are given of questions about proportions and of the need for induction based on sampling a subset of the population.

Typology: Study notes

Pre 2010

Uploaded on 04/29/2010

bpickett3

PHL 120 – Martin, Chapter 3 SAMPLING
1. A great many of the questions which interest people in both the natural and
social sciences (in medical trials, animal studies, marketing and politics)
concern the proportion, expressed as a percentage, of individuals in a
population which have some particular property.
2. Notes on terminology:
i. We are seldom interested in proportions near the 100% level, because in
these cases we usually already know that almost every individual has the
property in which we’re interested. Sometimes, especially in medical
epidemiology, we are interested in proportions near the 0% level. (E.g.,
Jenner’s study of smallpox.)
ii. A population is any class of objects or events; not necessarily people.
Similarly, by individuals we simply mean any particular member of the
class; these will not be people unless the population in question is a
population of people.
iii. Properties are characteristics which we believe some individuals in a
population have.
3. Martin provides some examples of questions about proportions on p. 53. Here
are some others:
• How many smokers are likely to develop lung cancer?
• How many non-smokers are likely to develop lung cancer?
• How many wolf cubs survive to reach maturity?
• How many people in Birmingham will want to buy the latest Frank Stitt
  cookbook during the next six months?
• What proportion of the stimulus package home-owners’ rebate will
  Americans save rather than spend in 2009?
4. We cannot answer these questions by examining every smoker, or every
non-smoker, or every wolf cub, or everyone in Birmingham or America. To answer
these questions, we must perform induction based on sampling a subset of the
population. If we sample properly, we can establish a good inductive
argument of the following form:
    Among 600 wolf cubs tagged at birth, 200 were found still alive at
    maturity.
    Therefore, about 33% of wolf cubs survive to maturity.
5. There are many conditions necessary for proper sampling. Obviously, if one
sampled only wolf cubs born in a drought-stricken area, one would arrive at an
unreasonably low estimate of wolf cub survival in general. Conversely, if
one sampled only wolf cubs born in zoos, one would obtain an unreasonably
high estimate. In either of these cases, the inductive argument above would fail
to justify its conclusion.
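
The reasoning in items 4 and 5 can be sketched in a few lines of Python. Only the 200-of-600 figure comes from the argument above; the habitat mix and survival rates are invented for illustration and do not come from Martin or from any wolf study. Under those assumptions, the sketch compares a simple random sample with a sample drawn only from drought-born cubs.

```python
import random

def estimate_proportion(with_property, sample_size):
    """Return the sample proportion as a percentage."""
    return 100.0 * with_property / sample_size

# The argument in item 4: 200 of 600 tagged cubs alive at maturity.
print(round(estimate_proportion(200, 600)))  # 33

# Hypothetical population of (habitat, survived) pairs.
# Every number below is an assumption made for this sketch.
rng = random.Random(0)
SURVIVAL_RATE = {"drought": 0.15, "normal": 0.40}
population = []
for habitat, count in (("drought", 3000), ("normal", 7000)):
    for _ in range(count):
        population.append((habitat, rng.random() < SURVIVAL_RATE[habitat]))

def survival_percentage(cubs):
    """Percentage of the given cubs that survived to maturity."""
    survivors = sum(1 for _, survived in cubs if survived)
    return estimate_proportion(survivors, len(cubs))

true_rate = survival_percentage(population)

# A simple random sample of 600 cubs from the whole population.
random_sample = rng.sample(population, 600)

# A biased sample: 600 cubs, all born in the drought area.
drought_cubs = [cub for cub in population if cub[0] == "drought"]
drought_only = rng.sample(drought_cubs, 600)

print(f"whole population: {true_rate:.1f}%")
print(f"random sample:    {survival_percentage(random_sample):.1f}%")
print(f"drought only:     {survival_percentage(drought_only):.1f}%")
```

With these assumed rates, the drought-only estimate of survival falls well below the figure for the whole population, so an argument built on it would not justify a conclusion about wolf cubs in general.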

6. To sample in such a way as to obtain a good inductive argument justifying an estimate about a proportion, one must take steps to ensure that one’s sample is representative of the population as a whole.
7. To discuss representativeness, we must introduce some more terminology. Some properties tend to be correlated with one another. So, for example:
• Among wolves, being born in a drought is correlated with dying young.
• Being likely to buy the latest Frank Stitt cookbook is correlated with being female.
• Being disposed to save windfall profits is correlated with income levels.
We can also distinguish between positive and negative correlations. Among wolves, being born in a drought is positively correlated with dying young, and being born in a zoo is negatively correlated with dying young. Among people, being wealthy is positively correlated with saving windfall profits, and being poor is negatively correlated with saving windfall profits.
8. Consider the last example. The disposition to save windfall profits is the primary property which concerns economists who are trying to predict the effects of the stimulus package on the American economy. Income level is a secondary property which is correlated with this primary property. To obtain a representative sample, we must control for such secondary properties. In the case at hand, our sample will only be representative if the distribution of income levels in the sample is roughly the same as the distribution of income levels in the entire population of Americans who own homes. Note that the distribution of income levels in our sample should not be matched to the distribution of income levels in the entire population of Americans. For one thing, the whole American population is not the population with which our question is concerned. Furthermore, being poor is negatively correlated with owning a home, and the average American is poorer than the average home-owner.
9. Our example illustrates the two principles given by Martin on p. 59. We will have an unrepresentative sample, and thus get a bad inductive argument, if:
i. the proportion of individuals with a secondary property is somewhat higher or lower in the sample than in the population, and
ii. individuals with that secondary property are somewhat more likely (positive correlation) or somewhat less likely (negative correlation) to show the primary property.
10. So, how do we get a representative sample? One technique is sample matching. This works as follows:
Step #1: Think of all the possibly relevant properties you can.
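
The two conditions in item 9 can be illustrated with a small Python sketch. It treats the estimated proportion as a weighted average: each group's share of the sample multiplied by the rate of the primary property within that group. All shares and saving rates are hypothetical figures chosen for this sketch; none come from Martin or from real data on home-owners.

```python
def expected_estimate(share_by_group, rate_by_group):
    """Weighted average: sum of (group share * rate of primary property in group)."""
    return sum(share_by_group[g] * rate_by_group[g] for g in share_by_group)

# Hypothetical income mix among home-owners (assumed for illustration).
population_share = {"wealthy": 0.30, "poor": 0.70}

# Condition i: the sample over-represents the wealthy.
skewed_sample_share = {"wealthy": 0.60, "poor": 0.40}

# Condition ii: being wealthy is positively correlated with saving.
saving_rate = {"wealthy": 0.80, "poor": 0.20}

# For contrast: a case in which income makes no difference to saving.
uncorrelated_rate = {"wealthy": 0.50, "poor": 0.50}

population_value = expected_estimate(population_share, saving_rate)
skewed_value = expected_estimate(skewed_sample_share, saving_rate)

print(f"population:              {population_value:.2f}")
print(f"skewed sample:           {skewed_value:.2f}")
print(f"skewed, no correlation:  "
      f"{expected_estimate(skewed_sample_share, uncorrelated_rate):.2f}")
print(f"population, no correl.:  "
      f"{expected_estimate(population_share, uncorrelated_rate):.2f}")
```

With these assumed figures the skewed sample suggests that 56% save, against 38% in the population. When income is uncorrelated with saving, the same skew leaves the estimate unchanged, which is why both conditions must hold for the sample to mislead.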

v. Young, poor non-farmers
vi. Young, wealthy non-farmers
vii. Old, poor non-farmers
viii. Old, wealthy non-farmers
We added one new secondary property, yet doubled our total.
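
The doubling can be checked with a short Python sketch that enumerates the groups a sample would have to match. The first four items of the list are missing from these notes, so the property names and values below (age, wealth, occupation) are assumptions inferred from the items that survive, and each property is assumed to take only two values.

```python
from itertools import product

def matching_cells(properties):
    """List every combination of values, taking one value per secondary property."""
    return list(product(*properties.values()))

# Assumed two-valued secondary properties (inferred, not given in the notes).
two_properties = {
    "age": ("young", "old"),
    "wealth": ("poor", "wealthy"),
}
three_properties = dict(two_properties, occupation=("farmer", "non-farmer"))

print(len(matching_cells(two_properties)))    # 4
print(len(matching_cells(three_properties)))  # 8

for cell in matching_cells(three_properties):
    print(", ".join(cell))

# Each further two-valued property doubles the count again.
for k in (5, 10, 20):
    print(k, "properties ->", 2 ** k, "groups")
```

Under the two-values assumption, k secondary properties give 2^k groups to match; properties with more than two values make the count grow faster still.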

1. There is an inexorable spiral here: the more closely we try to make our sample representative of our population, the wider we must cast our search, at least if we’re following no control procedure beyond simple sample matching. Eventually, as more and more possibly relevant secondary properties occur to us, we’ll achieve a representative sample only by examining the entire population; and then we’re no longer sampling at all. So, except in the case of studies of primary property distribution where we have good reason to think that the number of possibly relevant secondary properties is very small, straightforward sample matching won’t do. We must use more sophisticated techniques.
EXERCISE: Let’s modify the example that Martin uses: the attempt by pollsters to predict the proportion of voters who will vote in a specific way. Let’s adapt it to trying to determine what percentage of Alabama’s voters will vote Democrat. How would you as a political pollster go about sample matching? We know from previous studies and census surveys that voting patterns are correlated with place of residence, level of income, level of education, age, gender, race, church affiliation and sexual orientation.