Lecture Slides on Sampling Design - Basic Statistics | STT 1810, Study notes of Statistics

Material Type: Notes; Professor: Gosky; Subject: Statistics; University: Appalachian State University; Term: Fall 2009;

Uploaded on 12/06/2009

christianmw

Chapter 3: Sampling Designs
Steps to Design a Statistical Sample:
1. Determine the study’s objectives.
2. Identify the population.
3. Define the variables to be measured.
4. Determine the statistical design required to sample from the population.
5. Collect the data.
We focus on step 4 in this chapter.
Steps 1 to 3 are often surprisingly difficult.
Step 5 often requires a lot of effort and follow-up, especially in surveys.
Def: The population is the entire group of individuals we want to obtain information about. A
sample is the part of the population we actually examine to gather information.
Think of the population as the group about whom you want to draw conclusions.
If it is feasible to obtain information from your entire population, by all means do so.
But, in almost all cases, sampling will be needed to obtain a “representative group” of the
population at large.
Def: The design of a sample refers to the method used to choose the sample from a population.
The design of the sample is important: if some aspect of the design favors individuals of a particular type, the results can be nonrepresentative, or biased.
Bad Sampling Designs:
(1) Voluntary response samples
allowing anyone who chooses to respond to a survey
Examples: website polls
Why are these a bad idea?
(2) Convenience sampling: sampling those easiest to reach.
Examples:
o standing outside Walker Hall and asking the first 200 people I meet
o surveys at the mall
o making phone calls until you achieve a certain number of responses
o mass e-mails
Why are these a bad idea?
STT1810 Course Notes: Chapters 3 and 4

Def: A sampling design that systematically favors certain outcomes is said to be biased. All of the above methods of sampling are biased. Our goal is to eliminate (or at least reduce) any bias in our sample. We will discuss how to do this next.

Valid Sampling Methods: Using one of these sampling designs, you can form a valid sample from any population, one that is free (as far as you can tell) of design bias. The 4 valid methods of sampling we’ll discuss in this chapter:

  1. Simple Random Sampling
  2. Stratified Random Sampling
  3. Cluster/Multistage Sampling
  4. Systematic Sampling

The first method: Simple Random Sampling (SRS)
Steps: Get a complete list of the population. Then randomly choose names/IDs from this list, without replacement, until you reach your desired sample size.
Analogy: drawing names from a hat. This avoids bias in the selection process because selection is purely random.
Note: The defining characteristic of simple random sampling is:


The defining characteristic is not __________________________________ _____________________________________________________________. Although this is a feature of SRS, there are other sampling methods with this characteristic.

Example: Consider a manufacturing company with 5 sites (called A, B, C, D, and E). The company president wants to randomly choose 2 sites to visit for a surprise quality inspection. The possible samples of size 2 from this population are {AB, AC, AD, AE, BC, BD, BE, CD, CE, DE}, so there are 10 possible samples. Under simple random sampling, all 10 of these samples are equally likely. If that is true, then each site appears in exactly 4 of the 10 samples, so each site is also equally likely (4/10 = 40%) to appear in the sample. For large populations, a computer package can select an SRS from a population of almost any size.
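The site example can be checked by brute force. The short Python sketch below (an illustration, not part of the original notes) lists every sample of size 2 with `itertools.combinations`, counts how often each site appears, and then draws one SRS with `random.sample`, which selects without replacement so that every one of the 10 pairs is equally likely.

```python
import random
from collections import Counter
from itertools import combinations

sites = ["A", "B", "C", "D", "E"]

# Every possible sample of size 2, chosen without replacement
all_samples = list(combinations(sites, 2))
print(len(all_samples))  # 10

# How many of those 10 samples contain each site
appearances = Counter(site for sample in all_samples for site in sample)
print(dict(appearances))  # {'A': 4, 'B': 4, 'C': 4, 'D': 4, 'E': 4}

# One SRS of size 2: each of the 10 possible pairs is equally likely
srs = random.sample(sites, 2)
print(srs)
```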

Example: Consider sampling students who live on campus at ASU for a survey that must be administered in person. With a standard SRS, you would likely have to travel to several residence halls for only 1-2 people each (quite a waste of time). It might be easier to first take an SRS of, say, 5 residence halls, and then, within those halls, take an SRS of many students (or a census). This may save you time and effort because most individuals in your sample will live close to each other. We hope that each cluster is similar to the population at large. In our example, this assumption could be violated if, for example, some halls house only first-year students and we suspect the survey results may differ by class.

Example 2: Suppose we wanted to know how all waiters/waitresses in Boone feel about customers’ tipping habits. No simple population list of waiters/waitresses exists, but a list of restaurants does. Perhaps an SRS of restaurants (stage 1) followed by a census of all waiters/waitresses at the chosen restaurants (stage 2) would work well.
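The two-stage idea can be sketched in Python. This is a minimal illustration under assumptions: the residence-hall rosters are hypothetical placeholders (12 halls of 40 students each), and `two_stage_cluster_sample` is a helper written only for this example. Stage 1 takes an SRS of halls; stage 2 takes an SRS within each chosen hall, or a census of the hall when no per-hall size is given.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS within each chosen cluster,
    or a census of the cluster when n_per_cluster is None."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    sample = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None or n_per_cluster >= len(members):
            sample[name] = list(members)  # census of this cluster
        else:
            sample[name] = rng.sample(members, n_per_cluster)
    return sample

# Hypothetical rosters: 12 halls with 40 residents each
halls = {f"Hall {i}": [f"H{i}-S{j}" for j in range(1, 41)] for i in range(1, 13)}

picked = two_stage_cluster_sample(halls, n_clusters=5, n_per_cluster=20, seed=1)
for hall, students in picked.items():
    print(hall, len(students))
```

Passing `n_per_cluster=None` gives the restaurant-style design (SRS of clusters, then a census within each).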

The 4th method: Systematic Sampling
Systematic sampling refers to choosing every kth member of the population after some random starting point. It is common when the other methods are not feasible and a sequential list of the population is available.
Example 1: If you visit the Hershey’s factory where Hershey’s Kisses are made and want to choose a random sample of Kisses, how can you do it?
Example 2: Exit polls on voting day.
Example 3: Suppose you have a printed (nonelectronic) student directory and want to choose 100 students at random. Randomly selecting 100 “numbers” from 1 to 15,000 might do the job, but you might instead decide that, after a random starting point, you will choose every 150th student name in the list (15,000 / 100 = 150). This might make compiling the sample easier than the SRS method.
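A minimal sketch of Example 3, assuming the directory is simply the numbers 1 to 15,000 (a hypothetical stand-in for the printed names). The helper `systematic_sample` is illustrative: it picks a random start among the first k entries and then takes every kth entry using list slicing.

```python
import random

def systematic_sample(population, k, seed=None):
    """Random start among the first k positions, then every kth member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based index, i.e. one of positions 1..k
    return population[start::k]

# Hypothetical directory: 15,000 entries numbered 1..15000
directory = list(range(1, 15001))
sample = systematic_sample(directory, k=150, seed=7)
print(len(sample))  # 100
print(sample[:3])
```

Because 15,000 is a multiple of 150, every starting point yields exactly 100 names.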

  1. Nonresponse: failing to receive a response from a chosen individual in the survey, either because the person cannot be contacted or because they refuse to participate. It is tempting to just randomly choose a replacement person for a nonresponder. Is this a good idea? As with undercoverage, nonresponse is also hard to avoid. A survey eventually must end. Would you call someone in your sample more than 100 times to obtain a response? To minimize nonresponse:
     o use multiple modes of contact (email, phone, in-person, etc.)
     o keep the survey as short as possible and tell the respondent how long it will take
     o use name recognition if possible (Gallup, A.C. Nielsen, ASU, etc.)
     o thank respondents for their time
     o send advance letters
     o offer small incentives for completing the survey
     Every possible effort should be made to get a response from individuals in your survey. This effort will at least minimize the effect of the nonresponse. Nonresponse really becomes an issue when a significant percentage of your sample does not respond.
  2. Response bias: Response bias occurs when respondents do not answer truthfully. This can happen for a number of reasons:
     o the survey contains a sensitive or personal question
     o the survey asks the respondent to recall past events
     o the wording of questions can cause people to answer incorrectly:
        o Confusing wording: Do you think that students who abuse illegal drugs should not be prohibited from obtaining financial aid for college?
        o Biased wording: The firearm crime rate (victims per 1000 residents) has dropped from 5.9 in the 1990s to 2.0 in 2005. Do you feel firearms should be banned?
        o Assuming a motive for answering: Do you support the president’s health care plan, because it would ensure all Americans would receive health care coverage?
     o the order of questions can make an impact. An experiment was conducted with the following two questions:
        o A: Do you think the U.S. should let Communist newspaper reporters from other countries come in here and send back to their papers the news as they see it?
        o B: Do you think a Communist country like Russia should let American newspaper reporters come in and send back to America the news as they see it?

Some respondents were given the questions in the order (A, B); others were given them in the reverse order. Which order do you think led to a higher % of Yes responses to question A? A good survey design can minimize the effect of response bias. Bottom line: before you accept the results of any survey, you should read the exact question posed.

Open vs. Closed Questions: What do you think are the pros and cons of open versus closed questions?
Good things about open questions:
Good things about closed questions:

Using Samples to Make Conclusions about the Population: Ultimately, we want to use the results from our sample to make conclusions about our population. However, it is easy to see that if you and I take two different samples, we will get different results even if we use the same sampling design. So we need some way to measure how accurate our sample results are, and then build this estimate of accuracy into our conclusions about the population. (This is why you often see a +/- percentage reported with political polls.) We will study this concept in detail in the coming chapters of the text. For now, we state one fact: larger samples give more accurate results than smaller samples. If someone wanted to estimate the average Exam 1 score in STT 1810, they’d get more accurate results with a sample of 50 students than with a sample of 5 students.

A rough margin of error for a sample percentage is 1/√n, where n is the sample size; it does not depend on the population size (as long as the population is much larger than the sample). So if 1000 adults are asked whether they are in favor of the No Child Left Behind Act and 651 respond yes, then we estimate the true population percentage as 65.1% plus or minus 1/√1000 ≈ 0.032, or 3.2%. That is, 65.1% ± 3.2%, giving the interval (61.9%, 68.3%). These results are often given with 95% confidence, which will be discussed later in the course; for now, think of it as our certainty that the interval above captures the true percentage in the population.
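The margin-of-error arithmetic can be checked with a few lines of Python. This sketch uses the notes' rule of thumb, 1/√n, which is a conservative approximation to the 95% margin of error for a sample proportion (1.96·√(0.25/n) ≈ 0.98/√n); the function names are illustrative, not from any library.

```python
import math

def margin_of_error(n):
    """Rule-of-thumb 95% margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def rough_interval(successes, n):
    """Sample proportion plus or minus the rule-of-thumb margin."""
    p_hat = successes / n
    moe = margin_of_error(n)
    return p_hat - moe, p_hat + moe

low, high = rough_interval(651, 1000)
print(f"{low:.1%} to {high:.1%}")  # 61.9% to 68.3%

# The margin shrinks as n grows; the population size never enters the formula
for n in (100, 1000, 10000):
    print(n, round(margin_of_error(n), 3))
```

Note that quadrupling the sample size only halves the margin of error.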

Example 1: Which headache treatment is best? 200 mg Advil; 200 mg Aspirin; 200 mg Tylenol. Response = time until relief from symptoms is achieved. Can we let people pick their treatment? Why not? Randomizing means that if 90 people are available, we randomly choose 30 for Advil, 30 for Aspirin, and the rest get Tylenol.

Example 2: Does liquid consumption with the medicine make a difference? We might suppose that drinking 12 oz. of liquid is better than drinking, say, 2 oz. So we might construct the treatments as:

                        Drug
  Liquid      Advil     Aspirin    Tylenol
  2 oz.       Trt 1     Trt 2      Trt 3
  12 oz.      Trt 4     Trt 5      Trt 6

The role of randomization: Randomness decides who gets which treatment. Randomize the order of treatments if they are administered in sequence, e.g., a person takes a test on a computer, on paper, and on a computer with help menus. Make each person take each test. Why? And randomize the order of the tests for each person.
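The random assignments described above can be sketched in Python. The subject IDs are hypothetical, and `completely_randomized` is an illustrative helper: it shuffles the subjects and deals them round-robin into treatment groups, so group sizes differ by at most one and nobody is left out. The same helper handles the six drug-by-liquid treatments from the table, and `random.sample` gives each person an independent random order of the three tests.

```python
import random
from itertools import product

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them round-robin into treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    k = len(treatments)
    # Group i gets shuffled positions i, i + k, i + 2k, ...
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

# Hypothetical IDs for the 90 available volunteers
subjects = [f"P{i:02d}" for i in range(1, 91)]

drugs = ["Advil", "Aspirin", "Tylenol"]
groups = completely_randomized(subjects, drugs, seed=3)
print({drug: len(people) for drug, people in groups.items()})
# {'Advil': 30, 'Aspirin': 30, 'Tylenol': 30}

# Example 2: the six drug-by-liquid treatments (Trt 1 to Trt 6), 15 people each
liquids = ["2 oz.", "12 oz."]
six_treatments = [f"{drug}, {liquid}" for liquid, drug in product(liquids, drugs)]
six_groups = completely_randomized(subjects, six_treatments, seed=5)

# Every person takes all three tests, each in an independently random order
tests = ["computer", "paper", "computer with help menus"]
rng = random.Random(4)
test_orders = {person: rng.sample(tests, len(tests)) for person in subjects}
print(test_orders["P01"])
```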

Other features of experiments:
(1) Control group: a group that is treated in the same way except that it does not receive the active treatment. Example: You have a bench of 12 plants (of the same variety, say). Does talking to them make a difference?
(2) Placebos: a dummy treatment with no benefit (often used in health studies) to screen out psychological effects confounded with a treatment.
(3) Blinding: not telling the subject which treatment they received. Double blinding: telling neither the subject nor the experimenter which treatment was received; a separate experimenter who does not take the measurements knows who received which treatment.
(4) Double dummy treatments (e.g., acupuncture vs. chiropractic for back pain): Patients know which treatment they’re getting (which can be bad), but we could have each patient receive “both,” with one being a dummy placebo treatment. (Example: August 2002, Journal of Human Hypertension: patients with high blood pressure were given a 2-week washout period, then a 2-week placebo period of both treatments (to test for true hypertension), and then double dummy treatments with 10 mg lercanidipine or 50 mg losartan.)

Valid Experimental Designs
(1) Completely Randomized Design (CRD): Each treatment is randomly assigned to a certain number of experimental units, e.g., 20 people get diet 1, 20 get diet 2, and 20 get diet 3, chosen randomly.
(2) Matched pairs designs: Subjects are “paired” so that they’re similar in every known way except for which treatment is given. Often this pairing is on the same person.
Ex: testing 2 types of sunscreen. Some people burn easily; others do not. Have each person put sunscreen 1 on one arm and sunscreen 2 on the other. Now each person serves as their own control group, screening out a lot of potential variability unrelated to sunscreen.
Ex: surgery vs. drug therapy for a disease. You can’t easily give a placebo surgery; technically you can, but it’s probably not ethical. So each person gets one treatment type. But you can match patients between the two treatments by finding 2 patients with similar age, weight, and disease severity (all of which could affect the outcome), and then randomize who gets which treatment.
Ex: Does church attendance prolong life? This can’t be done as an experiment; who wants to be randomly assigned to attend or not to attend church? But people who are ill probably don’t go to church and live shorter lives. How do we handle this? Other habits of church-attendees (lower alcohol consumption, less tobacco use, social support, etc.) may benefit someone, not necessarily the Sunday sermon. So if we find 2 people with relatively similar health habits, age, and overall health, where one goes to church and the other does not, we may suitably match them and note any differences in lifespan. This is still observational, but it’s a good approach given the limitations of observational studies.
(3) Block designs: Idea: Assign each treatment once within each block (a group of units thought to respond similarly regardless of treatment). Ex: greenhouse bench plant assignment example.
In some matched pairs designs, people serve as a block when they receive multiple treatments (each person is like a greenhouse bench). Read the examples on page 73 of the text, which are good case studies.
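Block and matched-pairs designs restrict randomization to within each block. A minimal Python sketch, assuming each block contains exactly one unit per treatment; the person names, bench labels, and treatment labels T1 to T3 are hypothetical, and `randomize_within_blocks` is an illustrative helper. For every block, the treatments are randomly permuted across that block's units, so a matched pair is simply a block of size 2.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomly permute the treatments across the units of each block.

    Assumes every block has exactly one unit per treatment."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"{block} needs exactly {len(treatments)} units")
        order = rng.sample(treatments, len(treatments))  # a random permutation
        plan[block] = dict(zip(units, order))
    return plan

# Matched pairs: each (hypothetical) person is a block of two arms
arms = {name: ["left arm", "right arm"] for name in ["Ana", "Ben", "Cal"]}
pair_plan = randomize_within_blocks(arms, ["sunscreen 1", "sunscreen 2"], seed=0)
print(pair_plan["Ana"])

# Greenhouse: each (hypothetical) bench is a block of 3 plants, 3 treatments
benches = {f"bench {b}": [f"bench {b}, plant {p}" for p in (1, 2, 3)] for b in (1, 2, 3, 4)}
bench_plan = randomize_within_blocks(benches, ["T1", "T2", "T3"], seed=0)
for bench, assignment in bench_plan.items():
    print(bench, assignment)
```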

Valid Observational Study Approaches
Sometimes an observational study is the best you can do (health studies, for example).
(1) Retrospective and prospective studies: Retrospective studies ask a respondent to categorize themselves or provide measurements of past behavior (e.g., Have you had a heart attack? Have you smoked as an adult for at least 2 years?). Prospective studies follow individuals for a period of time and measure at least some outcome variables based on this tracking period (e.g., Do you smoke now? Y/N; follow them for 5 years and see whether they have a heart attack). Which are better? Which are easier?
(2) Case-control studies: Cases (individuals with the outcome of interest) are compared with controls (individuals without it), and researchers look to see whether any explanatory factors differ between the two groups. Example (in text): Baldness and heart attacks. Male patients hospitalized for heart attacks are checked: are they bald(ing) or not? Then patients hospitalized for other reasons (not heart attacks) are also checked for baldness. Are there differences between the two groups? Case-control studies are relatively easy to do (compared to experiments) when a control group is easy to find and measure. There is some thinking that case-control studies may reduce the effect of potential confounding variables. (See pages 76-77 in the text for an example.)

(4) Hawthorne Effects and Experimenter Effects
The Hawthorne effect refers to effects seeming larger during an experiment than they will be in reality. Example: ______________________________________
An experimenter may have unconscious bias (e.g., a doctor recording whether a placebo worked) or conscious bias (knowing that they want their treatment to work, they subtly steer the results).
(5) Generalizability
We need to make sure the experiment corresponds to reality as much as possible. For example, if you have 3 training methods for problem solving, you may want to test people using the methods in their real jobs, not just have them solve canned problems in a training session, to determine whether one method is better than another.