Sampling from Populations: Understanding Bias and Variability in Statistical Inference, Study Guides, Projects, Research of History

History

An introduction to sampling from populations, discussing the importance of statistical inference, random sampling, bias, and variability. Through examples and case studies, learners will understand how to recognize sources of bias and the importance of representative samples.

Typology: Study Guides, Projects, Research

2021/2022

Uploaded on 09/27/2022

princesspeach 🇺🇸

Sampling from a Population

Ryan Miller

The Candy Activity

Today I’ve brought with me a bag containing 100 pieces of candy. It is your job to correctly determine the weight of the bag. With your group, you will:

Sample 5 candy pieces from the bag
Weigh your sample
Multiply your sample’s weight by 20 to estimate the entire bag’s weight
Return your sample to the bag

The group whose estimate is closest to the bag’s weight will be given the entire bag to consume or distribute as they see fit
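
The estimation procedure above can be sketched in a few lines of Python. This is only an illustration: the piece weights are made up (the slides do not give them), and `estimate_bag_weight` is a hypothetical helper that assumes the 5 pieces are drawn as a simple random sample.

```python
import random

def estimate_bag_weight(piece_weights, n=5, rng=random):
    """Estimate the total weight from a simple random sample of n pieces."""
    sample = rng.sample(piece_weights, n)   # draw n pieces without replacement
    scale = len(piece_weights) / n          # 100 / 5 = 20 in the activity
    return sum(sample) * scale

# Hypothetical bag: 100 pieces with made-up weights in grams (not from the slides).
rng = random.Random(0)
bag = [rng.choice([5.0, 10.0, 20.0]) for _ in range(100)]

true_weight = sum(bag)
# Eight groups each take their own sample of 5 and scale it up.
estimates = [estimate_bag_weight(bag, n=5, rng=rng) for _ in range(8)]
print("true weight:", true_weight)
print("group estimates:", estimates)
```

Each group gets a different estimate from the same bag, because each group happens to draw different pieces.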

Populations vs. Samples

Every statistical analysis begins with a question, e.g., how much does the bag of candy weigh?

The best approach is to weigh the entire bag
But what if your access to the bag is limited?
In our example, the 100 pieces of candy in the bag represent a population - all of the cases we want to learn about
I didn’t allow you access to the entire population, but rather a sample - a subset of cases from the population

We denote the size of a sample using n, e.g., n = 5

Practice

In a study on hand washing, researchers in several cities across the United States pretended to comb their hair in public restrooms while observing whether or not people washed their hands after going to the bathroom. They found that 85% of the 6,000 individuals they observed washed their hands.

What is the population? What is the sample?

We could say the population is all people in the US that use public restrooms
But people are likely to behave differently when someone else is in the restroom with them
It would be wise to restrict the population to people in the US using a restroom with another occupant

Statistical Inference - Notation

Statisticians use different notation to distinguish population parameters (things we want to know) from estimates (things derived from a sample). For a few common measures, this notation is summarized below:

| Statistic | Population Parameter | Estimate (from sample) |
| --- | --- | --- |
| Mean | μ | x̄ |
| Standard Deviation | σ | s |
| Proportion | p | p̂ |
| Correlation | ρ | r |

For example, μ is the mean of the population, while x̄ is the mean of the cases that ended up in the sample.
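
As a small illustration of the estimate column, the sketch below computes x̄, s, and p̂ from sample data using Python's standard `statistics` module. The data values are made up for the example and do not come from the slides.

```python
import statistics

# Hypothetical sample of n = 5 candy weights in grams (made-up values).
sample = [5.0, 10.0, 10.0, 20.0, 5.0]

x_bar = statistics.mean(sample)   # sample mean, the estimate of mu
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator), the estimate of sigma

# Hypothetical yes/no outcomes (1 = washed hands) for a sample proportion.
washed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
p_hat = sum(washed) / len(washed)  # sample proportion, the estimate of p

print(x_bar, s, p_hat)
```

The population values μ, σ, and p stay unknown here; only the sample-based estimates can be computed.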

Now Let’s Weigh the Bag

I didn’t know what your estimates would be when I prepared these slides... but I predict that all of them are way too high!

Randomness and Variability

Any given sample, regardless of how it was collected, only contains a subset of cases from the population
This introduces variability when trying to use the sample to estimate a population parameter
Just by random chance, some samples will yield more accurate estimates than other samples, even if an ideal sampling protocol is used
Next week we’ll work on understanding this variability; today we’ll continue learning about sampling

Bias and Variability

To summarize, there are two reasons why an estimate might not accurately represent a population parameter: bias and variability.

Variance decreases with larger sample sizes
Bias is not improved by a larger sample
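
A rough simulation can make these two statements concrete. The sketch below assumes a made-up population of 1000 weights and compares a simple random sample with a hypothetical biased protocol in which heavier cases are more likely to be picked. The function names and all numbers are illustrative assumptions, not part of the slides.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 1000 made-up weights in grams (not from the slides).
population = [rng.uniform(5, 25) for _ in range(1000)]
mu = statistics.mean(population)

def random_sample_mean(n):
    """Sample mean from a simple random sample."""
    return statistics.mean(rng.sample(population, n))

def biased_sample_mean(n):
    """Sample mean when heavier cases are more likely to be picked (a biased protocol)."""
    return statistics.mean(rng.choices(population, weights=population, k=n))

def summarize(sampler, n, reps=2000):
    """Return (average error, spread) of the estimate over many repeated samples."""
    estimates = [sampler(n) for _ in range(reps)]
    return statistics.mean(estimates) - mu, statistics.stdev(estimates)

results = {}
for n in (5, 50):
    results["random", n] = summarize(random_sample_mean, n)
    results["biased", n] = summarize(biased_sample_mean, n)
    print(n, "random (avg error, spread):", results["random", n])
    print(n, "biased (avg error, spread):", results["biased", n])
```

In this setup the spread of the estimates shrinks as n goes from 5 to 50 under both protocols, while the average error of the biased protocol stays roughly the same size.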

Case Study - The 1936 Presidential Election

Since 1916, the Literary Digest magazine had correctly predicted the winner of 5 straight presidential elections
Prior to the 1936 election, the Literary Digest sampled 2.4 million people and predicted a landslide victory for Landon: 57% - 43%
In the actual election, Roosevelt won by a landslide: 62% - 38%

How could the Digest have been so far off?

Take a minute to discuss this with your group
Consider whether the inaccurate estimate could be due to bias or variability

Case Study - The 1936 Presidential Election

Selection Bias

The Literary Digest sent 10 million questionnaires to addresses gathered from telephone books and club memberships
This disproportionately screened out the poor; only 1 in 4 households owned a telephone at the time, and club members tended to be upper class
Selection bias resulted in a non-representative sample

Non-response Bias

Of the 10 million questionnaires, only 2.4 million were returned
Responders tend to be different from non-responders
The 2.4 million respondents likely weren’t even representative of the 10 million people polled
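
The effect of a non-representative sampling frame can be sketched with a toy simulation. Everything here is an assumption made for illustration: the population size, the support rates, and the sample sizes are invented and are not historical figures. Only the "1 in 4 own a telephone" share is taken from the slide above.

```python
import random

rng = random.Random(1936)

# Made-up population for illustration only. The support rates are assumptions,
# not historical data; only the 25% telephone share comes from the slides.
N = 200_000
population = []
for _ in range(N):
    has_phone = rng.random() < 0.25
    support_rate = 0.40 if has_phone else 0.70   # assumed rates by group
    population.append((has_phone, rng.random() < support_rate))

def support_share(voters):
    """Proportion of voters in the list who support the candidate."""
    return sum(supports for _, supports in voters) / len(voters)

p = support_share(population)   # population parameter

# Huge sample drawn only from telephone owners (selection bias).
phone_owners = [v for v in population if v[0]]
big_biased = support_share(rng.sample(phone_owners, 20_000))

# Much smaller simple random sample drawn from the whole population.
small_random = support_share(rng.sample(population, 500))

print(f"p = {p:.3f}, biased p-hat = {big_biased:.3f}, random p-hat = {small_random:.3f}")
```

Under these assumptions, the very large telephone-only sample misses the population proportion by a wide margin, while the small random sample lands much closer.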

That was 1936, surely today we understand the importance of representative samples... right?

Discussion

With your group, take a look at the NY Times article: https://www.nytimes.com/interactive/2017/07/25/sports/football/nfl-cte.html

What is the actual population of the study?
What is the population most people jump to conclusions about when they see the headlines on the previous slide?

Case Study - CTE and Football

Article Link: “I’m a brain scientist and I let my son play football”

The study population in the most recent CTE paper represents a biased sample, as stated by the authors themselves. This means only the brains of self-selecting people who displayed neurological symptoms while living were studied. This is important because this sample was not a reflection of the general football population. The study was based on 202 brains out of the millions of people who have played football all of which are former NFL players.

So, when you hear 99 percent of football players had CTE, that doesn’t mean that almost every football player will get CTE, and it doesn’t mean your child has a 99-percent chance of developing CTE if he or she plays football. It means 99 percent of a specifically selected study sample had some degree of CTE; not 99 percent of the general football population. This is an important distinction.

Some Other Sources of Bias

When collecting data, it is crucial to be aware of potential sources of bias. Some examples include:

Social Desirability Bias - Respondents tend to answer questions in ways that portray themselves in a positive light
Habituation Bias - Respondents tend to provide similar answers for similarly worded or structured questions (the brain going on autopilot)
Leading Questions - The wording of a question impacts how people respond; the textbook has great examples
Cultural Bias - Questions are often constructed with one’s own culture in mind, so they might not even make sense to people from other cultures

This isn’t a complete list; there are countless reasons for data not being representative of the population of interest

Practice

With your group, discuss whether each of the following is a sample or a population. If the data are a sample, describe the target population and whether the sample is biased

To estimate the size of trout in a lake, an angler records the weight of the 12 trout he catches over a weekend
A subscription based music website tracks the listening history of its active users
The Department of Transportation announces that of the 250 million registered cars in the US, 2.1% are hybrids
An online poll seeking to learn about adult workers asks: “What do you think of having an everyday uniform for work, like what Steve Jobs did?” 24% of people said they loved the idea
