Understanding the Concept of Sampling Distributions and Mean in Statistics, Study notes of Data Analysis & Statistical Methods

State University of New York Polytechnic - Utica-Rome Data Analysis & Statistical Methods

An explanation of the concept of sampling distributions and mean in statistics. It includes examples using a deck of playing cards and the central limit theorem. The document also covers calculating means and standard deviations for sample proportions.

Typology: Study notes

Pre 2010

Uploaded on 08/09/2009

koofers-user-x4s 🇺🇸


STA100

Sampling Distributions

Lecture 14

Sampling Distributions Text Sections 7.1 and 7.2

If you think of almost any interesting question you would like to answer about almost any real world situation you are immediately drawn to the notion of uncertainty. You might wonder whether taking niacin will raise your HDL cholesterol. Until you try you won’t really know, but you can consult the literature and find that, on average, “NIASPAN increases HDL cholesterol an average of 14% for men and 20%” (http://www.niaspan.com/About_Niaspan/index.asp). But do they really know this? Did they give Niaspan to all women in America and report the number? Did they give it to a sample? How large a sample? Did they use 10 people? Did they use 1000 people? How many should you use to get an estimate for the real average increase in HDL? Why is the average interesting?

To make some of these ideas a little bit more concrete, you need to get out a deck of playing cards. Please take a second and do this. As Euclid told King Ptolemy "there is no royal road to geometry" (i.e. no easy way to learn) and as you’ve undoubtedly found out by now there is none to statistics, either.

Now that you have your deck of 52 playing cards, think of the “average” card. Let’s agree that we will count an Ace as 1, 2 as 2, … 10 as 10, Jack as 11, Queen as 12, and King as 13. The deck will be our population. The mean of this population is $\mu = \frac{\sum x f}{n} = \frac{364}{52} = 7$. The table below shows how to calculate the mean for our grouped data:

x      f      x*f    x*(f/n)
1      4      4      0.076923
2      4      8      0.153846
3      4      12     0.230769
4      4      16     0.307692
5      4      20     0.384615
6      4      24     0.461538
7      4      28     0.538462
8      4      32     0.615385
9      4      36     0.692308
10     4      40     0.769231
11     4      44     0.846154
12     4      48     0.923077
13     4      52     1
sum    52     364    7

If you don’t like the table, just think about where your population distribution would “balance”.

In this simple situation we know the average, so we are in a position to experiment a little. Then, when we want to estimate the average blood pressure for “smokers” we will know how to work.
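
The grouped-data calculation of the deck's mean can be reproduced in a few lines. This is only a sketch of the table's arithmetic; the variable names are chosen here for illustration and are not part of the lecture.

```python
from fractions import Fraction

# Card values 1 (Ace) through 13 (King); each value appears once per suit.
values = range(1, 14)
frequency = 4                      # four suits
n = frequency * len(values)        # 52 cards in the population

# Route 1: the "x*f" column, summed and divided by n.
total = sum(x * frequency for x in values)
mean_from_totals = Fraction(total, n)

# Route 2: the "x*(f/n)" column, each value weighted by its relative frequency.
mean_from_weights = sum(x * Fraction(frequency, n) for x in values)

print(total, n, mean_from_totals, mean_from_weights)
```

Exact fractions are used so that the two routes agree exactly rather than up to rounding in the last decimal place.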


First Experiment

Make sure your deck is well shuffled. This means shuffling 6 or 7 times. This simulates drawing from a population “randomly”. In order to get a feel for why we sample randomly, suppose you want to know the average height for men at SUNYIT. Since you are at the gym anyway, why not just ask those basketball players to be your sample?

Once your deck is well shuffled, deal out 3 cards and note the numerical values (1 through 13). Place your cards back in the deck and shuffle again 6 or 7 times. Deal out another 3 cards and calculate the average. Repeat this 10 times and fill in the table below.

trial   First Card   Second Card   Third Card   Sum   Average
1
2
3
4
5
6
7
8
9

I did this experiment myself and obtained the following:

trial   First Card   Second Card   Third Card   Sum   Average

Here’s the same histogram for a sample size of 10. Compare the two histograms. What is the same? What is different?
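
If you would rather let a computer do the shuffling, the card experiment can be simulated. The sketch below is not the lecture's own procedure: the seed, the number of trials (5000 instead of 10), and the function name `sample_means` are all assumptions made here so that the pattern is easy to see. It compares sample sizes 3 and 10.

```python
import random
import statistics

def sample_means(sample_size, trials, rng):
    """Deal `sample_size` cards from a freshly shuffled deck, `trials` times."""
    deck = [value for value in range(1, 14) for _suit in range(4)]
    # rng.sample draws without replacement, like dealing cards off the top.
    return [statistics.mean(rng.sample(deck, sample_size)) for _ in range(trials)]

rng = random.Random(14)  # arbitrary fixed seed so reruns give the same draws
means_3 = sample_means(3, 5000, rng)
means_10 = sample_means(10, 5000, rng)

for label, means in (("n=3", means_3), ("n=10", means_10)):
    print(label,
          "center:", round(statistics.mean(means), 2),
          "spread:", round(statistics.stdev(means), 2))
```

Both sets of sample means should be centered near the population mean of 7, while the sample means for the larger sample size should be less spread out.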

Here’s the framework for most of the rest of the course: We will want to know the mean, or the standard deviation, or the proportion of a population but, lacking the time and money to obtain data from everyone in the population, we will have to work with a sample. Thus, while we want the population mean or the population standard deviation or the population proportion, we will have to work with the sample mean or the sample standard deviation or the sample proportion instead, and we will have to figure out how to do this intelligently.


Sampling Distribution of Sample Proportions: Suppose you would like to know what the proportion of lime Skittles is (the green ones). This is exactly the same (mathematically) as polling to determine what proportion of low income families in your school district receive free school lunches.

Here’s an experiment you can perform and share with us for a class presentation.

1. Buy one of the large bags of “snack pack” Skittles (a large bag containing many smaller sized bags like you would pass out to kids on Halloween).
2. Open each little bag and count the number of green Skittles and also the total number of Skittles.
3. Complete a table like the following and present a histogram of sample proportions.

Bag Number   Number of Green Skittles   Total Number in Bag   Sample Proportion
1
2
3
4
5
6
7
8
etc
40 or so
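
Before buying any candy, you can rehearse the bookkeeping with a simulation. Everything numerical in this sketch is made up for illustration: the "true" share of green (0.2), the number of candies per small bag (15), and the seed are assumptions, not measurements of real Skittles.

```python
import random
from collections import Counter

ASSUMED_GREEN_SHARE = 0.2   # hypothetical true proportion, for illustration only
CANDIES_PER_BAG = 15        # hypothetical size of each small bag
NUMBER_OF_BAGS = 40         # "40 or so" small bags

rng = random.Random(7)      # arbitrary fixed seed so the run is repeatable

proportions = []
for _ in range(NUMBER_OF_BAGS):
    # Count the green candies in one simulated bag.
    green = sum(rng.random() < ASSUMED_GREEN_SHARE for _ in range(CANDIES_PER_BAG))
    proportions.append(green / CANDIES_PER_BAG)

# Text histogram: one '#' per bag that produced that sample proportion.
counts = Counter(proportions)
for p in sorted(counts):
    print(f"{p:.3f} {'#' * counts[p]}")

print("mean of sample proportions:", round(sum(proportions) / NUMBER_OF_BAGS, 3))
```

With real bags you would replace the simulated counts with your own tallies; the sample proportion for each bag is still the number of green candies divided by the total in that bag.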

Now we need some numerical results. Let’s concentrate on sample means first. We can show the following:

If you are conducting a simple random sample from a population then when we look at the histogram of sample means we will find that it is centered right over the population mean. That is, the mean of the “sampling distribution of the sample means” is the same as the population mean itself. We say that we have an “unbiased estimator”. Notationally, we use the symbol $\mu_{\bar{x}}$ for the mean (that’s the mu part) of the sample means (that’s the xbar part). Note that for homework, you may wish to write “xbar” rather than learn to format the symbol. Choose whichever you prefer. Our first result then is $\mu_{\bar{x}} = \mu$.

If you are conducting a simple random sample from a finite population (like a deck of cards) then when we look at the histogram of sample means we will find that it is not as spread out as the original population. In fact, we already know that we prefer large samples to small (we somehow feel that we have “more information”). You can show that the spread of the sampling
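
For the deck of cards, the claim that the sample means are centered on the population mean can be checked exactly rather than by experiment, because the deck is small enough to list every possible 3-card hand. The sketch below is one way to do that check; the choice of 3-card hands and the variable names are assumptions made here.

```python
from fractions import Fraction
from itertools import combinations
import statistics

deck = [value for value in range(1, 14) for _suit in range(4)]
population_mean = Fraction(sum(deck), len(deck))

# Every possible 3-card hand, each equally likely under simple random
# sampling without replacement. Cards of equal value count as distinct cards.
hand_means = [Fraction(sum(hand), 3) for hand in combinations(deck, 3)]
mean_of_sample_means = sum(hand_means) / len(hand_means)

print(len(hand_means), population_mean, mean_of_sample_means)

# Spread of the population versus spread of the sample means.
population_sd = statistics.pstdev(deck)
sample_means_sd = statistics.pstdev([float(m) for m in hand_means])
print(round(population_sd, 3), round(sample_means_sd, 3))
```

The mean of all possible sample means matches the population mean exactly, and the standard deviation of the sample means comes out smaller than the standard deviation of the deck itself.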

The Central Limit Theorem

Here’s a little numerical experiment to show (again) why the Central Limit Theorem is the “600 pound gorilla” of statistics. Suppose you have a simple population which is uniformly distributed between 0 and 1. If you’ve played with the rand() function in your spreadsheet you see that it kicks out numbers between 0 and 1 for you. You might want to check this with a histogram. I’ve taken a population which is uniformly distributed and sampled with sample sizes 1 through 6 and presented histograms of sample means. In each graph I superimpose the normal distribution. Notice by the time sample sizes (your “n” values) get even modestly large (say 5 or so) the distribution of sample means starts to look fairly mound shaped and symmetrical.

[Figure: six histograms of sample means from the uniform population, one panel per sample size from 1 through 6, each with a normal curve superimposed. The horizontal axis runs from 0 to 1.]
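
The uniform-population experiment can be rerun without a spreadsheet. This is a rough sketch, not the code behind the lecture's graphs: the seed and the 20000 trials per sample size are arbitrary choices made here. For each sample size it reports the center and variance of the sample means next to the theoretical variance 1/(12n), plus the share of sample means within one standard deviation of 0.5 (about 0.68 for a normal distribution).

```python
import random
from math import sqrt

rng = random.Random(2009)   # arbitrary fixed seed for repeatability
TRIALS = 20000

results = {}
for n in range(1, 7):
    # Each trial: average n draws from the uniform(0, 1) population.
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(TRIALS)]
    center = sum(means) / TRIALS
    variance = sum((m - center) ** 2 for m in means) / TRIALS
    theory_sd = sqrt(1 / (12 * n))
    within_one_sd = sum(abs(m - 0.5) <= theory_sd for m in means) / TRIALS
    results[n] = (center, variance, within_one_sd)
    print(n, round(center, 3), round(variance, 4),
          round(1 / (12 * n), 4), round(within_one_sd, 3))
```

The last column is a crude stand-in for looking at the histograms: as n grows, the share within one standard deviation should move toward the normal value.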

So, what can we do with this? Right now we just make predictions about sampling procedures. In the next lecture we see how to sample intelligently and say useful things about our samples. For now, try to answer the example questions. Assume in each case that both the population and the distribution of sample means are normally distributed.

Scores of students on the ACT college entrance exam in a recent year had the normal distribution with mean $\mu = 18.6$ and standard deviation $\sigma = 5.9$. What is the probability that a single student randomly chosen from all those taking the test scores 21 or higher? What is the probability that the mean score for 36 students randomly selected from all those who took the test nationally is 21 or higher?

a. Since our population is normally distributed, we can get the probability that a data point will lie above 21 easily. If we know we have a normal distribution, and if we know the mean and the standard deviation, we can look up probabilities or areas as usual. Convert to z and use the table. Obtain $z = \frac{x - \mu}{\sigma} = \frac{21 - 18.6}{5.9} \approx 0.407$.

Now our table likes to see z values rounded to the nearest hundredth, so look up the area for z=0.41 and get A=0.6591. Our table gives areas to the left or below. We want an area to the right so get prob=1 - 0.6591 = 0.3409.

b. Let’s think this part through. If our population has a mean of 18.6 and a standard deviation of 5.9, then when we sample from this population with a sample size of $n = 36$ we will have a sampling distribution of the means with center $\mu_{\bar{x}} = 18.6$ and standard deviation of sample means of $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5.9}{\sqrt{36}} \approx 0.983$. These numbers are what we’ll use to convert to z and look up in our table: $z = \frac{21 - 18.6}{0.983} \approx 2.44$.

Look up the area for z=2.44 and get A=0.9927. Our table gives areas to the left or below. We want an area to the right so get prob=1 - 0.9927= 0.0073.

So what’s the point of all this? Evidently it’s not at all uncommon for an individual to score 21 or above (it happens roughly 34% of the time). For a collection of 36 randomly selected individuals to have a sample mean at or above 21 is relatively rare (occurring less than 1% of the time).
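
The two ACT probabilities can also be computed without a table. The sketch below uses Python's standard-library `statistics.NormalDist`; it is a check on the hand calculation, not a replacement for it. Because it does not round z to the nearest hundredth, the answer to part a may differ from the table-based value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.6, 5.9, 36

# a. One student: use the population distribution directly.
population = NormalDist(mu, sigma)
p_single = 1 - population.cdf(21)

# b. Mean of 36 students: same center, standard deviation sigma / sqrt(n).
sampling = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - sampling.cdf(21)

print(round(p_single, 4), round(p_mean, 4))
```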

And here’s another possible presentation: A bottling company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean μ = 304 ml and a standard deviation of 2 ml. What is the probability that an individual bottle contains less than 301 ml? What is the probability that the mean contents of the bottles in a six pack is less than 301 ml? What is the probability that the mean contents of the bottles in a case (24 bottles) is less than 301 ml?
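
If you want to check your answers to the bottling questions, one possible approach is to apply the same steps for each group size. This sketch assumes, as the problem states, that contents are normal with mean 304 ml and standard deviation 2 ml, and that the mean of n bottles has standard deviation sigma / sqrt(n); the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, threshold = 304.0, 2.0, 301.0

probabilities = {}
for n in (1, 6, 24):   # single bottle, six pack, case
    standard_error = sigma / sqrt(n)
    z = (threshold - mu) / standard_error
    # Probability that the mean of n bottles falls below the threshold.
    probabilities[n] = NormalDist(mu, standard_error).cdf(threshold)
    print(f"n={n:2d}  z={z:6.2f}  P(mean < 301) = {probabilities[n]:.2e}")
```

Watching how quickly the probability shrinks as n goes from 1 to 6 to 24 is the point of the exercise: averages of larger groups stray from the population mean far less often than single bottles do.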
