




Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An explanation of the concept of sampling distributions and mean in statistics. It includes examples using a deck of playing cards and the central limit theorem. The document also covers calculating means and standard deviations for sample proportions.
Typology: Study notes
1 / 8
This page cannot be seen from the preview
Don't miss anything!





STA100 Sampling Distributions Lecture 14
If you think of almost any interesting question you would like to answer about almost any real world situation you are immediately drawn to the notion of uncertainty. You might wonder whether taking niacin will raise your HDL cholesterol. Until you try you won’t really know, but you can consult the literature and find that, on average, “NIASPAN increases HDL cholesterol an average of 14% for men and 20%” (http://www.niaspan.com/About_Niaspan/index.asp). But do they really know this? Did they give Niaspan to all women in America and report the number? Did they give it to a sample? How large a sample? Did they use 10 people? Did they use 1000 people? How many should you use to get an estimate for the real average increase in HDL? Why is the average interesting?
To make some of these ideas a little bit more concrete, you need to get out a deck of playing cards. Please take a second and do this. As Euclid told King Ptolemy "there is no royal road to geometry" (i.e. no easy way to learn) and as you’ve undoubtedly found out by now there is none to statistics, either.
Now that you have your deck of 52 playing cards, think of the “average” card. Let’s agree that we will count an Ace as 1, 2 as 2, … 10 as 10, Jack as 11, Queen as 12, and King as 13. The deck will be
our population. The mean of this population is. The table
below shows how to calculate the mean for our grouped data:
x f xf x(f/n) 1 4 4 0. 2 4 8 0. 3 4 12 0. 4 4 16 0. 5 4 20 0. 6 4 24 0. 7 4 28 0. 8 4 32 0. 9 4 36 0. 10 4 40 0. 11 4 44 0. 12 4 48 0. 13 4 52 1 sum 364 7
If you don’t like the table, just think about where your population distribution would “balance”
In this simple situation we know the average, so we are in a position to experiment a little. Then, when we want to estimate the average blood pressure for “smokers” we will know how to work.
First Experiment
Make sure your deck is well shuffled. This means shuffling 6 or 7 times. This simulates drawing from
a population “randomly”. In order to get a feel for why we sample randomly, suppose you want to
know the average height for men at SUNYIT. Since you are at the gym anyway, why not just ask
those basketball players to be your sample?
Once your deck is well shuffled, deal out 3 cards and note the numerical values (1 through 13). Place
your cards back in the deck and shuffle again 6 or 7 times. Deal out another 3 cards and calculate the
average. Repeat this 10 times and fill in the table below.
trial First Card Second Card Third Card Sum Average 1 2 3 4 5 6 7 8 9
I did this experiment myself and obtained the following:
trial First Card Second Card
Third Card
Sum Average
Here’s the same histogram for a sample size of 10. Compare the two histograms. What is the same? What is different?
Here’s the framework for most of the rest of the course: We will want to know the mean, or the
standard deviation, or the proportion of a population but, lacking the time and money to obtain data
from everyone in the population we will have to work with a sample. Thus while we want the
population mean or the population standard deviation or the population proportion we will
have to work with the sample mean or the sample standard deviation or the sample proportion
instead and we will have to figure out how to do this intelligently.
(^02 3 4 5 6 7 8 9 10 11 )
1
2
3
3.5 x 10
5
proportion of lime Skittles is (the green ones). This is exactly the same (mathematically) as polling to determine what proportion of low income families in your school district receive free school lunches.
Here’s an experiment you can perform and share with us for a class presentation.
1. Buy one of the large bags of “snack pack” skittles (a large bag containing many smaller sized _bags like you would pass out to kids on Halloween).
Bag Number Number of Green Skittles Total Number in Bag Sample Proportion 1 2 3 4 5 6 7 8
etc
40 or so
Now we need some numerical results. Let’s concentrate on sample means first. We can show the following:
Here’s a little numerical experiment to show (again) why the Central Limit Theorem is the “600 pound gorilla” of statistics. Suppose you have a simple population which is uniformly distributed between 0 and 1. If you’ve played with
the rand() function in your spreadsheet you see that it kicks out numbers between 0 and 1 for you. You might want to check this with a histogram. I’ve taken a population which is uniformly distributed and sampled with sample sizes 1 through 6 and presented histograms of sample means. In each graph I superimpose the normal distribution. Notice by
the time sample sizes (your “n” values) get even modestly large (say 5 or so) the distribution of sample means starts to look fairly mound shaped and symmetrical.
0 0.5 1
0
1
Sample Size =
0 0.5 1
0
1
2
3
Sample Size =
0 0.5 1
0
1
2
3
Sample Size =
0 0.5 1
0
1
2
3
Sample Size =
0 0.5 1
0
1
2
3
4
Sample Size =
0 0.5 1
0
1
2
3
4
Sample Size =
So, what can we do with this? Right now we just make predictions about sampling procedures. In the next lecture we see how to sample intelligently and say useful things about our samples. For now, try to answer the example questions. Assume in each case that both the population and the distribution of sample means are normally distributed.
a. Since our population is normally distributed, we can get the probability that a data point will lie above 21 easily. If we know we have a normal distribution, and if we know the mean and the standard deviation, we can look up probabilities or areas as usual. Convert to z and use the table. Obtain
Now our table likes to see z values rounded to the nearest hundredth, so look up the area for z=0.41 and get A=0.6591. Our table gives areas to the left or below. We want an area to the right so get prob=1 - 0.6591 = 0.3409.
b. Let’s think this part through. If our population has a mean of 18.6 and a standard deviation of 5.9, then when we sample from this population with a sample size of we will have a sampling distribution of the means with center and standard deviation of sample means of These numbers are what we’ll use to convert to z and look up in our table.
Look up the area for z=2.44 and get A=0.9927. Our table gives areas to the left or below. We want an area to the right so get prob=1 - 0.9927= 0.0073.
So what’s the point of all this? Evidently it’s not at all uncommon for an individual to score 21 or above (it happens roughly 34% of the time). For a collection of 36 randomly selected individuals to have a same mean at or above 21 is relatively rare (occurring less than 1% of the time).
And here’s another possible presentation: A bottling company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean μ = 304 ml and a standard deviation of 2 ml. What is the probability that an individual bottle contains less than 301 ml? What is the probability that the mean contents of the bottles in a six pack is less than 301 ml? What is the probability that the mean contents of the bottles in a case (24 bottles) is less than 301 ml?