Stat 528 (Autumn 2008) Towards Statistical Inference
Reading: Sections 3.3, 4.
- Performing statistical inference
- Population distributions
- Sampling distributions
- Visualizing sampling distributions
- Bias, variance and mean square error
- The law of large numbers
- A first look at the central limit theorem
An example
- A question: What proportion of researchers at OSU use statistics in their research?
- this proportion is a parameter, p, of our population.
- We cannot interview all researchers at OSU!
- We collect a random sample of researchers at OSU.
- We ask them “Do you use statistics in your research?”.
- We calculate the proportion of people in the sample who use statistics - this proportion is a statistic, p̂.
- A parameter is a number used to describe a characteristic of the population, e.g., μ, σ, p.
- A statistic is a function of the sample of data, e.g., x̄, s, p̂.
- We often use a statistic to estimate a parameter. In this case, the statistic is known as an estimator.
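The distinction between the parameter p and the statistic p̂ can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the population size (5000) and the 60% figure are made up, and `sample_proportion` is a hypothetical helper name.

```python
import random

def sample_proportion(population, n, rng):
    """Return p-hat: the proportion of 1s in a simple random sample of size n."""
    sample = rng.sample(population, n)  # simple random sample, without replacement
    return sum(sample) / n

rng = random.Random(528)  # fixed seed so the run is reproducible

# Hypothetical population: 5000 researchers, 1 = uses statistics, 0 = does not.
# The 60% figure is invented for illustration; in practice p is unknown.
population = [1] * 3000 + [0] * 2000
p = sum(population) / len(population)            # the parameter

p_hat = sample_proportion(population, 100, rng)  # the statistic (an estimator of p)
print(f"parameter p = {p:.3f}, statistic p-hat = {p_hat:.3f}")
```

Rerunning with a different seed changes p̂ but not p, which is the point of the distinction.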
Tools for statistical inference
- Random sample
- A random sample consists of n independent draws from some population or n independent values produced by a chance experiment.
- Summary statistic
- We choose a summary statistic or a small collection of summary statistics to represent the data obtained in our experiment. The summary statistic is a random variable.
- Sampling distribution
- The sampling distribution of a statistic is its probability distribution. The distribution depends on features of the population. Probability calculations are used to derive the sampling distribution.
- Comparison
- We compare the observed statistic to its sampling distribution. If there is a clash between the observed statistic and the sampling distribution, we discard the assumptions used to derive the sampling distribution; if not, we retain the assumptions.
- Hypotheses, hypothesis tests, p-values, Type I and Type II error rates, power, confidence intervals, etc.
- Much more terminology and formalization of the problem yet to come.
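The comparison step can be sketched by simulation rather than by probability calculation. The numbers here are assumptions chosen for illustration only: an assumed population proportion of 0.5, a sample size of 50, and a hypothetical observed p̂ of 0.70. `simulate_p_hats` is a hypothetical helper name.

```python
import random

def simulate_p_hats(p, n, reps, rng):
    """Simulate `reps` sample proportions, each from n independent Bernoulli(p) draws."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(1)
assumed_p = 0.5        # assumption used to derive the sampling distribution
n = 50
observed_p_hat = 0.70  # hypothetical observed statistic

p_hats = simulate_p_hats(assumed_p, n, 10_000, rng)

# How often does a simulated p-hat land at least as far from assumed_p
# as the observed one did? (The tiny tolerance guards against float rounding.)
observed_gap = abs(observed_p_hat - assumed_p)
fraction_as_extreme = sum(
    abs(ph - assumed_p) >= observed_gap - 1e-12 for ph in p_hats
) / len(p_hats)
print(f"fraction of simulated p-hats at least as extreme: {fraction_as_extreme:.4f}")
```

A very small fraction signals a clash between the observed statistic and the sampling distribution derived under the assumption; how small is "small" is part of the formalization still to come.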
Visualizing sampling distributions
- Want to know how a statistic behaves for different samples from the population.
- Repeat a large number of times:
- Draw a sample of size n from the population.
- Calculate the statistic based on that sample.
- Summarize the observed values of the statistic in a histogram.
- This gives an approximate view of the sampling distribution.
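The procedure above can be sketched as follows, using only the Python standard library. The population (rolls of a fair six-sided die), the sample size, and the helper names `simulate_statistic` and `text_histogram` are assumptions made for illustration; a plotting library would normally replace the crude text histogram.

```python
import random
import statistics

def simulate_statistic(draw_sample, statistic, reps):
    """Repeat `reps` times: draw a sample, compute the statistic, record its value."""
    return [statistic(draw_sample()) for _ in range(reps)]

def text_histogram(values, lo, hi, bins, width=40):
    """Return one text line per bin for the values falling in [lo, hi)."""
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            counts[index] += 1
    scale = max(counts) or 1
    return [
        f"{lo + i * (hi - lo) / bins:5.2f} | {'#' * round(width * c / scale)} {c}"
        for i, c in enumerate(counts)
    ]

rng = random.Random(2008)
n = 5  # sample size (an arbitrary choice)

def draw_die_sample():
    """Hypothetical population: n independent rolls of a fair six-sided die."""
    return [rng.randint(1, 6) for _ in range(n)]

means = simulate_statistic(draw_die_sample, statistics.mean, reps=1000)
histogram_lines = text_histogram(means, lo=1.0, hi=6.5, bins=11)
print("\n".join(histogram_lines))
```

Passing the sampler and the statistic as arguments keeps the loop itself identical for any population and any statistic.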
Toy example - a normal population
- Suppose our population of values for X is described by a N(10, 2^2) distribution.
[Figure: "N(10,2) population"; x-axis: X values (4 to 16); y-axis: density of X.]
- Draw an SRS of size n = 2 from this population: x1 = 12.62151, x2 = 12.77690.
- Calculate the sample mean, x̄, for this sample: x̄ = 12.69920.
- Record this value of x̄.
Example 1 - The mean of samples from a N(10, 2^2) population
- Draw 1000 random samples of size n from a N(10, 2^2) population. For each sample calculate the sample mean.
[Figure: four histograms of mean(sample), one panel each for n = 2, n = 5, n = 20 and n = 50; x-axis from 6 to 14.]
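A minimal sketch of Example 1 in Python, assuming the standard library's `random.gauss` as the N(10, 2^2) sampler. It prints the center and spread of the simulated sample means next to σ/√n, the known standard deviation of the mean of n independent draws. The seed and variable names are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(10)
MU, SIGMA, REPS = 10.0, 2.0, 1000  # population mean, population sd, number of samples

results = {}
for n in (2, 5, 20, 50):
    # One sample mean per repetition.
    means = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(REPS)
    ]
    results[n] = (statistics.fmean(means), statistics.stdev(means))

for n, (center, spread) in results.items():
    print(f"n = {n:2d}: mean of x-bar = {center:.3f}, "
          f"sd of x-bar = {spread:.3f}, sigma/sqrt(n) = {SIGMA / math.sqrt(n):.3f}")
```

The center stays near 10 for every n, while the spread shrinks as n grows.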
Example 2 - U(0, 1) population
- Draw 1000 random samples of size n from a U(0, 1) population. For each sample calculate the sample mean.
[Figure: four histograms of mean(sample), one panel each for n = 2, n = 5, n = 20 and n = 50; x-axis from 0 to 1.]
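One rough way to check the shape of these histograms numerically, without plotting, is to count how many simulated means fall within one standard deviation of the center and compare with the corresponding normal probability (about 0.683). The sketch below assumes this check as an illustration; it uses the facts that a U(0, 1) population has mean 1/2 and variance 1/12.

```python
import math
import random
import statistics

rng = random.Random(0)
REPS = 1000
POP_MEAN = 0.5               # mean of U(0, 1)
POP_SD = math.sqrt(1 / 12)   # sd of U(0, 1)

# Probability that a normal variable falls within one sd of its mean.
std_normal = statistics.NormalDist()
normal_reference = std_normal.cdf(1) - std_normal.cdf(-1)

within_one_sd = {}
for n in (2, 5, 20, 50):
    sd_of_mean = POP_SD / math.sqrt(n)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(REPS)]
    within_one_sd[n] = sum(abs(m - POP_MEAN) < sd_of_mean for m in means) / REPS

print(f"normal reference: {normal_reference:.3f}")
for n, fraction in within_one_sd.items():
    print(f"n = {n:2d}: fraction of means within one sd of 0.5 = {fraction:.3f}")
```

With only 1000 repetitions the fractions are noisy, so this is a coarse check on shape rather than a test of normality.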
Example 4 - coin flips
- Flip n biased coins and record the proportion of heads. Repeat this procedure 1000 times.
[Figure: four histograms of the proportion of heads (axis label: mean(sample)), one panel each for n = 2, n = 5, n = 20 and n = 50.]
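A minimal sketch of the coin-flip example. The notes do not state the coin's bias, so the value `P_HEADS = 0.2` below is purely an assumption, as are the seed and the helper name `proportion_of_heads`.

```python
import random
import statistics

rng = random.Random(4)
P_HEADS = 0.2  # hypothetical bias; the notes do not give the actual value
REPS = 1000

def proportion_of_heads(n):
    """Flip n biased coins and return the proportion that land heads."""
    return sum(rng.random() < P_HEADS for _ in range(n)) / n

proportions = {
    n: [proportion_of_heads(n) for _ in range(REPS)]
    for n in (2, 5, 20, 50)
}

for n, values in proportions.items():
    print(f"n = {n:2d}: mean of p-hat = {statistics.fmean(values):.3f}, "
          f"distinct values observed = {len(set(values))}")
```

For n = 2 the proportion can only be 0, 0.5 or 1, so the histogram has at most three bars; larger n allows more distinct values and a smoother-looking histogram.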
Features of the sampling distribution
- The sampling distribution of a statistic is often centered about the value of the population parameter estimated by the statistic.
- The bias of an estimator is the mean of its sampling distribution minus the estimand:
  - bias(θ̂) = μ_θ̂ − θ
  - An estimator with zero bias is called unbiased; in other cases, the estimator is called biased.
- The variance of an estimator is the variance of its sampling distribution:
  - var(θ̂)
- The mean squared error of an estimator is
  - MSE(θ̂) = bias^2(θ̂) + var(θ̂)
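These three quantities can be approximated by simulation. The sketch below is an illustration under assumptions not in the notes: it compares two estimators of the population variance σ^2 = 4 of a N(10, 2^2) population (dividing the sum of squared deviations by n − 1 or by n), with n = 5 and 10,000 repetitions. The mean squared error is computed directly as the average of (θ̂ − θ)^2, so the output also shows that it matches bias^2 + var.

```python
import random
import statistics

rng = random.Random(3)
MU, SIGMA, N, REPS = 10.0, 2.0, 5, 10_000
TRUE_VARIANCE = SIGMA ** 2  # the estimand, theta

def summarize(estimates, estimand):
    """Return simulated (bias, variance, mse) of an estimator."""
    bias = statistics.fmean(estimates) - estimand
    variance = statistics.pvariance(estimates)  # variance of the simulated sampling distribution
    mse = statistics.fmean((e - estimand) ** 2 for e in estimates)
    return bias, variance, mse

samples = [[rng.gauss(MU, SIGMA) for _ in range(N)] for _ in range(REPS)]

# statistics.variance divides by n - 1; statistics.pvariance divides by n.
divide_by_n_minus_1 = summarize([statistics.variance(s) for s in samples], TRUE_VARIANCE)
divide_by_n = summarize([statistics.pvariance(s) for s in samples], TRUE_VARIANCE)

for label, (bias, variance, mse) in (("n - 1", divide_by_n_minus_1), ("n", divide_by_n)):
    print(f"divide by {label}: bias = {bias:.3f}, var = {variance:.3f}, "
          f"mse = {mse:.3f}, bias^2 + var = {bias ** 2 + variance:.3f}")
```

In this setting the divide-by-n estimator is biased downward but has the smaller variance, which shows why bias alone does not decide which estimator is preferable.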
The Central Limit Theorem
- The central limit theorem describes how the shape and spread of the sampling distribution change as n changes.
- Reconsider the earlier examples of sampling distributions for x̄.
  - Normal population. Retains normal shape, compression of spread.
  - Uniform population. Moves toward normal shape, compression of spread.
  - Skewed population. Moves toward normal shape, compression of spread.
  - Biased coin. Moves toward normal shape, compression of spread.
- These changes hold for most sampling distributions of interest, although there are a few exceptions. Later, we'll see where the square root of n behavior comes from.
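Both effects, the move toward a normal shape and the compression of spread, can be seen numerically in a small simulation. The sketch below makes several assumptions: an exponential population with mean 1 and standard deviation 1 stands in for a skewed population (it is not the notes' own skewed example), 5000 repetitions are used instead of 1000 to reduce noise, and `skewness` is a hypothetical helper computing the usual moment-based skewness.

```python
import math
import random
import statistics

rng = random.Random(7)
REPS = 5000

def skewness(values):
    """Moment-based sample skewness: third central moment over sd cubed."""
    center = statistics.fmean(values)
    m2 = statistics.fmean((v - center) ** 2 for v in values)
    m3 = statistics.fmean((v - center) ** 3 for v in values)
    return m3 / m2 ** 1.5

summary = {}
for n in (2, 5, 20, 50):
    # Sample means from an Exponential(1) population (mean 1, sd 1).
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(REPS)]
    spread = statistics.stdev(means)
    summary[n] = (spread, spread * math.sqrt(n), skewness(means))

for n, (spread, scaled, skew) in summary.items():
    print(f"n = {n:2d}: sd of x-bar = {spread:.3f}, "
          f"sd * sqrt(n) = {scaled:.3f}, skewness = {skew:.3f}")
```

The column sd * sqrt(n) stays roughly constant near the population standard deviation, which is the square root of n behavior, while the skewness shrinks toward 0, the value for a normal distribution.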