Probability of Sample Statistics Close to Population Parameters, Study notes of Statistics

These notes introduce sampling distributions, focusing on the sampling distribution of a proportion and the sampling distribution of the mean. They explain how to calculate the probability that a sample statistic is close to the population parameter using the binomial distribution, and they cover the normal approximation to the binomial distribution and the continuity correction.

Uploaded on 09/02/2009

Sampling distributions
Cécile Ané
Stat 371
Spring 2006
Outline
1. Introduction
2. Sampling distribution of a proportion
3. Sampling distribution of the mean
4. Normal approximation to the binomial
5. The continuity correction

Sampling distributions
What does it mean to take a sample of size n?
$Y_1, \dots, Y_n$ form a random sample if they are independent and have a common distribution.
From a sample, we can calculate a sample statistic such as the sample mean $\bar{Y}$.
$\bar{Y}$ is random too! It can differ from sample to sample. The textbook refers to a meta-experiment.
The distribution of $\bar{Y}$ is called a sampling distribution.

Sampling distribution of a proportion
Example: cross of two heterozygotes Aa × Aa. Probability distribution of the offspring's genotype:
Offspring genotype   AA     Aa     aa
Probability          0.25   0.50   0.25
An offspring is dominant if it has genotype AA or Aa.
Experiment: get $n = 2$ offspring, count the number $Y$ of dominant offspring, and calculate the sample proportion $\hat{p} = Y/2$.
We would like $\hat{p}$ to be close to the "true" value $p = 0.75$.
$\hat{p}$ is random.
Distribution of $\hat{p}$ (from the binomial distribution):
$Y$            0        1        2
$\hat{p}$      0.0      0.5      1.0
$\mathbb{P}$   0.0625   0.3750   0.5625
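
The distribution of the sample proportion for two offspring can be checked numerically. The following is a minimal sketch in R using `dbinom`, the binomial probability function these notes use later; the variable names are illustrative and not part of the original slides.

```r
# Sampling distribution of p-hat = Y/n for n = 2 offspring, with p = 0.75
n <- 2
p <- 0.75
y <- 0:n                                  # possible counts of dominant offspring

sampling_dist <- data.frame(
  Y     = y,
  p_hat = y / n,                          # sample proportion for each count
  prob  = dbinom(y, size = n, prob = p)   # binomial probability of each count
)
print(sampling_dist)
```

The `prob` column should match the 0.0625, 0.3750, 0.5625 values listed for $Y = 0, 1, 2$.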

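Sampling distribution of a proportion
Larger sample size: $Y$ = # of dominant offspring out of $n = 20$, and $\hat{p} = Y/20$ the sample proportion.
We still want $\hat{p}$ to be close to the "true" value $p = 0.75$.
$\hat{p}$ is still random.
What is the probability that $\hat{p}$ is within 0.05 of $p$? Translate into a binomial question:
$$\mathbb{P}\{0.70 \le \hat{p} \le 0.80\} = \mathbb{P}\{0.70 \le Y/20 \le 0.80\} = \mathbb{P}\{14 \le Y \le 16\} = \mathbb{P}\{Y = 14\} + \mathbb{P}\{Y = 15\} + \mathbb{P}\{Y = 16\} \approx 0.56$$
With $n = 2$, $\hat{p}$ can only equal 0, 0.5 or 1, so the same probability is 0.
Sample size of 20 better than sample size of 2!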
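
To compare sample sizes directly, the binomial translation can be wrapped in a small function. This is a sketch only: `prob_within` and its arguments are hypothetical names introduced here, and the small numerical slack is an implementation detail to guard against floating-point rounding at the interval boundaries.

```r
# P(|p-hat - p| <= tol) where p-hat = Y/n and Y ~ B(n, p)
prob_within <- function(n, p, tol) {
  y <- 0:n
  # keep the counts whose sample proportion is within tol of p
  close <- abs(y / n - p) <= tol + 1e-9   # slack guards against rounding error
  sum(dbinom(y[close], size = n, prob = p))
}

p_n2  <- prob_within(2,  0.75, 0.05)   # no possible p-hat is within 0.05 of 0.75
p_n20 <- prob_within(20, 0.75, 0.05)   # sums P(Y = 14), P(Y = 15), P(Y = 16)
c(n2 = p_n2, n20 = p_n20)
```

Calling `prob_within` with larger `n` shows how the chance of landing close to $p$ changes with the sample size.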

Sampling distribution of the mean
Example: weight of seeds of some variety of beans. Sample size $n$.
[Table: Student #, Observations, sample mean $\bar{y}$]
$\bar{Y}$ is random. How do we know its distribution?
We will see 3 key facts.

Key fact #1
If $Y_1, \dots, Y_n$ is a random sample, and if the $Y_i$'s have mean $\mu$ and standard deviation $\sigma$, then $\bar{Y}$ has mean $\mu_{\bar{Y}} = \mu$ and variance $\mathrm{var}(\bar{Y}) = \sigma^2 / n$, i.e. standard deviation $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$.
Seed weight example: Assume beans have mean $\mu = 500$ mg and $\sigma = 120$ mg. In a sample of size $n = 4$, the sample mean $\bar{Y}$ has mean $\mu_{\bar{Y}} = 500$ mg and standard deviation $\sigma_{\bar{Y}} = 120 / \sqrt{4} = 60$ mg.
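
Key fact #1 can be checked with a simulated meta-experiment. The sketch below assumes the individual seed weights are normally distributed, which is an assumption made here only so that there is something to simulate from; key fact #1 itself does not require it. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
mu    <- 500                 # population mean (mg)
sigma <- 120                 # population standard deviation (mg)
n     <- 4                   # sample size

# Theoretical standard deviation of the sample mean, from key fact #1
sd_mean_theory <- sigma / sqrt(n)

# Meta-experiment: draw many samples of size n and record each sample mean.
# Assumption: weights are drawn from a normal distribution.
sample_means <- replicate(20000, mean(rnorm(n, mean = mu, sd = sigma)))

c(mean_of_means = mean(sample_means),
  sd_of_means   = sd(sample_means),
  theory_sd     = sd_mean_theory)
```

The simulated mean and standard deviation of the sample means should land near 500 mg and 60 mg, up to simulation noise.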

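Key fact #2
If $Y_1, \dots, Y_n$ is a random sample, and if the $Y_i$'s are all from $\mathcal{N}(\mu, \sigma)$, then
$$\bar{Y} \sim \mathcal{N}\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
Actually, $Y_1 + \dots + Y_n = n\bar{Y}$ is normally distributed too.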
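
Key fact #2 makes it possible to compute how likely the sample mean is to fall close to $\mu$. The sketch below uses the seed weight values $\mu = 500$ mg and $\sigma = 120$ mg and assumes normally distributed weights; the tolerance of 50 mg, the second sample size of 16 and the function name `prob_mean_within` are hypothetical choices made for illustration.

```r
# P(|Ybar - mu| <= tol) when Ybar ~ N(mu, sigma / sqrt(n))
prob_mean_within <- function(n, mu, sigma, tol) {
  se <- sigma / sqrt(n)                  # standard deviation of the sample mean
  pnorm(mu + tol, mean = mu, sd = se) - pnorm(mu - tol, mean = mu, sd = se)
}

p4  <- prob_mean_within(4,  mu = 500, sigma = 120, tol = 50)
p16 <- prob_mean_within(16, mu = 500, sigma = 120, tol = 50)
c(n4 = p4, n16 = p16)
```

The larger sample gives a higher probability because $\sigma/\sqrt{n}$ shrinks as $n$ grows.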

Seed weight example: 100 students do the same experiment.
[Figure: distribution of the sample mean, two panels with different sample sizes $n$]

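The normal approximation to the binomial
Example: $X$ = # of children with side effects after a vaccine, out of $n = 200$ children. Probability of side effect: $p = 0.05$. So $X \sim B(200, 0.05)$.
What is $\mathbb{P}\{X \le 15\}$?
Direct calculation:
$$\mathbb{P}\{X \le 15\} = \mathbb{P}\{X = 0\} + \dots + \mathbb{P}\{X = 15\} = {}_{200}C_{0} \, 0.05^{0} \, 0.95^{200} + \dots + {}_{200}C_{15} \, 0.05^{15} \, 0.95^{185}$$
Heavy! Or we can use a trick: the binomial might be close to a normal distribution. Pretend $X$ is normally distributed!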

[Figure: six panels of binomial probabilities plotted against some possible values, for $n = 10$, $n = 50$, $n = 200$ and three panels with $n = 20$, each with its own $p$]

The normal approximation to the binomial
$X = Y_1 + \dots + Y_{200}$ where
$Y_1 = 1$ if child #1 has side effects, 0 otherwise.
...
$Y_{200} = 1$ if child #200 has side effects, 0 otherwise.
Apply key result #3: if $n$ (# of children) is large enough, then $Y_1 + \dots + Y_n$ has approximately a normal distribution.
Use the normal distribution with $X$'s mean and variance: $\mu = np$, $\sigma = \sqrt{np(1-p)}$.
If $X \sim B(n, p)$ and if $n$ is large enough: if $np \ge 5$ and $n(1-p) \ge 5$ (rule of thumb), then $X$'s distribution is approximately $\mathcal{N}\left(np, \sqrt{np(1-p)}\right)$.
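
The rule of thumb and the parameters of the approximating normal distribution can be collected in one helper. This is a sketch under the rule as stated here ($np \ge 5$ and $n(1-p) \ge 5$); the function name `normal_approx_params` is hypothetical, and the second call uses a made-up small sample only to show the rule failing.

```r
# Check the rule of thumb and return the mean and standard deviation
# of the normal distribution used to approximate B(n, p).
normal_approx_params <- function(n, p) {
  ok <- (n * p >= 5) && (n * (1 - p) >= 5)   # rule of thumb
  list(ok = ok, mean = n * p, sd = sqrt(n * p * (1 - p)))
}

vaccine <- normal_approx_params(200, 0.05)   # the vaccine example
small   <- normal_approx_params(10, 0.05)    # np = 0.5, rule not satisfied
str(vaccine)
str(small)
```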

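The normal approximation to the binomial
Back to our question: $\mathbb{P}\{X \le 15\}$?
$np = 10$ and $n(1-p) = 190$ are both $\ge 5$, so $X \approx \mathcal{N}(10, 3.08)$, where $3.08 \approx \sqrt{200 \times 0.05 \times 0.95} = \sqrt{9.5}$.
$$\mathbb{P}\{X \le 15\} \approx \mathbb{P}\left\{\frac{X - 10}{\sqrt{9.5}} \le \frac{15 - 10}{\sqrt{9.5}}\right\} = \mathbb{P}\{Z \le 1.62\} \approx 0.947$$
True value:
> sum(dbinom(0:15, size=200, prob=0.05))
[1] 0.9556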
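
The standardization step can be reproduced in R. This is a minimal sketch for the vaccine example ($n = 200$, $p = 0.05$); the variable names are illustrative, and `pbinom` is used as a shortcut for summing `dbinom` over 0 to 15.

```r
n <- 200
p <- 0.05
mu    <- n * p                      # mean of X
sigma <- sqrt(n * p * (1 - p))      # standard deviation of X

z <- (15 - mu) / sigma              # standardize the cutoff 15
approx_prob <- pnorm(z)             # P(Z <= z) for a standard normal Z
exact_prob  <- pbinom(15, size = n, prob = p)   # binomial P(X <= 15)

c(z = z, approx = approx_prob, exact = exact_prob)
```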

The continuity correction

[Figure: probabilities for the # of children with side effect, horizontal axis from 0 to 20]

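The continuity correction
$X$ binomial $B(200, 0.05)$, and $Y$ normal $\mathcal{N}(10, 3.08)$, with $3.08 \approx \sqrt{9.5}$.
No continuity correction:
$$\mathbb{P}\{X \le 15\} \approx \mathbb{P}\{Y \le 15\} = \mathbb{P}\left\{\frac{Y - 10}{\sqrt{9.5}} \le \frac{15 - 10}{\sqrt{9.5}}\right\} = \mathbb{P}\{Z \le 1.62\} \approx 0.947$$
The continuity correction gives a better approximation:
$$\mathbb{P}\{X \le 15\} \approx \mathbb{P}\{Y \le 15.5\} = \mathbb{P}\left\{\frac{Y - 10}{\sqrt{9.5}} \le \frac{15.5 - 10}{\sqrt{9.5}}\right\} = \mathbb{P}\{Z \le 1.78\} \approx 0.963$$
(true value was 0.9556)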
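
The two approximations can be compared against the binomial value side by side. The sketch below is for the same vaccine example; the variable names are illustrative, and the half-unit shift `k + 0.5` is the continuity correction.

```r
n <- 200
p <- 0.05
k <- 15                                   # cutoff in P(X <= k)
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))

exact     <- pbinom(k, size = n, prob = p)          # binomial value
plain     <- pnorm(k, mean = mu, sd = sigma)        # no continuity correction
corrected <- pnorm(k + 0.5, mean = mu, sd = sigma)  # with continuity correction

# Absolute error of each approximation relative to the binomial value
errors <- c(plain = abs(plain - exact), corrected = abs(corrected - exact))
print(errors)
```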

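The continuity correction
$X$ binomial $B(200, 0.05)$, and $Y$ normal $\mathcal{N}(10, 3.08)$, with $3.08 \approx \sqrt{9.5}$.
What is the probability that between 8 and 15 children get side effects?
$$\mathbb{P}\{8 \le X \le 15\} = \mathbb{P}\{7.5 \le X \le 15.5\} \approx \mathbb{P}\{7.5 \le Y \le 15.5\} = \mathbb{P}\left\{\frac{7.5 - 10}{\sqrt{9.5}} \le Z \le \frac{15.5 - 10}{\sqrt{9.5}}\right\} = \mathbb{P}\{Z \le 1.78\} - \mathbb{P}\{Z \le -0.81\} \approx 0.963 - 0.209 \approx 0.75$$
True value:
> sum(dbinom(8:15, size=200, prob=0.05))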

[1]
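
The interval calculation generalizes to any range of counts. The following is a sketch of the continuity-corrected normal approximation for $\mathbb{P}\{a \le X \le b\}$; the function name `approx_binom_interval` and its arguments are hypothetical and introduced here for illustration.

```r
# Approximate P(a <= X <= b) for X ~ B(n, p) using a normal distribution
# with the same mean and standard deviation, plus the continuity correction.
approx_binom_interval <- function(a, b, n, p) {
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  # widen the interval by half a unit on each side
  pnorm(b + 0.5, mean = mu, sd = sigma) - pnorm(a - 0.5, mean = mu, sd = sigma)
}

approx_8_15 <- approx_binom_interval(8, 15, n = 200, p = 0.05)
exact_8_15  <- sum(dbinom(8:15, size = 200, prob = 0.05))
c(approx = approx_8_15, exact = exact_8_15)
```

Running both lines shows how close the approximation is to the binomial sum for this interval.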