Statistical Inference: Confidence Intervals & Hypothesis Tests for Two Populations - Prof., Study notes of Data Analysis & Statistical Methods

An overview of statistical inference methods for determining the difference between two population proportions or variances. It covers the concepts of confidence intervals and hypothesis tests, including the expected values, formulas, and critical values for independent sampling. The document also includes examples of calculating confidence intervals and performing hypothesis tests using given data.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

koofers-user-a6v-1

Chapter - Part B

Inferences Based on Two Samples: Confidence Intervals and Tests of Hypotheses

Identifying the Target Parameter

$p_1 - p_2$: Difference between proportions, percentages, fractions or rates; compare proportions. Qualitative Data.

Two groups may or may not have similar proportions regarding particular characteristics.

We can make inferences about $p_1$ and $p_2$ by examining $\hat{p}_1$ and $\hat{p}_2$.

Methods based on large sample sizes!

Comparing Two Population Proportions: Independent Sampling

The Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

The expected value of the sampling distribution is:

$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$

If the two samples are independent, the standard deviation of the difference of their proportions is:

$$\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \cong \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

If the samples are large, the sampling distribution is approximately normal, i.e.

$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \sigma_{(\hat{p}_1 - \hat{p}_2)}\right)$$
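The expected value and standard deviation above can be checked by simulation. The following is a minimal sketch in Python (standard library only). The population proportions, sample sizes, number of repetitions and seed are hypothetical values chosen for illustration; they do not come from the notes.

```python
import math
import random
import statistics


def simulate_diff(p1, n1, p2, n2, reps=5000, seed=1):
    """Simulate p1_hat - p2_hat for two independent binomial samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population values (assumptions for illustration only).
p1, n1, p2, n2 = 0.9, 100, 0.6, 100
diffs = simulate_diff(p1, n1, p2, n2)

# Theoretical values from the formulas for E(.) and sigma.
theory_mean = p1 - p2
theory_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(f"mean: simulated {statistics.fmean(diffs):.4f}, theory {theory_mean:.4f}")
print(f"sd:   simulated {statistics.stdev(diffs):.4f}, theory {theory_sd:.4f}")
```

The simulated mean and standard deviation should land close to the theoretical ones; a histogram of `diffs` would also look roughly bell-shaped for samples of this size.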

Comparing Two Population Proportions: Independent Sampling

Large Sample $100(1-\alpha)\%$ Confidence Interval for $(p_1 - p_2)$

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\, \sigma_{(\hat{p}_1 - \hat{p}_2)} = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \cong (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

Comparing Two Population Proportions: Independent Sampling

Large sample test of hypothesis about $(p_1 - p_2)$

One-Tailed Test

$H_0$: $(p_1 - p_2) = D_0$

$H_a$: $(p_1 - p_2) > D_0$ [or $H_a$: $(p_1 - p_2) < D_0$]

Rejection region: $z_o > z_\alpha$ [or $z_o < -z_\alpha$ when $H_a$: $(p_1 - p_2) < D_0$]

Two-Tailed Test

$H_0$: $(p_1 - p_2) = D_0$

$H_a$: $(p_1 - p_2) \neq D_0$

Rejection region: $|z_o| > z_{\alpha/2}$

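Test Statistic:

$$z_o = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\hat{\sigma}_{(\hat{p}_1 - \hat{p}_2)}}$$

with

$$\hat{\sigma}_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\hat{p}\,\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}$$

Conditions:

The two samples must be independent of each other.

The sample sizes must be large, i.e.

$$n_1 \hat{p}_1 \geq 15 \text{ and } n_1 \hat{q}_1 \geq 15; \qquad n_2 \hat{p}_2 \geq 15 \text{ and } n_2 \hat{q}_2 \geq 15$$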
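The large-sample condition can be checked mechanically, because $n\hat{p}$ is simply the number of successes and $n\hat{q}$ the number of failures. The sketch below assumes that reading; the function name and the example counts are hypothetical.

```python
def large_sample_ok(successes, n, threshold=15):
    """Return True when both n*p_hat and n*q_hat reach the threshold.

    n*p_hat equals the success count and n*q_hat equals the failure count,
    so no division is needed.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical samples (assumed counts, for illustration only).
print(large_sample_ok(40, 100))  # 40 successes, 60 failures
print(large_sample_ok(5, 50))    # only 5 successes
```

Both samples in a comparison need to pass the check before the normal approximation is relied on.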

Example

A scientist decided to evaluate the survival of mites after the application of a new insecticide. Because it is known that males and females can have different reactions to this chemical, individuals of each sex were selected. She placed each individual mite on a small piece of rug, applied the recommended dose of insecticide, and after 1 hour the status of the insect (alive/dead) was recorded. The summary table is shown below:

            Alive   Dead   TOTAL
Male          20     30      50
Female         5     45      50
TOTAL         25     75     100

a) Obtain a 90% confidence interval for the population difference in proportion of dead mites between females and males.

b) Conduct a test to evaluate if there is a significant difference in the proportion of dead mites between females and males. Use $\alpha = 0.10$.


Example (continued): part a)

Obtain a 90% confidence interval for the population difference in proportion of dead mites between females and males.

$$\hat{p}_f = 45/50 = 0.90, \qquad \hat{p}_m = 30/50 = 0.60, \qquad \alpha = 0.10$$

$$\hat{\sigma}_{(\hat{p}_f - \hat{p}_m)} = \sqrt{\frac{\hat{p}_f \hat{q}_f}{n_f} + \frac{\hat{p}_m \hat{q}_m}{n_m}} = \sqrt{\frac{0.90 \times 0.10}{50} + \frac{0.60 \times 0.40}{50}} = 0.0812$$

$$(\hat{p}_f - \hat{p}_m) \pm z_{\alpha/2}\, \hat{\sigma}_{(\hat{p}_f - \hat{p}_m)} = (0.90 - 0.60) \pm 1.645 \times 0.0812 = 0.30 \pm 0.1336 = [0.1664;\ 0.4336]$$
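The interval for the mite data can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `two_proportion_ci` is hypothetical, and the counts come from the summary table (45 of 50 females dead, 30 of 50 males dead).

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, conf=0.90):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = p1 - p2
    return diff - z * se, diff + z * se, se


# Females first, males second, matching p_f - p_m in the notes.
lower, upper, se = two_proportion_ci(45, 50, 30, 50, conf=0.90)
print(f"standard error = {se:.4f}")
print(f"90% CI = [{lower:.4f}, {upper:.4f}]")
```

Because the whole interval lies above zero, it already suggests a higher death proportion among females, which the formal test in part b) examines.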


Example (continued): part b)

Conduct a test to evaluate if there is a significant difference in the proportion of dead mites between females and males. Use $\alpha = 0.10$.

$$\hat{p}_f = 45/50 = 0.90, \qquad \hat{p}_m = 30/50 = 0.60, \qquad \alpha = 0.10$$

$H_0$: $p_f = p_m$

$H_a$: $p_f \neq p_m$

$$\hat{p} = \frac{x_f + x_m}{n_f + n_m} = \frac{45 + 30}{100} = 0.75$$

$$\hat{\sigma}_{(\hat{p}_f - \hat{p}_m)} = \sqrt{\hat{p}\,\hat{q}\left(\frac{1}{n_f} + \frac{1}{n_m}\right)} = \sqrt{0.75 \times 0.25 \left(\frac{1}{50} + \frac{1}{50}\right)} = 0.0866$$

$$z_o = \frac{\hat{p}_f - \hat{p}_m}{\hat{\sigma}_{(\hat{p}_f - \hat{p}_m)}} = \frac{0.90 - 0.60}{0.0866} = 3.464$$

$3.464 = |z_o| > z_{\alpha/2} = 1.645$

Hence, we reject $H_0$.
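The test for the mite data can be sketched in Python with the standard library alone. The function name `two_proportion_z_test` is hypothetical; the pooled standard error follows the test-statistic formula given in the notes for $D_0 = 0$, and the two-sided p-value is an addition not shown on the slides.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided large-sample z test of H0: p1 = p2 using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.10
z_obs, p_value = two_proportion_z_test(45, 50, 30, 50)  # females, then males
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"z_o = {z_obs:.3f}, critical value = {z_crit:.3f}")
print("reject H0" if abs(z_obs) > z_crit else "do not reject H0")
```

Comparing `p_value` with `alpha` leads to the same decision as comparing $|z_o|$ with $z_{\alpha/2}$.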

Determining the Sample Size

Calculations are different for two samples as we have two sources of variability!

With a given level of confidence, and a specified sampling error (SE), it is possible to calculate the required sample size.

Typically, $n_1 = n_2 = n$.

Sample size needed to estimate $(\mu_1 - \mu_2)$

Based on:

$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma_{(\bar{x}_1 - \bar{x}_2)}\right)$$

Given $\alpha$ and the sampling error (SE) required:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{(SE)^2}$$

Estimates of $\sigma_1^2$ and $\sigma_2^2$ will be needed.

For small sample sizes, we use $t_{\alpha/2,\ n_1 + n_2 - 2}$ and an iterative process:

$$n_1 = n_2 = \frac{(t_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{(SE)^2}$$

Example

An experiment was done to determine the effect on dairy cattle of a diet supplement. Two groups were studied: standard diet and liquid whey diet. Suppose that we test the hypothesis of a difference in mean hay consumption. Determine the approximate number of cattle that should be included in each group if we want a margin of error of at most ±0.50 for a 95% confidence interval. Previous experimentation has shown $\sigma$ to be approximately 0.8.

$$\alpha = 0.05, \qquad SE = \pm 0.50, \qquad \sigma_1 = \sigma_2 = \sigma = 0.80$$

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (\sigma_1^2 + \sigma_2^2)}{(SE)^2} = \frac{1.96^2 (0.80^2 + 0.80^2)}{0.50^2} = 19.67 \cong 20$$

If we do this experiment we might want to use a t distribution with $df = n_1 + n_2 - 2 = 38$, hence $t_{0.025,\,38} = 2.024$ and

$$n_1 = n_2 = \frac{2.024^2 (0.80^2 + 0.80^2)}{0.50^2} = 20.97 \cong 21$$

With 21 animals per group, $df = 40$ and $t_{0.025,\,40} = 2.021$.

How about the power of the test? This needs to be checked too! (later)
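The dairy-cattle calculation can be sketched as follows, using only the Python standard library. The function name `n_per_group_means` is hypothetical. The standard library has no t quantile function, so the value $t_{0.025,\,38} = 2.024$ is copied from the notes rather than computed.

```python
import math
from statistics import NormalDist


def n_per_group_means(sigma1, sigma2, se, crit):
    """Per-group sample size for estimating mu1 - mu2 to within +/- se.

    `crit` is the critical value (z or t) for the chosen confidence level.
    """
    return crit ** 2 * (sigma1 ** 2 + sigma2 ** 2) / se ** 2


z = NormalDist().inv_cdf(0.975)                 # z_{0.025} for 95% confidence
n_z = n_per_group_means(0.8, 0.8, 0.50, z)      # first pass with z
n_t = n_per_group_means(0.8, 0.8, 0.50, 2.024)  # second pass, t value from the notes

print(f"z-based: {n_z:.2f} -> {math.ceil(n_z)} per group")
print(f"t-based: {n_t:.2f} -> {math.ceil(n_t)} per group")
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.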

Determining the Sample Size

Sample size needed to estimate $(p_1 - p_2)$

Based on:

$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \sigma_{(\hat{p}_1 - \hat{p}_2)}\right)$$

Given $\alpha$ and the sampling error (SE) required:

$$n_1 = n_2 = \frac{(z_{\alpha/2})^2 (p_1 q_1 + p_2 q_2)}{(SE)^2}$$

Estimates of $p_1$ and $p_2$ will be needed! Use conservative values of: $p_1 = p_2 = 0.5$

For small sample sizes, we do not use t, but we need the general condition:

$$n_1 \hat{p}_1 \geq 15 \text{ and } n_1 \hat{q}_1 \geq 15; \qquad n_2 \hat{p}_2 \geq 15 \text{ and } n_2 \hat{q}_2 \geq 15$$
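A short Python sketch of the sample-size formula for proportions follows. The function name `n_per_group_props`, the target sampling error of 0.05 and the alternative guesses 0.9 and 0.6 are assumptions made for illustration; none of them come from the notes.

```python
import math
from statistics import NormalDist


def n_per_group_props(p1, p2, se, conf=0.95):
    """Per-group sample size for estimating p1 - p2 to within +/- se."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / se ** 2
    return math.ceil(n)  # round up so the target error is not exceeded


# Conservative planning values p1 = p2 = 0.5 maximise p*q.
n_conservative = n_per_group_props(0.5, 0.5, se=0.05)

# With prior guesses away from 0.5, fewer observations are needed.
n_informed = n_per_group_props(0.9, 0.6, se=0.05)

print(n_conservative, n_informed)
```

The conservative choice never underestimates the required size, because $pq$ reaches its maximum of 0.25 at $p = 0.5$.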

Identifying the Target Parameter

$\sigma_1^2 / \sigma_2^2$: Ratio of variances; difference in variability or spread; compare variation. Quantitative Data.

Since variances do not follow a normal distribution (due to the squaring involved) we use a different technique for inferences about variances.

Comparing Two Population Variances: Independent Sampling

Goal: compare variances between 2 populations

Parameter: $\sigma_1^2 / \sigma_2^2$ (equal to 1 when variances are equal)

Estimator: $s_1^2 / s_2^2$ (observed ratio of sample variances)

Distribution of (a multiple of) this estimator (under the assumption that we have normal data) is:

$$\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} \sim F \quad \text{with } df_1 = n_1 - 1 \text{ and } df_2 = n_2 - 1$$

Special case: If $\sigma_1^2 = \sigma_2^2$ then

$$\frac{s_1^2}{s_2^2} \sim F \quad \text{with } df_1 = n_1 - 1 \text{ and } df_2 = n_2 - 1$$

Comparing Two Population Variances: Independent Sampling

If the ratio of sample variances gets too far from 1 (in either direction) the likelihood that both samples share the same population variance drops.

Properties of the F distribution

Takes on positive density over the range $(0, \infty)$.

Cannot take on negative values.

Non-symmetric (skewed right).

Indexed by two degrees of freedom ($df_1$ and $df_2$).
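These properties can be seen in a small simulation of the variance ratio under equal population variances. The sketch below uses only the Python standard library; the sample sizes of 50, the number of repetitions and the seed are assumptions for illustration, and 1.762 is the critical value $F_{0.025,\,49,\,49}$ quoted in the drug-potency example of these notes.

```python
import random
import statistics


def sample_variance(values):
    """Unbiased sample variance s^2 (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_ratios(n1, n2, reps=5000, seed=7):
    """Simulate s1^2 / s2^2 for two normal samples with equal variances."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, 1.0) for _ in range(n2)])
        ratios.append(s1_sq / s2_sq)
    return ratios


ratios = variance_ratios(50, 50)
upper_tail = sum(r > 1.762 for r in ratios) / len(ratios)

print(f"smallest ratio: {min(ratios):.3f}")
print(f"mean {statistics.fmean(ratios):.3f} vs median {statistics.median(ratios):.3f}")
print(f"share above 1.762: {upper_tail:.3f}")
```

Every simulated ratio is positive, the mean sits above the median (right skew), and roughly 2.5% of the ratios should exceed 1.762.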

Comparing Two Population Variances: Independent Sampling

$(1-\alpha)100\%$ Confidence Interval for $\sigma_1^2 / \sigma_2^2$

Obtain the ratio of sample variances $F_o = s_1^2 / s_2^2 = (s_1 / s_2)^2$ (for convenience the larger sample variance is put in the numerator).

For a given $\alpha$ obtain:

$F_U = F_{\alpha/2,\ n_1 - 1,\ n_2 - 1}$ (leaves $\alpha/2$ in the upper tail)

$F_L = F_{1 - \alpha/2,\ n_1 - 1,\ n_2 - 1} = 1 / F_{\alpha/2,\ n_2 - 1,\ n_1 - 1}$ (leaves $\alpha/2$ in the lower tail)

Compute the $(1-\alpha)100\%$ CI:

$$\left[\frac{s_1^2}{s_2^2} F_L,\ \frac{s_1^2}{s_2^2} F_U\right]$$

Conclude population variances are unequal if the interval does not contain 1.

Example (Part I)

Company officials are concerned about the length of time a particular drug retains its potency. A random sample (sample 1) of 50 bottles of the product is drawn from the current production and analyzed for potency. A second sample (sample 2) of 50 bottles is obtained, stored for 1 year, and then analyzed. The summary readings are the sample size, mean and standard deviation of each sample: $n_1$, $\bar{x}_1$, $s_1$ and $n_2$, $\bar{x}_2$, $s_2$.

Obtain a 95% confidence interval for the ratio of the population variances, and use it to test their equality.

[Table: Critical values of the F distribution ($\alpha = 0.025$)]

Example (continued)

$$F_0 = \frac{s_1^2}{s_2^2} = 1.778$$

$$F_U = F_{\alpha/2,\ n_1 - 1,\ n_2 - 1} = F_{0.025,\ 49,\ 49} = 1.762$$

$$F_L = F_{1 - \alpha/2,\ n_1 - 1,\ n_2 - 1} = 1 / F_{0.025,\ 49,\ 49} = 1 / 1.762 = 0.567$$

$$\left[\frac{s_1^2}{s_2^2} F_L,\ \frac{s_1^2}{s_2^2} F_U\right] = \left[\frac{1.778}{1.762},\ 1.778 \times 1.762\right] = [1.009,\ 3.133]$$

$H_0$: $\sigma_1^2 / \sigma_2^2 = 1$ (equivalently, $\sigma_1^2 = \sigma_2^2$)

$H_a$: $\sigma_1^2 / \sigma_2^2 \neq 1$ (equivalently, $\sigma_1^2 \neq \sigma_2^2$)

Because 1.0 is not contained in our 95% confidence interval, we reject $H_0$!
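The interval arithmetic for the drug-potency example can be sketched in Python. The function name `variance_ratio_ci` is hypothetical. It takes the critical value 1.762 from the notes as given, since the standard library has no F quantile function, and it assumes equal sample sizes so that $F_L = 1 / F_U$.

```python
def variance_ratio_ci(f_obs, f_upper):
    """CI for sigma1^2 / sigma2^2 when n1 == n2, so that F_L = 1 / F_U.

    f_obs   : observed ratio of sample variances s1^2 / s2^2
    f_upper : upper-tail critical value F_{alpha/2, n1-1, n2-1}
    """
    f_lower = 1.0 / f_upper
    return f_obs * f_lower, f_obs * f_upper


ci_low, ci_high = variance_ratio_ci(f_obs=1.778, f_upper=1.762)
contains_one = ci_low <= 1.0 <= ci_high

print(f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print("do not reject H0" if contains_one else "reject H0")
```

The lower limit is only just above 1, so the decision here is sensitive to rounding in the summary statistics.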

Example: SAS code

data drug;
  input Sample $ Resp @@;
  datalines;
1 10.74 2 9.50 1 10.03 2 10.11 1 10.68 2 9.31
...
1 10.73 2 9.87 1 10.10 2 9.73 1 10.67 2 9.65
;

proc ttest data=drug alpha=0.05;
  class Sample;
  var Resp;
run;

The F value reported for the equality of variances is 1.778.

F test of Hypotheses for Equal Population Variances

One-Tailed Test

$H_0$: $\sigma_1^2 = \sigma_2^2$

$H_a$: $\sigma_1^2 < \sigma_2^2$ [or $H_a$: $\sigma_1^2 > \sigma_2^2$]

Test statistic:

$$F_0 = \frac{s_2^2}{s_1^2} \text{ if } H_a: \sigma_1^2 < \sigma_2^2, \qquad F_0 = \frac{s_1^2}{s_2^2} \text{ if } H_a: \sigma_1^2 > \sigma_2^2$$

Rejection region: $F_0 > F_\alpha$

Two-Tailed Test

$H_0$: $\sigma_1^2 = \sigma_2^2$

$H_a$: $\sigma_1^2 \neq \sigma_2^2$

Test statistic:

$$F_0 = \frac{s_{max}^2}{s_{min}^2}$$

(for convenience the larger sample variance is put in the numerator)

Rejection region: $F_0 > F_{\alpha/2}$

(p-values can also be calculated.)

Example

In 2007, the ten fast growing economies had an average GDP growth rate of 8.69% with a standard deviation of 1.70%. The ten slow growing economies had an average GDP growth rate of 2.29% with a standard deviation of 0.58%. Do slow and fast growing economies have similar variability ($\alpha$ = 5%)?
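One way to approach this question is the two-tailed F test with the larger sample variance in the numerator. The sketch below is an illustration under assumptions, not the notes' own solution: the function name `folded_f` is hypothetical, and the critical value $F_{0.025,\,9,\,9} \approx 4.03$ is taken from a standard F table because the Python standard library has no F quantile function.

```python
def folded_f(s_a, s_b):
    """Two-tailed F statistic: larger sample variance over the smaller one."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Standard deviations from the example (in percentage points), n = 10 each.
f_0 = folded_f(1.70, 0.58)

# Assumed table value for F_{alpha/2} with alpha = 0.05 and df = (9, 9).
F_CRIT = 4.03

reject = f_0 > F_CRIT
print(f"F_0 = {f_0:.2f}, critical value = {F_CRIT}")
print("reject H0: variances differ" if reject else "do not reject H0")
```

Under these assumptions the statistic is well beyond the critical value, which points to different variability in the two groups. The test relies on the normal-data assumption stated earlier in the notes.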