A lecture note from Dr. Levine's Statistics 511 class at Purdue University, Fall 2006. It covers inference based on two samples, focusing on z tests and confidence intervals for the difference of two population means. It shows how to calculate the natural estimator of the difference between two population means and its standard deviation, and derives the standard normal distribution of the test statistic under the assumption of normal populations with known variances. It also discusses the rejection regions for upper-tailed, lower-tailed, and two-tailed tests, as well as the calculation of Type II error and the choice of sample size.
Purdue University
Lecture 18: Inferences Based on Two Samples
Devore: Section 9.1-9.
Aug, 2006
z Tests and Confidence Intervals for a Difference Between Two Population Means

An example of such a hypothesis would be $\mu_1 - \mu_2 = 0$ or $\sigma_1 > \sigma_2$. It may also be appropriate to estimate $\mu_1 - \mu_2$ and compute its $100(1-\alpha)\%$ confidence interval.

$X_1, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
$Y_1, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
The $X$ and $Y$ samples are independent of one another.
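The natural estimator of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$. To standardize this estimator, we need to find $E(\bar{X} - \bar{Y})$ and $V(\bar{X} - \bar{Y})$.

$E(\bar{X} - \bar{Y}) = \mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$. The proof is elementary:

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2$$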
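The standard deviation of $\bar{X} - \bar{Y}$ is

$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$

The proof is also elementary. Because the two samples are independent, the variances add:

$$V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}$$

The standard deviation is the square root of the above expression.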
The Case of Normal Populations with Known Variances
As before, this assumption is a simplification.
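Under this assumption,

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \qquad (1)$$

has a standard normal distribution.

The null hypothesis $\mu_1 - \mu_2 = 0$ is a special case of the more general $\mu_1 - \mu_2 = \Delta_0$. Replacing $\mu_1 - \mu_2$ in (1) with $\Delta_0$ gives us a test statistic.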
$H_a: \mu_1 - \mu_2 > \Delta_0$ has the rejection region $z \ge z_\alpha$.
$H_a: \mu_1 - \mu_2 < \Delta_0$ has the rejection region $z \le -z_\alpha$.
$H_a: \mu_1 - \mu_2 \neq \Delta_0$ has the rejection region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$.
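The three rejection regions can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `two_sample_z_test`, its argument names, and the default `alpha=0.05` are assumptions made for the example, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar, ybar, var1, var2, m, n, delta0=0.0,
                      alpha=0.05, alternative="two-sided"):
    """Known-variance z test of H0: mu1 - mu2 = delta0 (hypothetical helper).

    Returns (z, reject), where reject is True when z falls in the
    level-alpha rejection region for the chosen alternative.
    """
    std_norm = NormalDist()              # standard normal: mean 0, sd 1
    se = sqrt(var1 / m + var2 / n)       # standard deviation of Xbar - Ybar
    z = (xbar - ybar - delta0) / se      # statistic (1) with mu1 - mu2 = delta0

    if alternative == "greater":         # Ha: mu1 - mu2 > delta0
        reject = z >= std_norm.inv_cdf(1 - alpha)
    elif alternative == "less":          # Ha: mu1 - mu2 < delta0
        reject = z <= -std_norm.inv_cdf(1 - alpha)
    elif alternative == "two-sided":     # Ha: mu1 - mu2 != delta0
        reject = abs(z) >= std_norm.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject
```

Here `std_norm.inv_cdf(1 - alpha)` plays the role of the critical value $z_\alpha$, the point with upper-tail area $\alpha$.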
Example
Consider Ex. 9.1 in Devore, with sample sizes $m$ and $n$. Note that $m \neq n$; this is not important now but will be later. Note also that the normality suggestion is based on some exploratory data analysis.

The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \neq 0$.

The test statistic is

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$
For a level of significance $\alpha = 0.01$, $z_{\alpha/2} = z_{.005} \approx 2.58$ and the rejection region is $z \ge 2.58$ or $z \le -2.58$.

The computed value of the $z$ statistic falls well within the rejection region. The corresponding $P$-value is small enough to mean rejection at any reasonable level.
Type II Error and the Choice of the Sample Size
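Consider the case of an upper-tailed alternative hypothesis $H_a: \mu_1 - \mu_2 > \Delta_0$.

The rejection region is $\bar{x} - \bar{y} \ge \Delta_0 + z_\alpha \sigma_{\bar{X}-\bar{Y}}$. Therefore,

$$P(\text{Type II Error}) = P\left(\bar{X} - \bar{Y} < \Delta_0 + z_\alpha \sigma_{\bar{X}-\bar{Y}} \text{ when } \mu_1 - \mu_2 = \Delta'\right)$$

Since $\bar{X} - \bar{Y}$ is normally distributed under the alternative $\mu_1 - \mu_2 = \Delta'$ with mean $\Delta'$ and standard deviation $\sigma = \sigma_{\bar{X}-\bar{Y}} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$, we have

$$\beta(\Delta') = \Phi\left(z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$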
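Similar results can be easily obtained for the other two possible alternatives. In particular, if $H_a: \mu_1 - \mu_2 < \Delta_0$, we have

$$\beta(\Delta') = 1 - \Phi\left(-z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

If $H_a: \mu_1 - \mu_2 \neq \Delta_0$, the probability of Type II Error is

$$\beta(\Delta') = \Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

where $\sigma = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$ is the standard deviation of $\bar{X} - \bar{Y}$.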
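The three expressions for $\beta(\Delta')$ translate directly into code. The sketch below is an illustration under stated assumptions: the function name `type_ii_error`, its parameters, and the default `alpha=0.05` are hypothetical, and $\Phi$ is taken from Python's standard-library `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(delta_prime, var1, var2, m, n, delta0=0.0,
                  alpha=0.05, alternative="two-sided"):
    """beta(delta_prime) for the known-variance two-sample z test."""
    phi = NormalDist().cdf            # standard normal cdf, Phi
    z_crit = NormalDist().inv_cdf     # quantile function, gives z_alpha
    sigma = sqrt(var1 / m + var2 / n)            # sd of Xbar - Ybar
    shift = (delta_prime - delta0) / sigma       # standardized true difference

    if alternative == "greater":      # Ha: mu1 - mu2 > delta0
        return phi(z_crit(1 - alpha) - shift)
    if alternative == "less":         # Ha: mu1 - mu2 < delta0
        return 1 - phi(-z_crit(1 - alpha) - shift)
    if alternative == "two-sided":    # Ha: mu1 - mu2 != delta0
        z = z_crit(1 - alpha / 2)
        return phi(z - shift) - phi(-z - shift)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

A quick sanity check on any of the three branches: when `delta_prime` equals `delta0` the function returns $1 - \alpha$, since failing to reject a true null hypothesis has probability $1 - \alpha$.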
Example
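Consider Example 9.3 from Devore. Suppose that the probability of detecting a given difference $\Delta'$ between the two means should reach a specified target. Can the level $\alpha$ test with the sample sizes $m$ and $n$ of the example support this?

For this two-sample, two-tailed test we have

$$\beta(\Delta') = \Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

Because the rejection region is symmetric, we have $\beta(-\Delta') = \beta(\Delta')$, and, therefore, the probability of detecting a difference of $\Delta'$ in either direction is $1 - \beta(\Delta')$.

In the example this probability falls somewhat short of the target. We can conclude that slightly larger sample sizes are needed.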
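To determine a sample size that satisfies $P(\text{Type II Error when } \mu_1 - \mu_2 = \Delta') = \beta$ we need to solve

$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}$$

For two equal sample sizes this yields

$$m = n = \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\Delta' - \Delta_0)^2}$$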
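As a rough sketch, the equal-sample-size formula can be evaluated as follows. The function name `equal_sample_size` and its parameters are hypothetical. Two details go beyond the slide and are assumptions of this example: the result is rounded up to the next integer, and the `two_tailed` flag swaps $z_\alpha$ for $z_{\alpha/2}$, which is the usual approximation for a two-tailed test.

```python
from math import ceil
from statistics import NormalDist


def equal_sample_size(var1, var2, delta_prime, beta, delta0=0.0,
                      alpha=0.05, two_tailed=False):
    """Common sample size m = n giving Type II error beta at delta_prime."""
    z = NormalDist().inv_cdf
    # z_alpha for a one-tailed test; z_{alpha/2} as the two-tailed approximation
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_beta = z(1 - beta)
    exact = (var1 + var2) * (z_alpha + z_beta) ** 2 / (delta_prime - delta0) ** 2
    return ceil(exact)   # round up so the error target is not exceeded
```

For instance, `equal_sample_size(1.0, 1.0, 1.0, beta=0.10)` asks how many observations per sample are needed to detect a difference of one standard deviation with 90% probability in an upper-tailed test at level 0.05.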
Large-Sample Tests
In this case, the assumption of normality for the data is unnecessary and the variances $\sigma_1^2$, $\sigma_2^2$ need not be known.

This is because for large $m$ and $n$ the variable

$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

is approximately standard normal.
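Then, if the null hypothesis is $\mu_1 - \mu_2 = \Delta_0$, the test statistic

$$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

is approximately standard normal under the null hypothesis.

This test is usually appropriate if both $m > 40$ and $n > 40$ (the rule of thumb used in Devore).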
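A minimal sketch of the large-sample statistic computed from raw data is given below. The function name `large_sample_z` and the choice to return an upper-tailed $P$-value are assumptions made for illustration. The code does not check the sample sizes, so the large-sample condition on $m$ and $n$ is left to the caller.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def large_sample_z(x, y, delta0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = delta0.

    x and y are sequences of observations. Returns (z, upper_p), where
    upper_p is the P-value for the upper-tailed alternative.
    """
    m, n = len(x), len(y)
    # statistics.variance is the sample variance (divisor n - 1), i.e. S^2
    se = sqrt(variance(x) / m + variance(y) / n)
    z = (mean(x) - mean(y) - delta0) / se
    upper_p = 1 - NormalDist().cdf(z)
    return z, upper_p
```

For a lower-tailed alternative the $P$-value is `1 - upper_p`, and for a two-tailed alternative it is twice the smaller of the two tail areas.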
Example
A company claims that its light bulbs are superior to those of its main competitor. If a study showed that a sample of $n_1 = 40$ of its bulbs has a mean lifetime of $\bar{x}_1$ hours of continuous use with a standard deviation of 27 hours, while a sample of $n_2 = 40$ bulbs made by its main competitor had a mean lifetime of $\bar{x}_2$ hours of continuous use with a standard deviation of 31 hours, does this substantiate the claim at the level of significance $\alpha$?
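Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 > 0$.

Reject $H_0$ if $z \ge z_\alpha$.

Calculations, with $\bar{x}_1$ and $\bar{x}_2$ the two sample mean lifetimes:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{27^2}{40} + \dfrac{31^2}{40}}} = \frac{\bar{x}_1 - \bar{x}_2}{6.5}$$

Decision: $H_0$ cannot be rejected at level $\alpha$; the $p$-value exceeds $\alpha$.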
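Confidence intervals for $\mu_1 - \mu_2$

Since the test statistic $Z$ that we just described is exactly normal when $\sigma_1^2$ and $\sigma_2^2$ are known,

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} < z_{\alpha/2}\right) = 1 - \alpha$$

The $100(1-\alpha)\%$ CI is easy to derive from this probability statement; it is

$$\bar{x} - \bar{y} \pm z_{\alpha/2}\, \sigma_{\bar{X}-\bar{Y}}$$

where $\sigma_{\bar{X}-\bar{Y}} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.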
If both $m$ and $n$ are large, the CLT implies that the normality assumption is not necessary and substitution of $s_i^2$ for $\sigma_i^2$, $i = 1, 2$, will produce an approximately $100(1-\alpha)\%$ CI.

More precisely, such an interval is

$$\bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

Again, this result should be used only if both $m$ and $n$ exceed 40.

Note that this CI has a standard form of $\hat{\theta} \pm z_{\alpha/2}\, \sigma_{\hat{\theta}}$.
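The large-sample interval can be sketched as below. The function name `large_sample_ci` and the `conf_level` parameter are hypothetical names chosen for this illustration. The inputs are summary statistics rather than raw data.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, ybar, s1, s2, m, n, conf_level=0.95):
    """Approximate CI for mu1 - mu2 when both m and n are large."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{alpha/2}
    half_width = z * sqrt(s1 ** 2 / m + s2 ** 2 / n)   # z_{alpha/2} * estimated sd
    diff = xbar - ybar                                 # point estimate
    return diff - half_width, diff + half_width
```

The returned pair has the standard form: point estimate minus and plus the critical value times the estimated standard deviation of the estimator.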
Example
An experiment was conducted in which two types of engines, A and B, were compared. Gas mileage, in miles per gallon, was measured. $n_A$ experiments were conducted using engine type A and $n_B$ were done for engine type B. The gasoline used and other conditions were held constant. The average mileage for engine A was $\bar{x}_A$ mpg and the average for engine B was $\bar{x}_B$ mpg. Find an approximate 96% CI on $\mu_B - \mu_A$, where $\mu_A$ and $\mu_B$ are the population mean gas mileages for engines A and B, respectively. The sample standard deviations are $s_A$ and $s_B$ for engines A and B, respectively.
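The point estimate of $\mu_B - \mu_A$ is $\bar{x}_B - \bar{x}_A$.

For $\alpha = 0.04$, we find the critical value $z_{.02} \approx 2.05$.

Thus, with $n_A$, $n_B$ the numbers of experiments and $s_A$, $s_B$ the sample standard deviations, the confidence interval is

$$\bar{x}_B - \bar{x}_A \pm 2.05 \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$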
The Two-Sample t-test
Assumptions: Both populations are normal, so that $X_1, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$.

The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.
When the population distributions are both normal, the standardized variable

$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$

has approximately a $t$ distribution with $\nu$ df.

$\nu$ can be estimated from data as

$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$

$\nu$ has to be rounded down to the nearest integer...why not up?
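The degrees-of-freedom estimate is easy to get wrong by hand, so a small sketch may help. The function name `welch_df` is a hypothetical choice for this illustration. On the question of rounding: rounding down is the conservative direction, because fewer degrees of freedom give a larger critical value and therefore a wider interval and a harder-to-reach rejection region.

```python
from math import floor


def welch_df(s1, s2, m, n):
    """Estimated degrees of freedom for the two-sample t procedures.

    Returns (nu, nu_rounded), where nu_rounded is nu rounded down.
    """
    a = s1 ** 2 / m          # s1^2 / m
    b = s2 ** 2 / n          # s2^2 / n
    nu = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return nu, floor(nu)
```

When the two sample sizes and the two sample standard deviations are equal, the formula reduces to $\nu = 2(m - 1)$, the largest value it can take for that sample size.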
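The two-sample confidence interval for $\mu_1 - \mu_2$ with confidence level $100(1-\alpha)\%$ is

$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu} \sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$

A one-sided confidence bound can also be calculated as described earlier.

The two-sample $t$-test for testing $H_0: \mu_1 - \mu_2 = \Delta_0$ is conducted using the test statistic

$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$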
Alternative hypothesis                   Rejection region for approximate level $\alpha$ test
$H_a: \mu_1 - \mu_2 > \Delta_0$          $t \ge t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < \Delta_0$          $t \le -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \neq \Delta_0$       either $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$

A $P$-value can be computed exactly as we did before for the one-sample test.
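The test can be sketched with the standard library alone, under one stated assumption: Python's `statistics` module has no $t$ quantile function, so the critical value ($t_{\alpha,\nu}$ or $t_{\alpha/2,\nu}$) is read from a $t$ table and passed in by the caller. The names `welch_t_statistic` and `in_rejection_region` are hypothetical.

```python
from math import floor, sqrt


def welch_t_statistic(xbar, ybar, s1, s2, m, n, delta0=0.0):
    """Return (t, nu): the two-sample t statistic and its rounded-down df."""
    a, b = s1 ** 2 / m, s2 ** 2 / n
    t = (xbar - ybar - delta0) / sqrt(a + b)
    nu = floor((a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1)))
    return t, nu


def in_rejection_region(t, t_crit, alternative="two-sided"):
    """Check t against a table critical value supplied by the caller.

    t_crit is t_{alpha,nu} for one-tailed tests and t_{alpha/2,nu}
    for the two-tailed test.
    """
    if alternative == "greater":      # Ha: mu1 - mu2 > delta0
        return t >= t_crit
    if alternative == "less":         # Ha: mu1 - mu2 < delta0
        return t <= -t_crit
    if alternative == "two-sided":    # Ha: mu1 - mu2 != delta0
        return abs(t) >= t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
```

A typical workflow is to call `welch_t_statistic` first, look up the critical value for the returned `nu`, and then call `in_rejection_region`.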
Example
Consider Example 9.6 in Devore. The following table helps to illustrate it:

Fabric Type    Sample Size    Sample Mean    Sample Standard Deviation
Cotton         10             51.71          0.79
Triacetate     10             126.14         3.59
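As a rough illustration, the summary statistics in the table can be run through the two-sample $t$ interval. The table does not say what is being asked, so the 95% confidence level here is an assumption of this sketch, and the critical value $t_{.025,9} = 2.262$ is taken from a standard $t$ table rather than computed.

```python
from math import floor, sqrt

# Summary statistics copied from the table as given.
m, xbar, s1 = 10, 51.71, 0.79      # cotton
n, ybar, s2 = 10, 126.14, 3.59     # triacetate

a, b = s1 ** 2 / m, s2 ** 2 / n
se = sqrt(a + b)                   # estimated sd of Xbar - Ybar

# Estimated degrees of freedom, rounded down (about 9.87 before rounding).
nu = floor((a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1)))

# Assumed 95% level: table value t_{.025, 9}, valid only because nu == 9.
T_CRIT = 2.262

diff = xbar - ybar                 # point estimate of mu1 - mu2
ci = (diff - T_CRIT * se, diff + T_CRIT * se)
print(f"nu = {nu}, estimate = {diff:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The unequal standard deviations (0.79 against 3.59) pull the estimated degrees of freedom down to 9, well below the $m + n - 2 = 18$ of an equal-variance analysis.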