An in-depth analysis of the methods used to compare two population means, including independent sampling, large and small sample cases, and paired differences. It covers hypothesis testing using both critical-value and p-value approaches, as well as constructing confidence intervals.
$\mu_1 - \mu_2$: difference between two population means
$p_1 - p_2$: difference between two population proportions
$\sigma_1^2 / \sigma_2^2$: ratio of two population variances
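Conditions:
1. The two samples are randomly and independently selected from two independent populations.
2. $n_1 \ge 30$ and $n_2 \ge 30$.

The sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ is approximately normal with:
Mean: $\mu_{(\bar{x}_1 - \bar{x}_2)} = \mu_1 - \mu_2$
Standard deviation: $\sigma_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \approx \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

$(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$ (difference of two population means):
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\,\sigma_{(\bar{x}_1 - \bar{x}_2)} = (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$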
To investigate the effect of a new low-fat diet on weight loss, two random samples of 100 people each are selected. One group of 100 is placed on the low-fat diet, while the other group stays on a regular diet. For each person, the amount of weight lost (or gained) in a 3-week period is recorded.

Diet               Weight loss
Low-fat diet (1)   8, 21, 13, …, 10 (100 observations)
Regular diet (2)   6, 14, 4, …, 8 (100 observations)

a. Form a 95% confidence interval for the difference between the population mean weight losses for the two diets. Interpret the result.
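Group Statistics
         DIET      N             Mean                 Std. Deviation   Std. Error Mean
WTLOSS   LOWFAT    $n_1 = 100$   $\bar{x}_1 = 9.31$   $S_1 = 4.668$    .
         REGULAR   $n_2 = 100$   $\bar{x}_2 = 7.40$   $S_2 = 4.035$    .

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = (9.31 - 7.40) \pm Z_{0.025}\sqrt{\frac{4.668^2}{100} + \frac{4.035^2}{100}}$$
$$= 1.91 \pm 1.96(0.62) = 1.91 \pm 1.22 = (0.69,\ 3.13)$$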
Interpret: We are 95% confident that the difference between the mean weight loss on the low-fat diet and on the regular diet is between 0.69 pounds and 3.13 pounds. Note: $(\mu_1 - \mu_2)$ is at least 0.69 and at most 3.13 pounds, so we can infer $\mu_1 > \mu_2$.
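The interval above can be checked with a short script. This is a minimal sketch using only the summary statistics from the Group Statistics table; the function name `two_mean_z_interval` is an illustrative choice, not part of the notes. Because the script does not round the standard error to 0.62, its endpoints may differ from the hand calculation in the second decimal place.

```python
import math
from statistics import NormalDist

def two_mean_z_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 from summary statistics."""
    diff = mean1 - mean2
    # Estimated standard deviation of (x1_bar - x2_bar)
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Upper alpha/2 critical value of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * std_error
    return diff - margin, diff + margin

# Summary statistics from the diet study (low-fat = 1, regular = 2)
lower, upper = two_mean_z_interval(9.31, 4.668, 100, 7.40, 4.035, 100)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```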
If the confidence interval for $(\mu_1 - \mu_2)$ includes 0, it implies that there is no significant difference between the two population means. If the confidence interval for $(\mu_1 - \mu_2)$ does not include 0, it implies that there is a significant difference between the two population means.
Example: A confidence interval for $(\mu_1 - \mu_2)$ is (-10, 4). What inference can we make?
A confidence interval for $(\mu_1 - \mu_2)$ is (-18, -9). What inference can we make?
A confidence interval for $(\mu_1 - \mu_2)$ is (3, 12). What inference can we make?
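The rule for reading an interval for $(\mu_1 - \mu_2)$ can be written as a small function. This is a sketch only; the function name and the wording of the returned strings are illustrative assumptions.

```python
def interpret_interval(lower, upper):
    """Interpret a confidence interval for mu1 - mu2 by checking whether it contains 0."""
    if lower > 0:
        # Every plausible value of mu1 - mu2 is positive
        return "significant difference: mu1 > mu2"
    if upper < 0:
        # Every plausible value of mu1 - mu2 is negative
        return "significant difference: mu1 < mu2"
    # The interval contains 0
    return "no significant difference"

for interval in [(-10, 4), (-18, -9), (3, 12)]:
    print(interval, "->", interpret_interval(*interval))
```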
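Hypothesis test for $(\mu_1 - \mu_2)$:

Critical-value approach:
1. $H_0: \mu_1 - \mu_2 = D_0$
2. $H_a: \mu_1 - \mu_2 \ne D_0$ (or $> D_0$, or $< D_0$)
3. Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \approx \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
4. Rejection region: $|z| > z_{\alpha/2}$ when $H_a: \mu_1 - \mu_2 \ne D_0$; $z > z_{\alpha}$ when $H_a: \mu_1 - \mu_2 > D_0$; $z < -z_{\alpha}$ when $H_a: \mu_1 - \mu_2 < D_0$.

p-value approach: the hypotheses and the test statistic are the same. With $z_0$ the observed value of the test statistic:
p-value $= 2P(z > |z_0|)$ when $H_a: \mu_1 - \mu_2 \ne D_0$
p-value $= P(z > z_0)$ when $H_a: \mu_1 - \mu_2 > D_0$
p-value $= P(z < z_0)$ when $H_a: \mu_1 - \mu_2 < D_0$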
Example 1, DIETSTUDY: To investigate the effect of a new low-fat diet on weight loss, two random samples of 100 people each are selected. One group of 100 is placed on the low-fat diet, while the other group stays on a regular diet. For each person, the amount of weight lost (or gained) in a 3-week period is recorded.

Diet               Weight loss
Low-fat diet (1)   8, 21, 13, …, 10 (100 observations)
Regular diet (2)   6, 14, 4, …, 8 (100 observations)

a. Conduct a test of hypothesis to determine whether the mean weight loss for the low-fat diet is different from that of the regular diet.
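Sample information: (SPSS output)
Step 1. $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \ne 0$ (different)
Step 2. Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(9.31 - 7.40) - 0}{\sqrt{\dfrac{4.668^2}{100} + \dfrac{4.035^2}{100}}} = 3.09$$
Step 3. Rejection region (at $\alpha = 0.05$): $|z| > z_{0.025} = 1.96$
Step 4. Since 3.09 > 1.96, reject $H_0$. There is sufficient evidence that the mean weight loss for the low-fat diet is different from that of the regular diet.
p-value $= 2P(z > 3.09) = 2(0.5 - 0.4990) = 2(0.001) = 0.002$
(SPSS output: p-value = 0.002 < 0.05)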
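The test statistic and p-value for the diet study can be reproduced from the summary statistics. The sketch below assumes the large-sample z-test described in these notes; `two_mean_z_test` and its `alternative` argument are illustrative names, not an existing library API.

```python
import math
from statistics import NormalDist

def two_mean_z_test(mean1, sd1, n1, mean2, sd2, n2, d0=0.0, alternative="two-sided"):
    """Large-sample z-test of H0: mu1 - mu2 = d0 from summary statistics."""
    std_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((mean1 - mean2) - d0) / std_error
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: mu1 - mu2 > d0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mu1 - mu2 < d0
        p_value = std_normal.cdf(z)
    else:                             # Ha: mu1 - mu2 != d0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

z, p_value = two_mean_z_test(9.31, 4.668, 100, 7.40, 4.035, 100)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

Passing `alternative="greater"` gives the one-sided version of the same test.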
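b. Conduct a test of hypothesis to determine whether the mean weight loss for the low-fat diet is greater than that of the regular diet.
Step 1. $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 > 0$ ($\mu_1$ greater than $\mu_2$)
Step 2. Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(9.31 - 7.40) - 0}{\sqrt{\dfrac{4.668^2}{100} + \dfrac{4.035^2}{100}}} = 3.09$$
Step 3. Rejection region (at $\alpha = 0.05$): $z > z_{0.05} = 1.645$
Step 4. Since 3.09 > 1.645, reject $H_0$.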
SPSS output for DIETSTUDY

Group Statistics
         DIET      N     Mean   Std. Deviation   Std. Error Mean
WTLOSS   LOWFAT    100   9.31   4.668            .
         REGULAR   100   7.40   4.035            .

Independent Samples Test (WTLOSS)
Levene's Test for Equality of Variances: F = 1.367, Sig. = .244

t-test for Equality of Means
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.095   198       .002              1.910             .617                    .693           3.
Equal variances not assumed   3.095   193.940   .002              1.910             .617                    .693           3.
1. The two samples are randomly and independently selected from the two target populations. (sampling procedure)
4. The sample sizes are small ($n_1 < 30$, $n_2 < 30$).
Since the two populations have equal variances ($\sigma_1^2 = \sigma_2^2$), it is reasonable to use the information contained in both samples to construct a pooled variance estimator for use in confidence intervals and test statistics:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
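$(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $t_{\alpha/2}$ has $df = n_1 + n_2 - 2$.

Hypothesis test:
$H_0: \mu_1 - \mu_2 = D_0$
$H_a: \mu_1 - \mu_2 \ne D_0$ (or $> D_0$, or $< D_0$)
Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Rejection region: $|t| > t_{\alpha/2}$ when $H_a: \mu_1 - \mu_2 \ne D_0$; $t > t_{\alpha}$ when $H_a: \mu_1 - \mu_2 > D_0$; $t < -t_{\alpha}$ when $H_a: \mu_1 - \mu_2 < D_0$, with $df = n_1 + n_2 - 2$.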
Suppose we wish to compare a new method of teaching reading to “slow learners” with the current standard method. The response variable is the reading test score after 6 months. 22 slow learners are randomly selected; 10 are taught by the new method and 12 by the standard method. The test scores are listed below.

New method (1)        80, 80, 79, 81, 76, 66, 71, 76, 70, 85
Standard method (2)   79, 62, 70, 68, 73, 76, 86, 73, 72, 68, 75, 66

a. Use a 95% confidence interval to estimate the true mean difference between the test scores for the new method and the standard method. Interpret the interval.
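$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(10 - 1)5.835^2 + (12 - 1)6.344^2}{10 + 12 - 2} = 37.457$$
$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.025}^{(10+12-2)}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (76.4 - 72.33) \pm 2.086\sqrt{37.457\left(\frac{1}{10} + \frac{1}{12}\right)}$$
$$= 4.07 \pm 2.086(2.621) = (-1.396,\ 9.536)$$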
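b. Conduct a test of hypothesis to determine whether the new method leads to a higher test score than the standard method.
Step 1. $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 > 0$ ($\mu_1$ higher than $\mu_2$)
Step 2. Test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{4.07 - 0}{2.621} = 1.55$$
Step 3. Rejection region: $t > t_{0.05}^{(10+12-2)} = 1.725$
Step 4. Since 1.55 < 1.725, do not reject $H_0$. There is not sufficient evidence that the new method leads to a higher test score than the standard method.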
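The pooled-variance calculations for the reading example can be reproduced from the raw scores. This is a minimal sketch: the critical values 2.086 and 1.725 are hard-coded from a t table for 20 degrees of freedom (the Python standard library has no t distribution), and the variable names are illustrative. Working from unrounded means and variances gives endpoints close to the SPSS output rather than exactly the hand-rounded values.

```python
import math
from statistics import mean, variance

new_method = [80, 80, 79, 81, 76, 66, 71, 76, 70, 85]
standard_method = [79, 62, 70, 68, 73, 76, 86, 73, 72, 68, 75, 66]

n1, n2 = len(new_method), len(standard_method)
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * variance(new_method)
              + (n2 - 1) * variance(standard_method)) / (n1 + n2 - 2)
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean(new_method) - mean(standard_method)

# Critical values for df = n1 + n2 - 2 = 20, taken from a t table
T_CRIT_025_DF20 = 2.086   # two-sided 95% interval
T_CRIT_05_DF20 = 1.725    # one-sided test at alpha = 0.05

ci = (diff - T_CRIT_025_DF20 * std_error, diff + T_CRIT_025_DF20 * std_error)
t_stat = (diff - 0) / std_error
reject_h0 = t_stat > T_CRIT_05_DF20

print(f"pooled variance = {pooled_var:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```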
SPSS output for READING

Group Statistics
        METHOD   N    Mean    Std. Deviation   Std. Error Mean
SCORE   NEW      10   76.40   5.835            1.
        STD      12   72.33   6.344            1.

Independent Samples Test (SCORE)
Levene's Test for Equality of Variances: F = .002, Sig. = .967

t-test for Equality of Means
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       1.552   20       .136              4.067             2.620                   -1.399         9.
Equal variances not assumed   1.564   19.769   .134              4.067             2.600                   -1.360         9.
Conditions:
1. The two samples are randomly and independently selected from the two target populations.
The procedure is given in the textbook, pp. 422-423.
Comparing two sampling designs:
Example: comparing the new method and the standard method in reading:
1. Randomly select 16 slow learners; 8 are assigned to the new method, while the other 8 are assigned to the standard method. (independent sampling)
(The two subjects in each pair are at a similar level; the treatments are then assigned within the pair to see the effect.)
Example: comparing cars with the additive and with no additive applied:
1. Randomly choose 10 cars, regardless of brand; 5 are assigned to use the new additive while the other 5 are not. (independent sampling)
(The two subjects in each pair are at a similar level; the treatments are then assigned within the pair to see the effect.)
(Read textbook pp. 432-433.)
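Data layout:
Pair   Sample 1 (New)   Sample 2 (Old)   Difference ($x_D$)   $x_D^2$
1      18               16               18 - 16 = 2          4
2      31               28               31 - 28 = 3          9
3      25               26               25 - 26 = -1         1
4      23               21               23 - 21 = 2          4
5      26               22               26 - 22 = 4          16

$\sum x_D = 10$, $\sum x_D^2 = 34$
$$\bar{x}_D = \frac{\sum x_D}{n_D}, \qquad S_D^2 = \frac{\sum x_D^2 - \dfrac{(\sum x_D)^2}{n_D}}{n_D - 1}, \qquad S_D = \sqrt{S_D^2}$$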
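The summary quantities for the paired differences in the data layout can be computed directly. This is a small sketch using the shortcut formula for the sample variance of the differences; the variable names are illustrative.

```python
import math

sample_new = [18, 31, 25, 23, 26]
sample_old = [16, 28, 26, 21, 22]

# Paired differences x_D = new - old, one per pair
diffs = [new - old for new, old in zip(sample_new, sample_old)]
n_d = len(diffs)

sum_d = sum(diffs)                       # sum of x_D
sum_d_sq = sum(d * d for d in diffs)     # sum of x_D squared

mean_d = sum_d / n_d
# Shortcut formula for the sample variance of the differences
var_d = (sum_d_sq - sum_d**2 / n_d) / (n_d - 1)
sd_d = math.sqrt(var_d)

print(f"differences = {diffs}")
print(f"sum = {sum_d}, sum of squares = {sum_d_sq}")
print(f"mean = {mean_d}, variance = {var_d}, sd = {sd_d:.3f}")
```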
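Note: The variable we are interested in is the paired difference $x_D$.
1. A random sample of differences is selected from the target population of differences;

$(1-\alpha)100\%$ confidence interval for $\mu_D$:
$$\bar{x}_D \pm z_{\alpha/2}\frac{\sigma_D}{\sqrt{n_D}} \approx \bar{x}_D \pm z_{\alpha/2}\frac{S_D}{\sqrt{n_D}}$$

Hypothesis test:
$H_0: \mu_D = D_0$
$H_a: \mu_D \ne D_0$ (or $\mu_D > D_0$, or $\mu_D < D_0$)
Test statistic:
$$z = \frac{\bar{x}_D - D_0}{\sigma_D/\sqrt{n_D}} \approx \frac{\bar{x}_D - D_0}{S_D/\sqrt{n_D}}$$
Rejection region: $|z| > z_{\alpha/2}$ when $H_a: \mu_D \ne D_0$; $z > z_{\alpha}$ when $H_a: \mu_D > D_0$; $z < -z_{\alpha}$ when $H_a: \mu_D < D_0$.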
Example: To investigate which supermarket (A or B) has the lower prices in town, an agency randomly selected 100 items common to both supermarkets and recorded the price charged by each supermarket. The summary results used below are $\bar{x}_D = 0.10$, $S_D = 0.03$, $n_D = 100$.
a. Form a 95% confidence interval for the mean price difference between supermarket A and supermarket B. Interpret the result.
$$\bar{x}_D \pm z_{\alpha/2}\frac{S_D}{\sqrt{n_D}} = 0.10 \pm z_{0.025}\frac{0.03}{\sqrt{100}} = 0.10 \pm 1.96(0.003) = (0.09,\ 0.11)$$
Interpret: We are 95% confident that the mean price difference between supermarket A and supermarket B is between 0.09 and 0.11.
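b. Conduct a test of hypothesis to determine whether the mean price for supermarket B is lower than that for supermarket A.
Step 1. $H_0: \mu_D = 0$, $H_a: \mu_D = \mu_A - \mu_B > 0$
Step 2. Test statistic:
$$z = \frac{\bar{x}_D - D_0}{S_D/\sqrt{n_D}} = \frac{0.10 - 0}{0.03/\sqrt{100}} = \frac{0.10}{0.003} = 33.3$$
Step 3. Rejection region: $z > z_{0.05} = 1.645$
Step 4. Since 33.3 > 1.645, reject $H_0$. There is sufficient evidence that the mean price for supermarket A is higher than that for supermarket B.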
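1. A random sample of differences is selected from the target population of differences;

$(1-\alpha)100\%$ confidence interval for $\mu_D$:
$$\bar{x}_D \pm t_{\alpha/2}\frac{S_D}{\sqrt{n_D}}$$
where $t_{\alpha/2}$ has $df = n_D - 1$.

Hypothesis test:
$H_0: \mu_D = D_0$
$H_a: \mu_D \ne D_0$ (or $\mu_D > D_0$, or $\mu_D < D_0$)
Test statistic:
$$t = \frac{\bar{x}_D - D_0}{S_D/\sqrt{n_D}}$$
Rejection region: $|t| > t_{\alpha/2}$ when $H_a: \mu_D \ne D_0$; $t > t_{\alpha}$ when $H_a: \mu_D > D_0$; $t < -t_{\alpha}$ when $H_a: \mu_D < D_0$, with $df = n_D - 1$.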
Example 1, NEW PROTEIN DIET: To investigate the effect of a new protein diet on weight loss, the FDA randomly chooses five individuals and records their weights (in pounds), then instructs them to follow the protein diet for three weeks. At the end of this period, their weights are recorded again.

Person   Weight before (1)   Weight after (2)   Difference $x_D$   $x_D^2$
1        148                 141                148 - 141 = 7      49
2        193                 188                193 - 188 = 5      25
3        186                 183                186 - 183 = 3      9
4        195                 189                195 - 189 = 6      36
5        202                 198                202 - 198 = 4      16
a. Calculate a 95% confidence interval for the difference between the mean weights before and after the diet is used. Interpret the interval.
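$$\bar{x}_D = \frac{\sum x_D}{n_D} = \frac{25}{5} = 5, \qquad S_D^2 = \frac{\sum x_D^2 - \dfrac{(\sum x_D)^2}{n_D}}{n_D - 1} = \frac{135 - \dfrac{25^2}{5}}{5 - 1} = 2.5, \qquad S_D = \sqrt{2.5} = 1.581$$
$$\bar{x}_D \pm t_{0.025}^{(4)}\frac{S_D}{\sqrt{n_D}} = 5 \pm 2.776\left(\frac{1.581}{\sqrt{5}}\right) = 5 \pm 1.96 = (3.04,\ 6.96)$$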
Interpret: We are 95% confident that the difference of the mean weights before and after this diet will fall between 3.04 pounds and 6.96 pounds.
b. Do the data provide sufficient evidence that the protein diet has an effect on weight loss?
Step 1. $H_0: \mu_D = 0$, $H_a: \mu_D = \mu_1 - \mu_2 > 0$
Step 2. Test statistic:
$$t = \frac{\bar{x}_D - D_0}{S_D/\sqrt{n_D}} = \frac{5 - 0}{1.581/\sqrt{5}} = 7.07$$
Step 3. Rejection region: $t > t_{0.05}^{(5-1)} = 2.132$
Step 4. Since 7.07 > 2.132, reject $H_0$. (p-value = 0.002/2 = 0.001)
There is sufficient evidence that the protein diet has an effect on weight loss.
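The paired-difference interval and test statistic for the protein diet data can be checked with the sketch below. The critical values 2.776 and 2.132 are hard-coded from a t table for 4 degrees of freedom, since the Python standard library does not provide a t distribution; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

weight_before = [148, 193, 186, 195, 202]
weight_after = [141, 188, 183, 189, 198]

# Paired differences x_D = before - after
diffs = [before - after for before, after in zip(weight_before, weight_after)]
n_d = len(diffs)
mean_d = mean(diffs)
sd_d = stdev(diffs)                    # sample standard deviation S_D
std_error = sd_d / math.sqrt(n_d)

# Critical values for df = n_D - 1 = 4, taken from a t table
T_CRIT_025_DF4 = 2.776   # two-sided 95% interval
T_CRIT_05_DF4 = 2.132    # one-sided test at alpha = 0.05

ci = (mean_d - T_CRIT_025_DF4 * std_error, mean_d + T_CRIT_025_DF4 * std_error)
t_stat = (mean_d - 0) / std_error
reject_h0 = t_stat > T_CRIT_05_DF4

print(f"mean difference = {mean_d}, S_D = {sd_d:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```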
Paired Samples Statistics
              Mean     N   Std. Deviation   Std. Error Mean
Pair 1   W1   184.80   5   21.347           9.
         W2   179.80   5   22.354           9.

Paired Samples Test
                   Paired Differences
                   Mean    Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df   Sig. (2-tailed)
Pair 1   W1 - W2   5.000   1.581            .707              3.037          6.963          7.071   4    .
To investigate the effect of a new teaching method on improving reading test scores, 8 pairs of slow learners are selected (not randomly), with the two learners in each pair having similar reading IQs; in each pair, one learner uses the new method and the other uses the standard method. After 6 months, the test scores are recorded.

Pair   New method (1)   Standard method (2)   $x_D$
1      77               72                    5
2      74               68                    6
3      82               76                    6
4      73               68                    5
5      87               84                    3
6      69               68                    1
7      66               61                    5
8      80               76                    4
a. Construct a 95% confidence interval to estimate the difference of mean test scores between new method and standard method. Interpret the result.
$\bar{x}_D = 4.375$, $S_D = 1.685$ (SPSS output)
$$\bar{x}_D \pm t_{0.025}^{(8-1)}\frac{S_D}{\sqrt{n_D}} = 4.375 \pm 2.365\left(\frac{1.685}{\sqrt{8}}\right) = 4.375 \pm 2.365(0.596) = 4.375 \pm 1.409 = (2.966,\ 5.784)$$
Interpret: We are 95% confident that the mean difference of test score between new and standard methods will fall between 2.966 and 5.784 points.
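b. Do the data provide sufficient evidence that the new method leads to higher test scores than the standard method?
Step 1. $H_0: \mu_D = 0$, $H_a: \mu_D = \mu_1 - \mu_2 > 0$
Step 2. Test statistic:
$$t = \frac{\bar{x}_D - D_0}{S_D/\sqrt{n_D}} = \frac{4.375 - 0}{1.685/\sqrt{8}} = \frac{4.375}{0.596} = 7.34$$
Step 3. Rejection region: $t > t_{0.05}^{(8-1)} = 1.895$
Step 4. Since 7.34 > 1.895, reject $H_0$.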
Paired Samples Statistics
               Mean    N   Std. Deviation   Std. Error Mean
Pair 1   NEW   76.00   8   6.928            2.
         STD   71.63   8   7.009            2.

Paired Samples Test
                     Paired Differences
                     Mean    Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df   Sig. (2-tailed)
Pair 1   NEW - STD   4.375   1.685            .596              2.966          5.784          7.344   7    .
1. The two samples are randomly and independently selected from the two target populations.
2. The sample sizes $n_1$ and $n_2$ are large ($n_1\hat{p}_1 \ge 15$, $n_1\hat{q}_1 \ge 15$, $n_2\hat{p}_2 \ge 15$, $n_2\hat{q}_2 \ge 15$).

Under large sample sizes, by the Central Limit Theorem, the sampling distribution of $(\hat{p}_1 - \hat{p}_2)$ is approximately normal with:
mean: $\mu_{(\hat{p}_1 - \hat{p}_2)} = p_1 - p_2$
standard deviation: $\sigma_{(\hat{p}_1 - \hat{p}_2)} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

Large-sample $100(1-\alpha)\%$ confidence interval for $(p_1 - p_2)$:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)} = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \approx (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
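1. $H_0: p_1 - p_2 = 0$
2. $H_a: p_1 - p_2 \ne 0$ (or $p_1 - p_2 > 0$, or $p_1 - p_2 < 0$)
3. Test statistic:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2},\ \hat{q} = 1 - \hat{p}$$
4. Rejection region: $|Z| > Z_{\alpha/2}$ when $H_a: p_1 - p_2 \ne 0$
$Z > Z_{\alpha}$ when $H_a: p_1 - p_2 > 0$ (or $Z < -Z_{\alpha}$ when $H_a: p_1 - p_2 < 0$)
5. Conclusion.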
Example: Smoking Survey. Suppose the American Cancer Society randomly sampled 1500 adults in 1995 and then sampled 1750 adults in 2005 in a smoking survey, to determine whether there was evidence that the percentage of smokers had decreased.

1995 (1)        2005 (2)
$n_1 = 1500$    $n_2 = 1750$
$x_1 = 555$     $x_2 = 578$

Define: $p_1$: the true proportion of adult smokers in 1995
$p_2$: the true proportion of adult smokers in 2005
a. Give a point estimate of $(p_1 - p_2)$.
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{555}{1500} = 0.37, \qquad \hat{p}_2 = \frac{x_2}{n_2} = \frac{578}{1750} = 0.33$$
$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} = 0.37 - 0.33 = 0.04$$
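b. Do the data indicate that the proportion of adult smokers decreased over this 10-year period?
Check the sample sizes:
$n_1\hat{p}_1 = x_1 = 555 \ge 15$, $n_1\hat{q}_1 = n_1 - x_1 = 945 \ge 15$, $n_2\hat{p}_2 = x_2 = 578 \ge 15$, $n_2\hat{q}_2 = n_2 - x_2 = 1172 \ge 15$.
The two sample sizes are large enough.
$H_0: p_1 - p_2 = 0$
$H_a: p_1 - p_2 > 0$
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\dfrac{555}{1500} - \dfrac{578}{1750}}{\sqrt{0.349(0.651)\left(\dfrac{1}{1500} + \dfrac{1}{1750}\right)}} \approx 2.4$$
$$\text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{555 + 578}{1500 + 1750} = 0.349, \quad \hat{q} = 1 - \hat{p} = 1 - 0.349 = 0.651$$
Since $z \approx 2.4 > z_{0.05} = 1.645$, reject $H_0$. At $\alpha = 0.05$, there is sufficient evidence to conclude that the proportion of adult smokers decreased from 1995 to 2005.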
c. Form a 95% confidence interval for $(p_1 - p_2)$ to estimate the extent of the decrease. Interpret it.
$$(\hat{p}_1 - \hat{p}_2) \pm z_{0.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = (0.37 - 0.33) \pm 1.96\sqrt{\frac{0.37(1 - 0.37)}{1500} + \frac{0.33(1 - 0.33)}{1750}}$$
$$= 0.04 \pm 1.96(0.0168) = 0.04 \pm 0.033 = (0.007,\ 0.073)$$
Interpret: We are 95% confident that the difference in the proportion of adult smokers between 1995 and 2005 is between 0.007 and 0.073. (We can infer $p_1 > p_2$.)
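The smoking survey calculations can be reproduced from the counts. The sketch below uses the pooled proportion for the test statistic and the unpooled standard error for the interval, as in the formulas of these notes; variable names are illustrative. Because the script keeps $\hat{p}_2 = 578/1750$ unrounded, its results may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

x1, n1 = 555, 1500   # smokers and sample size, 1995
x2, n2 = 578, 1750   # smokers and sample size, 2005

p1_hat, p2_hat = x1 / n1, x2 / n2
diff = p1_hat - p2_hat

# Test of H0: p1 - p2 = 0 against Ha: p1 - p2 > 0, using the pooled proportion
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = 1 - NormalDist().cdf(z)    # upper-tail p-value

# 95% confidence interval, using the unpooled standard error
se_unpooled = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_crit = NormalDist().inv_cdf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, pooled = {p_pooled:.3f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```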
In some instances, we are interested in comparing two population variances.
The common statistical procedure for comparing the population variances $\sigma_1^2$ and $\sigma_2^2$ uses:
$H_0: \dfrac{\sigma_1^2}{\sigma_2^2} = 1 \quad (\sigma_1^2 = \sigma_2^2)$
$H_a: \dfrac{\sigma_1^2}{\sigma_2^2} \ne 1 \quad (\sigma_1^2 \ne \sigma_2^2)$
Test statistic:
$$F = \frac{s_1^2}{s_2^2}$$
What is the sampling distribution of $\dfrac{s_1^2}{s_2^2}$?
When
1. the two sampled populations are normally distributed, and
2. the samples are randomly and independently selected from their respective populations,
then the sampling distribution of $\dfrac{s_1^2}{s_2^2}$ is an F-distribution with $(n_1 - 1)$ numerator degrees of freedom and $(n_2 - 1)$ denominator degrees of freedom, respectively.
The properties of the F-distribution:
1. It is right-skewed.
2. Its shape depends on the numerator and denominator degrees of freedom.
Note: 1. Tables VIII-XI, pp. 799-806, give the upper-tail F-values. To use them, we always place the larger sample variance in the numerator of the F-test statistic.
2. We always define $\sigma_1^2$ as the population variance associated with the larger sample variance.
For example: $F_{0.05,\,5,\,8} = 3.69$
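$H_0: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$ (or $\sigma_1^2 > \sigma_2^2$)
Test statistic:
$$F = \frac{\text{larger sample variance}}{\text{smaller sample variance}} = \frac{s_1^2}{s_2^2}$$
Rejection region: $F > F_{\alpha/2,\,(n_1 - 1),\,(n_2 - 1)}$ when $H_a: \sigma_1^2 \ne \sigma_2^2$
$F > F_{\alpha,\,(n_1 - 1),\,(n_2 - 1)}$ when $H_a: \sigma_1^2 > \sigma_2^2$
Conclusion.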
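The convention of placing the larger sample variance in the numerator can be sketched as a helper function. The function name and the sample variances in the usage lines are hypothetical, chosen only to illustrate the ordering; the critical value must still be looked up in an F table for the returned degrees of freedom.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, numerator df, denominator df) with the larger sample variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    # Swap the roles so that sample b supplies the numerator
    return var_b / var_a, n_b - 1, n_a - 1

# Hypothetical sample variances and sample sizes, for illustration only
f_value, df_num, df_den = f_statistic(4.0, 10, 9.0, 12)
print(f"F = {f_value:.2f} with ({df_num}, {df_den}) degrees of freedom")
```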
Since the variance of supplier 1 is larger than that of supplier 2, define $\sigma_1^2$ as the population variance for supplier 1.
Test statistic:
$$F = \frac{s_1^2}{s_2^2} = 4.24$$
Rejection region:
$$F > F_{\alpha/2,\,(n_1 - 1),\,(n_2 - 1)} = F_{0.10/2,\,(13 - 1),\,(18 - 1)} = F_{0.05,\,12,\,17} = 2.38$$
Since 4.24 > 2.38, reject $H_0$.
We would advise the experimenter to purchase the mice from supplier 2, since they tend to be more homogeneous.