This document explains how to compare the means of two populations using statistical inference. It introduces the two-sample z statistic and its use in testing hypotheses and constructing confidence intervals. Because population standard deviations are usually unknown in practice, it then introduces the two-sample t statistic, which uses the sample standard deviations in their place. It also covers the choice of degrees of freedom and the approximation of the t distribution by the standard normal distribution for large samples.
Fall 2004
Comparing Two Means
An important type of statistical inference concerns the comparison of two population means. For example, we may want to find out if boys are better than girls at mathematics. Here, the question we are interested in is not whether all boys are better than girls, but rather whether boys are better than girls on average. Assuming that proclivity toward mathematics can be tested using standardized tests, one may compare the average test scores of a sample of boys and a sample of girls to learn about the difference between the population mean scores of boys and girls.
Let $\mu_1$ and $\mu_2$ denote the means of a variable $x$ for populations 1 and 2, respectively, and let $\sigma_1$ and $\sigma_2$ denote the respective standard deviations. Suppose we take independent random samples of sizes $n_1$ and $n_2$ from the two populations and calculate the two sample averages $\bar{x}_1$ and $\bar{x}_2$. Assuming that both populations are normally distributed, the difference between the sample averages $(\bar{x}_1 - \bar{x}_2)$ will be normally distributed with mean $\mu_1 - \mu_2$ and standard deviation
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
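The standard deviation of the difference follows from the independence of the two samples. A short sketch of that step, using the same symbols as the text (the $\operatorname{Var}$ notation is introduced here only for illustration):

```latex
\[
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  = \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\]
```

The variances add, rather than subtract, because the samples are independent. Taking the square root gives the standard deviation of $\bar{x}_1 - \bar{x}_2$.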
This means that the standardized difference
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
will be $N(0,1)$. This is the two-sample z statistic.
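As a minimal sketch of how the two-sample z statistic might be computed, the Python snippet below uses only the standard library. The function name `two_sample_z`, its parameter names, and the summary numbers are hypothetical and chosen purely for illustration; the population standard deviations are assumed known.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, null_diff=0.0):
    """Two-sample z statistic for H0: mu1 - mu2 = null_diff (sigmas known)."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - null_diff) / standard_error


# Hypothetical summary data, invented purely for illustration.
z = two_sample_z(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=16, n2=9)

# Two-sided P-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided P = {p_value:.4f}")
```

The `null_diff` argument is the value of $\mu_1 - \mu_2$ under the null hypothesis; leaving it at 0 tests whether the two population means are equal.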
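We can test hypotheses about the difference of the means by using this statistic. We can also construct confidence intervals in the usual way. For example, a level C confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm z^* \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where $z^*$ is the standard normal critical value for confidence level C.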
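The interval above can be sketched in Python as follows. This is an illustration under the assumption that both population standard deviations are known; the function name `two_sample_z_interval` and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(mean1, mean2, sigma1, sigma2, n1, n2, level=0.95):
    """Level-C confidence interval for mu1 - mu2 when the sigmas are known."""
    # z* leaves area (1 - C) / 2 in each tail of N(0, 1).
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Hypothetical summary data, invented purely for illustration.
low, high = two_sample_z_interval(52.0, 50.0, 4.0, 3.0, 16, 9, level=0.95)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

If the interval contains 0, the data are consistent with equal population means at the corresponding significance level.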
However, in practice, we usually don’t know the values of the population standard deviations and we need to estimate them using the sample standard deviations. To construct confidence intervals and to test hypotheses in this situation, we can use the two-sample t statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
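A minimal sketch of the two-sample t statistic computed from raw data is given below. The function name `two_sample_t` and the two score lists are hypothetical; `statistics.stdev` returns the sample standard deviation (divisor n − 1), which plays the role of $s_1$ and $s_2$.

```python
from math import sqrt
from statistics import mean, stdev


def two_sample_t(sample1, sample2, null_diff=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = null_diff (sigmas unknown)."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample standard deviations
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((mean(sample1) - mean(sample2)) - null_diff) / standard_error


# Hypothetical test scores, invented purely for illustration.
scores_a = [78, 85, 92, 70, 88, 81]
scores_b = [72, 80, 75, 68, 79]
t = two_sample_t(scores_a, scores_b)
print(f"t = {t:.3f}")
```

Note that the two standard deviations are not pooled: each sample contributes its own variance to the standard error.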
Although the two-sample t statistic does not have an exact t distribution, its distribution can be approximated very well by the t(k) distribution, where the degrees of freedom k are themselves an approximation, usually computed by statistical software. In practice, choose k equal to the smaller of $n_1 - 1$ and $n_2 - 1$ if you don’t have access to statistical software. Of course, when the sample sizes are large, there will be no harm in using the standard normal distribution instead of the t distribution.
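The two choices of k can be compared with a short sketch. The text does not name the approximation that software uses; the Welch–Satterthwaite formula below is an assumption, being the approximation commonly implemented for this statistic. The function names and input numbers are hypothetical.

```python
def conservative_df(n1, n2):
    """Degrees of freedom from the rule in the text: smaller of n1 - 1, n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_satterthwaite_df(s1, s2, n1, n2):
    """Approximate degrees of freedom (assumed software formula, not from the text)."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # estimated variances of the sample means
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical summary data, invented purely for illustration.
k_simple = conservative_df(6, 5)
k_approx = welch_satterthwaite_df(s1=8.0, s2=5.0, n1=6, n2=5)
print(f"conservative k = {k_simple}, approximate k = {k_approx:.2f}")
```

The approximate value is never smaller than the conservative one, so the rule in the text gives somewhat wider confidence intervals and larger P-values than software typically would.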
In summary, construct a level C confidence interval for $\mu_1 - \mu_2$ using the formula
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and calculate the P-value for a two-sided alternative when testing the null hypothesis that the two population means are equal, $H_0: \mu_1 = \mu_2$, from the t(k) distribution.
As long as the sample sizes $n_1$ and $n_2$ are equal and the two population distributions have very similar shapes, the two-sample t procedures are quite robust to non-normality.
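The summary procedure (confidence interval and two-sided P-value from the t(k) distribution, with k the smaller of $n_1 - 1$ and $n_2 - 1$) can be sketched with the Python standard library alone. Because the standard library has no t distribution, this sketch integrates the t density numerically with Simpson's rule and finds $t^*$ by bisection; that is a stand-in chosen for self-containment, not a method from the text. All function names and input numbers are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, k):
    """Density of the t(k) distribution at x."""
    c = exp(lgamma((k + 1) / 2) - lgamma(k / 2)) / sqrt(k * pi)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)


def t_cdf(x, k):
    """P(T <= x) for T ~ t(k), by Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    steps = 2 * max(50, int(100 * a))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(level, k):
    """t* with central area `level` under t(k), found by bisection."""
    target = 1 - (1 - level) / 2
    low, high = 0.0, 100.0  # assumes t* < 100, adequate for this illustration
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, k) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def two_sample_t_summary(mean1, mean2, s1, s2, n1, n2, level=0.95):
    """Return t, k, two-sided P-value for H0: mu1 = mu2, and the level-C CI."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    k = min(n1 - 1, n2 - 1)  # conservative degrees of freedom
    diff = mean1 - mean2
    t = diff / se
    p_value = 2 * (1 - t_cdf(abs(t), k))
    margin = t_critical(level, k) * se
    return t, k, p_value, (diff - margin, diff + margin)


# Hypothetical summary data, invented purely for illustration.
t, k, p_value, (low, high) = two_sample_t_summary(82.0, 75.0, 8.0, 5.0, 6, 5)
print(f"t = {t:.3f}, k = {k}, P = {p_value:.4f}, CI = ({low:.2f}, {high:.2f})")
```

In practice a statistics package would supply the t distribution directly; the numerical integration here only keeps the example free of external dependencies.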