Sampling Distributions: The t, χ2, and F Distributions in Statistics - Prof. Xi Huang, Exams of Data Analysis & Statistical Methods

This document from the University of South Carolina explores the sampling distributions associated with the t, χ2, and F distributions in statistics. It explains how these distributions can be used to understand the behavior of sample means and variances when the sample size is small and the population is normal. It also describes the relationship between these distributions and provides examples of their use.


STAT 515 - Chapter 4 Supplement

Brian Habing - University of South Carolina

Last Updated: October 2, 2000

S4 - More on Sampling Distributions: The t, χ^2, and F Distributions

As we saw in Section 4.9, the normal distribution plays a pivotal role in describing how the sample mean x̄ will behave when you have a random sample x_1, x_2, ..., x_n. Unfortunately, the central limit theorem only applies when the sample size is large. Additionally, it only tells us about the sampling distribution of the sample mean, and not about the sampling distribution of the sample variance s^2. These limitations can be overcome if we can believe that the sample was taken from a population that was normal to begin with. That is, if we apply the methods in Section 4.6 and verify the data is normal, we can get the sampling distribution for x̄ when n is small, and can also get the sampling distribution for s^2.

S4.1 - x̄ and the Normal Distribution

A fact that is proved in STAT 512 is: if the random sample is drawn from a population that follows a normal distribution, then

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

is exactly standard normal. In other words, if the base population is already normal, the central limit theorem result applies even when n = 1! The only difficulty in this is that we rarely, if ever, know the value of the parameter σ. Because of this we can’t use this fact directly.
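The claim above can be checked informally by simulation. The following is a minimal sketch, not part of the original notes: the function name `simulate_z` and the values of n, μ, σ, and the number of replications are arbitrary choices for illustration. It draws many small samples from a normal population and looks at the resulting Z values, which should have mean near 0, variance near 1, and roughly 95% of values within 1.96 of zero.

```python
import math
import random


def simulate_z(n, mu, sigma, reps, seed=0):
    """Draw `reps` samples of size n from Normal(mu, sigma); return each sample's Z."""
    rng = random.Random(seed)
    z_values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Z uses the TRUE sigma, which is why it is rarely usable in practice.
        z_values.append((xbar - mu) / (sigma / math.sqrt(n)))
    return z_values


# Illustrative (assumed) settings: a very small sample size on purpose.
z = simulate_z(n=3, mu=10.0, sigma=2.0, reps=20000)
z_mean = sum(z) / len(z)
z_var = sum((v - z_mean) ** 2 for v in z) / (len(z) - 1)
# Share of Z values within 1.96 of zero; about 0.95 for a standard normal.
z_cover = sum(abs(v) <= 1.96 for v in z) / len(z)
print(round(z_mean, 3), round(z_var, 3), round(z_cover, 3))
```

Note that the simulation has to be told σ in order to compute Z, which is exactly the practical difficulty described above.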

S4.2 - s^2 and the χ^2 (chi-squared) Distribution

The χ^2 distribution can be defined as follows. If z_1, z_2, ..., z_(n−1) are independent and each follows the standard normal distribution, then

$$X^2 = z_1^2 + z_2^2 + \cdots + z_{n-1}^2$$

follows the χ^2 distribution with (n − 1) degrees of freedom. The table for this distribution (and a typical picture of it) can be found in TABLE XI on page 521. This distribution is skewed to the right, has mean (n − 1), variance 2(n − 1), and takes all values 0 and higher. (The normal, on the other hand, takes all positive and negative values.)

The usefulness of this distribution becomes a bit clearer if we again consider the random sample x_1, x_2, ..., x_n from a normal distribution. Looking at the formula for s^2:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

we can see that we are taking a bunch of independent normal random variables (the x_i), centering and squaring them, and summing them up. The only reason that this isn’t a χ^2 random variable is that they aren’t standard normal, and we are dividing by n − 1. By multiplying both sides of the above equation by n − 1 and dividing by σ^2 we get the following:

$$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{\sigma} \right)^2$$

If the x̄ on the right side of the equation were replaced by μ, then we would be summing up the squares of a bunch of z = (x_i − μ)/σ, and it would be the sum of n squared standard normals, making a χ^2 random variable with n degrees of freedom. Because we are using x̄ instead of μ we lose one degree of freedom, and so:

$$\chi^2_{df=n-1} = \frac{(n-1)s^2}{\sigma^2} \tag{S1}$$

where the df in the subscript is the number of degrees of freedom.
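Equation S1 can be sanity-checked against the stated mean (n − 1) and variance 2(n − 1) of the χ^2 distribution. The sketch below is an illustration under assumed settings (the helper names, n = 6, σ = 4, and the replication count are not from the notes): it computes (n − 1)s^2/σ^2 for many normal samples and summarizes the results.

```python
import random


def sample_variance(xs):
    """Sample variance s^2 with the usual n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def scaled_variances(n, mu, sigma, reps, seed=1):
    """Return (n - 1) * s^2 / sigma^2 for `reps` normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        out.append((n - 1) * sample_variance(sample) / sigma ** 2)
    return out


n = 6  # assumed sample size, giving n - 1 = 5 degrees of freedom
q = scaled_variances(n=n, mu=50.0, sigma=4.0, reps=20000)
q_mean = sum(q) / len(q)
q_var = sample_variance(q)
# Should be close to n - 1 = 5 and 2(n - 1) = 10 respectively.
print(round(q_mean, 2), round(q_var, 2))
```

Every simulated value is non-negative, matching the statement that the χ^2 distribution takes only values 0 and higher.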

in 1908 by chemist William Gosset at the Guinness brewery in Ireland. Because he didn’t want employees at other breweries to know that he found statistics useful, he published his results under the pseudonym Student. Hence, the distribution is often known as Student’s t distribution.

A t-distribution is formed by dividing a standard normal by the square root of a χ^2 divided by its degrees of freedom, where the normal and the χ^2 are independent.

$$t_{df=n-1} = \frac{Z}{\sqrt{\chi^2_{df=n-1}/(n-1)}} \tag{S2}$$

At first this seems to come more than a little bit out of nowhere. However, a fact proved in STAT 714 sheds some light on why it is useful: if x_1, x_2, ..., x_n is an independent sample from a normal distribution, then its sample mean x̄ and sample variance s^2 are independent! S4.1 showed that, for such a sample, x̄ is related to a standard normal distribution, and S4.2 showed that s^2 is related to a χ^2 distribution. Combining these previous results gives:

$$t_{df=n-1} = \frac{Z}{\sqrt{\chi^2_{df=n-1}/(n-1)}} = \frac{\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)s^2/\sigma^2}{n-1}}}$$

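By cancelling the n − 1 terms in the denominator, applying the square root, multiplying both the numerator and denominator by one over the denominator, and cancelling, we get

$$t_{df=n-1} = \frac{\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}}{\sqrt{s^2/\sigma^2}} = \frac{\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}}{s/\sigma} = \frac{\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} \cdot \dfrac{\sigma}{s}}{\dfrac{s}{\sigma} \cdot \dfrac{\sigma}{s}} = \frac{\bar{x}-\mu}{s/\sqrt{n}} \tag{S3}$$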

Just as we could solve equation S1 to find out information about the population variance, we can solve equation S3 to find out information about the population mean, if the sample comes from a population that follows the normal distribution.
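The derivation of S3 leans on the independence of x̄ and s^2, which is a property of samples from a normal population. A rough simulation check is sketched below. The helper names and settings are assumptions for illustration, and the exponential population is a comparison added here that does not appear in the notes. A correlation near zero is only a necessary sign of independence, not a proof of it, but the contrast between the two populations is still informative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_variance_correlation(draw, n, reps, seed=2):
    """Correlation between the sample mean and sample variance over many samples."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        m = sum(sample) / n
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in sample) / (n - 1))
    return pearson(means, variances)


# Normal population: the correlation should be close to zero.
corr_normal = mean_variance_correlation(lambda rng: rng.gauss(0.0, 1.0), n=5, reps=20000)
# Skewed (exponential) population, added for contrast: clearly positive correlation.
corr_skewed = mean_variance_correlation(lambda rng: rng.expovariate(1.0), n=5, reps=20000)
print(round(corr_normal, 3), round(corr_skewed, 3))
```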

This usage is discussed more in Section 5.2. One useful fact about the t distribution is that it becomes very similar to the standard normal distribution as the sample size n increases. Many tables for the t distribution stop at 30 degrees of freedom and simply refer the user to a standard normal table. Our table IV continues on past 30, but does not give all the values. Notice that the values change very little from one row to the next after about row eighteen. If you go to the bottom row, all of the values should be recognizable from the normal table.
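The approach to the standard normal can also be seen by simulating the statistic in S3 directly. The sketch below uses assumed settings (sample sizes 5 and 61, the cutoff 1.96, and the function names are illustrative choices, not from the notes). For a standard normal, about 5% of values fall outside ±1.96; the t statistic with 4 degrees of freedom should exceed that cutoff noticeably more often, while with 60 degrees of freedom it should be close to 5%.

```python
import math
import random


def t_statistics(n, reps, mu=0.0, sigma=1.0, seed=3):
    """Return (xbar - mu) / (s / sqrt(n)) for `reps` normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        # Unlike Z, this statistic uses s, so sigma is not needed to compute it.
        out.append((xbar - mu) / (s / math.sqrt(n)))
    return out


def tail_fraction(values, cutoff=1.96):
    """Share of values farther than `cutoff` from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)


tail_small = tail_fraction(t_statistics(n=5, reps=20000))   # 4 degrees of freedom
tail_large = tail_fraction(t_statistics(n=61, reps=20000))  # 60 degrees of freedom
print(round(tail_small, 3), round(tail_large, 3))
```

The heavier tails at small n are the reason the t table, rather than the normal table, is needed when σ is replaced by s.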

S4.4 - Two variances and the F-Distribution

The final sampling distribution we will be concerned with is the F-distribution. The F-distribution is defined by:

$$F_{df_x=n_x-1,\; df_y=n_y-1} = \frac{X_x^2/(n_x-1)}{X_y^2/(n_y-1)} \tag{S4}$$

where X_x^2 and X_y^2 are independent χ^2 random variables with n_x − 1 and n_y − 1 degrees of freedom respectively. Because it is formed by using two χ^2 random variables, the F-distribution has two separate degrees of freedom, one for the numerator and one for the denominator. This makes the F tables even more complicated than the χ^2 or t tables.

The formula for the F-distribution again looks like it comes out of nowhere, until you recognize that we could get this formula by comparing two variances. Say we have independent random samples from two normal populations, call them x_1, x_2, ..., x_(n_x) and y_1, y_2, ..., y_(n_y). We could then write:

$$F_{df_x=n_x-1,\; df_y=n_y-1} = \frac{\dfrac{(n_x-1)s_x^2/\sigma_x^2}{n_x-1}}{\dfrac{(n_y-1)s_y^2/\sigma_y^2}{n_y-1}}$$
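Cancelling the n_x − 1 and n_y − 1 factors in the expression above leaves the ratio (s_x^2/σ_x^2)/(s_y^2/σ_y^2), which is easy to simulate. The sketch below is illustrative only: the sample sizes, standard deviations, and function names are assumptions. It compares the simulated average with d/(d − 2), where d = n_y − 1 is the denominator degrees of freedom. That value is the standard mean of an F distribution when d > 2; it is a known fact about the distribution rather than something stated in these notes.

```python
import random


def sample_variance(xs):
    """Sample variance s^2 with the usual n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratios(nx, ny, sigma_x, sigma_y, reps, seed=4):
    """Return (s_x^2 / sigma_x^2) / (s_y^2 / sigma_y^2) for independent normal samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma_x) for _ in range(nx)]
        ys = [rng.gauss(0.0, sigma_y) for _ in range(ny)]
        out.append((sample_variance(xs) / sigma_x ** 2) / (sample_variance(ys) / sigma_y ** 2))
    return out


nx, ny = 6, 11  # assumed sizes: 5 numerator and 10 denominator degrees of freedom
f = variance_ratios(nx, ny, sigma_x=2.0, sigma_y=5.0, reps=20000)
f_mean = sum(f) / len(f)
expected_mean = (ny - 1) / (ny - 3)  # d / (d - 2) with d = ny - 1
print(round(f_mean, 3), round(expected_mean, 3))
```

Because both variances are divided by their own σ^2, the very different population standard deviations used here do not affect the distribution of the ratio.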