This excerpt from a statistics textbook introduces confidence intervals and hypothesis testing for normal populations with known variance. It explains how to calculate confidence intervals for the population mean using the normal distribution and a critical value. It also discusses the meaning of interval estimation and how it relates to the probability of containing the true population mean in repeated sampling. An example using artificial data illustrates the concepts.
In Statistical Inference I we described how to estimate the mean and variance of a population, and the properties of those estimation procedures. In Statistical Inference II we introduce two more aspects of statistical inference: confidence intervals and hypothesis tests. In contrast to a point estimate of the population mean β, like b = 17.158, a confidence interval estimate is a range of values that may contain the true population mean. A confidence interval estimate contains information not only about the location of the population mean but also about the precision with which we estimate it. A hypothesis test is a statistical procedure for using data to check whether a conjecture about a population is compatible with the information contained in a sample of data. Continuing the example from Statistical Inference I, suppose airplane designers have been basing seat designs on the assumption that the average hip width of U.S. passengers is 16 inches. Is the information contained in the random sample of 50 hip measurements compatible with this conjecture, or not? These are the issues we consider in Statistical Inference II.
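Suppose we have a random sample of size T from a population with mean β and variance σ^2, $Y_1, Y_2, \ldots, Y_T$. The least squares estimator of the population mean is

$$b = \frac{1}{T}\sum_{i=1}^{T} Y_i \tag{2.1}$$

This estimator has a normal distribution if the population is normal,

$$b \sim N\left(\beta, \frac{\sigma^2}{T}\right) \tag{2.2}$$
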
For the present, let us assume that the population variance σ^2 is known. This assumption is not likely to be true, but making it allows us to introduce the notion of confidence intervals with few complications. In
the next section we introduce methods for the case when σ^2 is unknown. We can create a standard normal random variable from (2.2) by subtracting the mean and dividing by the standard deviation,

$$Z = \frac{b - \beta}{\sqrt{\sigma^2 / T}} = \frac{b - \beta}{\sigma / \sqrt{T}} \tag{2.3}$$

The standard normal random variable Z has mean 0 and variance 1. That is, $Z \sim N(0,1)$. Let $z_c$ be a “critical value” for the standard normal distribution, such that α = .05 of the probability is in the tails of the distribution, with α/2 = .025 of the probability in each tail. From Table 1 at the end of UE/2 the value of $z_c$ is 1.96 when α = .05. This critical value is illustrated in Figure 1.
Figure 1: α = .05 critical values for the N(0,1) distribution
Thus

$$P[Z \geq 1.96] = P[Z \leq -1.96] = .025 \tag{2.4}$$

and

$$P[-1.96 \leq Z \leq 1.96] = 1 - .05 = .95 \tag{2.5}$$
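The critical value can also be obtained numerically instead of from a printed table. The following is a minimal sketch using Python's standard library; the function name `critical_value` is a hypothetical helper introduced here for illustration, not something from the text.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Return z_c such that P[-z_c <= Z <= z_c] = 1 - alpha for Z ~ N(0, 1)."""
    # alpha/2 of the probability sits in each tail, so z_c is the
    # (1 - alpha/2) quantile of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)


std_normal = NormalDist()          # mean 0, standard deviation 1
z_c = critical_value(0.05)

# Probability between -z_c and z_c, as in (2.5).
central = std_normal.cdf(z_c) - std_normal.cdf(-z_c)

print(round(z_c, 2))      # 1.96
print(round(central, 2))  # 0.95
```

The unrounded quantile is slightly below 1.96; the text works with the two-decimal table value.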
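Substitute (2.3) into (2.5) to obtain

$$P\left[-1.96 \leq \frac{b - \beta}{\sigma / \sqrt{T}} \leq 1.96\right] = .95 \tag{2.6}$$

Multiplying through the inequality inside the brackets by $\sigma / \sqrt{T}$ yields

$$P\left[-1.96\,\frac{\sigma}{\sqrt{T}} \leq b - \beta \leq 1.96\,\frac{\sigma}{\sqrt{T}}\right] = .95 \tag{2.7}$$

Subtracting b from each of the terms inside the brackets gives

$$P\left[-b - 1.96\,\frac{\sigma}{\sqrt{T}} \leq -\beta \leq -b + 1.96\,\frac{\sigma}{\sqrt{T}}\right] = .95 \tag{2.8}$$

Multiplying by −1 within the brackets reverses the direction of the inequalities, giving

$$P\left[b - 1.96\,\frac{\sigma}{\sqrt{T}} \leq \beta \leq b + 1.96\,\frac{\sigma}{\sqrt{T}}\right] = .95 \tag{2.9}$$

In general,

$$P\left[b - z_c\,\frac{\sigma}{\sqrt{T}} \leq \beta \leq b + z_c\,\frac{\sigma}{\sqrt{T}}\right] = 1 - \alpha \tag{2.10}$$

where $z_c$ is the appropriate critical value for a given value of tail probability α. In (2.10) we have defined the interval estimator

$$b \pm z_c\,\frac{\sigma}{\sqrt{T}} \tag{2.11}$$
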
Our choice of the phrase interval estimator is a careful one. The interval (2.11) defines a procedure that can be used for any sample of data. The interval endpoints are thus random variables. What (2.10) implies is that intervals constructed using (2.11), in repeated sampling from the population, have a 100(1−α)% chance of containing the population mean β.
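Because (2.11) is a procedure rather than a single pair of numbers, it maps naturally onto a function that takes any sample and returns the two endpoints. The sketch below assumes a known population standard deviation; the function name `interval_estimate` and the four-value sample are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist, fmean


def interval_estimate(sample, sigma, alpha=0.05):
    """Return (lower, upper) for b +/- z_c * sigma / sqrt(T), as in (2.11).

    sample: observed values Y_1, ..., Y_T
    sigma:  known population standard deviation (assumed known)
    alpha:  total tail probability
    """
    T = len(sample)
    b = fmean(sample)                           # least squares estimate (2.1)
    z_c = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    half_width = z_c * sigma / sqrt(T)
    return b - half_width, b + half_width


# Hypothetical data: b = 10, sigma = 2, T = 4, so sigma / sqrt(T) = 1.
lower, upper = interval_estimate([9.0, 11.0, 10.0, 10.0], sigma=2.0)
print(round(lower, 2), round(upper, 2))  # 8.04 11.96
```

Calling the function on a different sample gives different endpoints, which is the sense in which the endpoints are random variables.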
In order to use the interval estimation procedure defined in (2.11) we must have data from a normal population with a known variance. To illustrate the computation, and the meaning of interval estimation, we will create a sample of data using a computer simulation. Statistical software programs contain random number generators. These are routines that create values from a given probability distribution.
Table 1 contains 30 values from a normal population with mean β = 10 and variance σ^2 = 10.
Table 1: 30 values from N(10, 10)

11.939   11.407   13.
10.706   12.157   7.
6.644    10.829   8.
13.187   12.368   9.
8.433    10.052   2.
9.210    5.036    5.
7.961    14.799   9.
14.921   10.478   11.
6.223    13.859   13.
10.123   12.355   10.
Table 2 contains the least squares estimates and the lower and upper interval estimate values based on 10 samples like the one in Table 1.
Table 2: Results from 10 samples of data

Sample   b        lower bound   upper bound
1        10.206   9.074         11.
2        9.828    8.696         10.
3        11.194   10.062        12.
4        8.822    7.690         9.
5        10.434   9.302         11.
6        8.855    7.723         9.
7        10.511   9.380         11.
8        9.212    8.080         10.
9        10.464   9.333         11.
10       10.142   9.010         11.
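The bounds in Table 2 can be recomputed from the reported values of b using (2.11) with σ^2 = 10, T = 30 and the table critical value 1.96. The following sketch does that and lists which intervals fail to contain β = 10. It works from the rounded b values shown in the table, so a recomputed bound may differ from the source in the last digit; the variable names are illustrative assumptions.

```python
from math import sqrt

SIGMA2 = 10.0   # known population variance
T = 30          # sample size
BETA = 10.0     # true mean used in the simulation
Z_C = 1.96      # critical value for alpha = .05

# Least squares estimates b from Table 2, samples 1 through 10.
b_values = [10.206, 9.828, 11.194, 8.822, 10.434,
            8.855, 10.511, 9.212, 10.464, 10.142]

half_width = Z_C * sqrt(SIGMA2 / T)

misses = []
for sample_no, b in enumerate(b_values, start=1):
    lower, upper = b - half_width, b + half_width
    if not (lower <= BETA <= upper):
        misses.append(sample_no)

print(round(half_width, 3))  # 1.132
print(misses)                # [3, 4, 6]
```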
Table 2 illustrates the sampling variation of the least squares estimator b. The sample means vary from sample to sample. In this simulation, or Monte Carlo, experiment we know the true population mean, β = 10, and the estimates b are centered at that value. Each interval estimate extends $1.96\sigma/\sqrt{T}$ on either side of b, so its width is $2(1.96\sigma/\sqrt{T})$. Note that while the point estimates b in Table 2 fall near the true value β = 10, not all of the interval estimates contain the true value. Intervals from samples 3, 4 and 6 do not contain the true value β = 10. However, in 10,000 simulated samples the average value of b is 10.004 and 94.86% of intervals constructed using (2.11) contain the true parameter value β = 10. These numbers reveal what is, and what is not, true about interval estimates.
What happens in one sample, or just a few samples, is not what statistical sampling properties tell us. Sampling properties tell us what happens in many repeated experimental trials.
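A repeated-sampling experiment of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the text's original simulation: the function name `coverage`, the seed, and the use of Python's `random` module are choices made here, so the results will be close to, but not identical to, the 10.004 and 94.86% reported above.

```python
import random
from math import sqrt


def coverage(n_samples=10_000, T=30, beta=10.0, sigma2=10.0,
             z_c=1.96, seed=12345):
    """Simulate repeated sampling from N(beta, sigma2).

    Returns (average of b, share of intervals that contain beta).
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    sigma = sqrt(sigma2)
    half_width = z_c * sigma / sqrt(T)
    hits = 0
    total_b = 0.0
    for _ in range(n_samples):
        # Draw one sample of size T and compute its mean b.
        b = sum(rng.gauss(beta, sigma) for _ in range(T)) / T
        total_b += b
        # Count the interval from (2.11) if it contains the true mean.
        if b - half_width <= beta <= b + half_width:
            hits += 1
    return total_b / n_samples, hits / n_samples


mean_b, share = coverage()
print(f"average b = {mean_b:.3f}, share containing beta = {share:.4f}")
```

With many samples the average of b should sit near 10 and the share of intervals containing β should sit near .95, even though any single interval either contains β or does not.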