



















Statistical inference is the process of drawing conclusions from data, for example by confidence intervals and significance tests. In this lecture we shall look at how we can draw conclusions from samples about the means of populations.
We shall first look at large samples, and at how we can make inferences about a single mean, means in paired data, and the difference between the means of two samples. For each of these we shall use a large sample Normal method or z method.
We shall then look at the same problems for small samples. For a single mean we shall describe the one sample t method, for paired data the paired t method, and for the means of two samples the two sample t method, also called the independent samples t method, or two group t method. For t methods there are strong assumptions about the distribution of the observations. I shall describe how we can use graphical methods to investigate these.
We shall not discuss what to do if we have means of more than two samples. The usual method for any size samples is one-way analysis of variance (anova), the assumptions of which are as for the two sample t method.
We can find confidence intervals and carry out significance tests for the means of large samples using the Normal distribution. We make use of two properties of large samples. First, the means of large samples drawn in the same way will follow a Normal distribution quite closely, as described in Week 2. Second, the standard deviation estimated from a large sample will be close to that for the whole population. This means that the standard error estimated from the sample will be a good estimate.
We find confidence intervals for means of large samples using the Normal distribution. We first estimate the standard error of the mean of the sample. This is easy to do from the standard deviation of the observations: it is the standard deviation divided by the square root of the sample size. Then the 95% confidence interval is from the mean minus 1.96 standard errors to the mean plus 1.96 standard errors.
For example, Figure 1 shows the distribution of birthweight in 1749 singleton pregnancies to Caucasian mothers in South London. This is clearly negatively skew, unlike the distribution of birthweight for term births, which is approximately Normal. These birthweights have mean = 3296.0 g and standard deviation = 563.2 g. The standard error of the mean is 563.2/√1749 = 13.5 g. Because the sample is large, the mean birthweight will be from a Normal distribution with mean equal to the mean birthweight in the population and standard deviation very close to the estimated standard error of the mean, 13.5 g. Hence the 95% confidence interval for the population mean birthweight will be 3296.0 – 1.96 × 13.5 g to 3296.0 + 1.96 × 13.5 g, which gives 3270 g to 3322 g. Hence we estimate the mean birthweight in this population to be between 3270 and 3322 g.
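The birthweight calculation above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function name `large_sample_ci` is an assumption introduced here, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(mean, sd, n, level=0.95):
    """Large sample Normal (z) confidence interval for a mean."""
    standard_error = sd / sqrt(n)
    # Two sided multiplier, about 1.96 for a 95% interval.
    multiplier = NormalDist().inv_cdf(0.5 + level / 2)
    lower = mean - multiplier * standard_error
    upper = mean + multiplier * standard_error
    return standard_error, lower, upper


# Summary statistics quoted in the text for the 1749 birthweights.
se, lower, upper = large_sample_ci(mean=3296.0, sd=563.2, n=1749)
print(f"SE = {se:.1f} g, 95% CI = {lower:.0f} g to {upper:.0f} g")
```

The same function would apply to any large sample for which only the mean, standard deviation, and sample size are reported.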
Figure 1. Birthweights of 1749 singleton births to Caucasian mothers in South London (data of Brooke et al., 1989). [Histogram: frequency against birth weight (g).]
Table 1. Baseline depression score and fall after six weeks by treatment group for 525 patients with depression (Christensen et al., 2004)

                      Baseline scores     Fall in scores
Group       Number    Mean     SD         Mean     SD
BluePages   165       21.1     10.4       3.9      9.
MoodGYM     182       21.8     10.5       4.2      9.
Controls    178       21.6     11.1       1.0      8.
Figure 2. Student’s t distribution with 1, 4, and 20 degrees of freedom, with the Standard Normal distribution. [Four panels: 1 degree of freedom, 4 degrees of freedom, 20 degrees of freedom, and Standard Normal; each shows probability density against t from –6 to 6.]
The assumptions required for this method are:
We can check the last by plotting the difference against the average of the two measurements for the subject. I shall describe this in detail later under paired t test.
We can also find a confidence interval for the difference between the means of two independent samples. For example, we shall compare the mean fall in score for BluePages with MoodGYM. The difference between the means, BluePages minus MoodGYM, is 3.9 – 4.2 = –0.3. We can find the standard error for the difference by squaring the standard error of each mean, adding, and taking the square root. This only works when the groups are independent. If we were to do it for paired data like the before and after measurements above, the standard error might be much too large. For BluePages and MoodGYM, this gives a standard error of the difference of 0.98.
The 95% CI is then given by –0.3 – 1.96 × 0.98 to –0.3 + 1.96 × 0.98 = –2.2 to +1.6.
We can also do a test of the null hypothesis that in the population the difference between the means is zero against the alternative hypothesis that the difference in the population is not zero. As for the paired example above, because we have a large sample the observed difference minus the population difference then divided by the estimated standard error of the difference should be an observation from a Standard Normal distribution. If the null hypothesis were true, the population difference would be zero. The test statistic is observed difference divided by its standard error, z = –0.3/0.98 = –0.31. The probability of an observation from the Standard Normal distribution being as far from its expected value, zero, as –0.31 is P=0.76. Hence the difference is not significant. We can tell this from the 95% confidence interval, also, as this includes zero, the null hypothesis value for the difference. This is the large sample Normal distribution test or z test for the means of two independent groups.
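The confidence interval and z test for the BluePages and MoodGYM comparison can be reproduced as follows. This is a sketch using only the Python standard library; the function name `z_test_difference` is an assumption made for illustration, and the inputs are the difference and standard error quoted in the text.

```python
from statistics import NormalDist


def z_test_difference(difference, se_difference, level=0.95):
    """Large sample z test and confidence interval for a difference between means."""
    std_normal = NormalDist()
    # Under the null hypothesis the population difference is zero.
    z = difference / se_difference
    # Two sided P value: probability of being at least this far from zero.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    multiplier = std_normal.inv_cdf(0.5 + level / 2)
    lower = difference - multiplier * se_difference
    upper = difference + multiplier * se_difference
    return z, p_value, lower, upper


# Difference (BluePages minus MoodGYM) and its standard error from the text.
z, p_value, lower, upper = z_test_difference(-0.3, 0.98)
print(f"z = {z:.2f}, P = {p_value:.2f}, 95% CI = {lower:.1f} to {upper:.1f}")
```

Because the interval and the test use the same standard error, the interval includes zero exactly when the two sided P value is above 0.05.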
We can carry out the same calculations for the comparison of each active intervention with control. For BluePages, the difference between mean falls is 3.9 – 1.0 = 2.9 and the standard error of the difference is 0.95. Hence the 95% confidence interval is 2.
If we want to test the overall null hypothesis that the three treatments would produce the same mean fall in the population, we could do this by applying the Bonferroni correction to these three P values. Multiplying by 3 would give the smallest P value = 0.0005 × 3 = 0.0015, which is still highly significant. Christensen et al. (2004) did not do the analysis exactly as we have here. They used an analysis of variance method, which I shall omit, to compare all three groups simultaneously.
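The Bonferroni multiplication described above is simple enough to write as a one-line function. The sketch below uses a hypothetical helper name, `bonferroni`, and caps the adjusted value at 1 because a probability cannot exceed 1.

```python
def bonferroni(p_value, number_of_tests):
    """Bonferroni-adjusted P value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * number_of_tests)


# Smallest of the three P values quoted in the text, adjusted for three comparisons.
adjusted = bonferroni(0.0005, 3)
print(f"Adjusted P = {adjusted:.4f}")
```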
The large sample Normal method for comparing two means requires two assumptions about the data.
Some computer programs do not do large sample z tests directly. You have to use the command for a one sample or paired t test, or for a two-sample t test with unequal variances. I describe these below. For large samples, they give the same answers as the z tests.
When samples are small, we cannot apply the large sample Normal distribution methods safely. This problem was tackled by a statistician who published under the pseudonym Student, because his employers would not allow him to publish the results of his work. The probability distribution which he discovered is known as Student’s t distribution as a result and the methods which use it as Student’s t tests.
We have seen that when the sample is large, the observed sample mean minus the population mean divided by the standard error follows the Standard Normal distribution. When the sample is small this is not so. The distribution followed depends on the distribution of the observations themselves, unlike the large sample case where this is irrelevant. We have to assume that the data themselves come from a population which follows a Normal distribution. We have seen that some naturally occurring variables do this and some do not. We shall see in Week 5 that many variables which do not follow a Normal distribution can be made to do so by changing the way in which we look at them, using a transformation such as the logarithm. When the observations come from a population which follows a Normal distribution, then the sample mean minus the population mean divided by the standard error of the mean follows Student’s t distribution, or simply the t distribution. Student’s t distribution may be defined as the distribution which this ratio would follow.
Like the Normal distribution, Student’s t distribution is a family of distributions rather than just one. This family has only one parameter, the number which tells us with which member of the family of t distributions we are dealing. This is called the degrees of freedom. We have already used this term in the calculation of variances and standard deviations. The degrees of freedom of the t distribution is equal to the degrees of freedom of the standard deviation used in the calculation of the standard error.
Figure 2 shows some members of the Student’s t distribution family. When the degrees of freedom are small, corresponding to small samples, the t distribution has much longer tails than the Normal. This reflects the greater uncertainty in the standard error of the mean. As the degrees of freedom, and hence the related sample size, get bigger, the t distribution gets closer and closer to the Standard Normal distribution. The t distribution reaches the Normal distribution in theory when the sample is infinitely large. In practice, it is difficult to tell the Normal and t distributions apart above about 30 degrees of freedom.
Like the Normal, the t distribution has no simple formulae for its probabilities. Instead we use numerical approximations to calculate the number which replaces 1.96 in confidence interval calculations and the P values in significance tests. If we do these calculations using one of the many computer programs available, the program will calculate these for us. For the purposes of illustration, I shall also give a short table of the distribution for different degrees of freedom (Table 2). For each of the degrees of freedom given, this gives the value which will be exceeded, in either positive or negative direction, with the given probability. For example, Figure 3 shows the 5% two sided probability points of the t distribution with 4 degrees of freedom.
We can use Student’s t distribution to replace the Normal distribution in confidence intervals and significance tests for small samples. To do this we must be able to assume that the observations themselves come from a Normal distribution, plus other assumptions for different applications as described below.
We can use the t distribution to carry out all the analyses of means of small samples which we did above using the Normal distribution for large samples. We seldom want to estimate the mean of a population from the mean of a small sample, but we shall start with this as it is the easiest.
For our example, we shall use data from nine patients with chronic non-healing wounds (Shukla et al., 2004). Biopsies were assessed using the microscopic angiogenesis grading system (MAGS) score, which provides an index of how well small blood vessels are developing and hence of epithelial regeneration. High scores are good. The nine observations were 20, 31, 34, 39, 43, 45, 49, 51, and 63.
We can use these measurements to estimate the mean MAGS score in non-healing patients. The mean score before treatment is 41.7 and the standard deviation is 12.5, with 8 degrees of freedom. The standard error of the mean is 12.5/√9 = 4.2. If we had a large sample, we could estimate a 95% confidence interval for the mean by subtracting and adding 1.96 standard errors: 41.7 – 1.96 × 4.2 to 41.7 + 1.96 × 4.2. But we have only 9 observations, so this would not be valid. Instead we use the t distribution with 8 degrees of freedom. From Table 2, the 5% point of the t distribution with 8 degrees of freedom is 2.31, so the confidence interval for the mean MAGS score is 41.7 – 2.31 × 4.2 to 41.7 + 2.31 × 4.2 = 32.0 to 51.4.
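The one sample t interval for the nine MAGS scores can be checked with the Python standard library. In this sketch the multiplier 2.31 is taken from the text rather than computed, since the standard library has no t distribution, and the variable names are assumptions made for illustration. Because the code works from unrounded values, the upper limit may differ in the last digit from the hand calculation above.

```python
from math import sqrt
from statistics import mean, stdev

# MAGS scores before treatment for the nine patients, as listed in the text.
mags_before = [20, 31, 34, 39, 43, 45, 49, 51, 63]

# 5% two sided point of the t distribution with 8 degrees of freedom,
# quoted in the text from Table 2 (not computed here).
T_POINT_8_DF = 2.31

n = len(mags_before)
sample_mean = mean(mags_before)
sample_sd = stdev(mags_before)          # uses n - 1 = 8 degrees of freedom
standard_error = sample_sd / sqrt(n)

lower = sample_mean - T_POINT_8_DF * standard_error
upper = sample_mean + T_POINT_8_DF * standard_error

print(f"mean = {sample_mean:.1f}, SD = {sample_sd:.1f}, SE = {standard_error:.1f}")
print(f"95% CI = {lower:.1f} to {upper:.1f}")
```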
This is only valid provided we can assume the observations come from a Normal distribution. We may know from our experience of the measurement that this variable usually follows a Normal distribution, but we always like to check that our sample is compatible. I describe how to do this in the next section.
When I introduced the Normal distribution, I showed histograms of several large samples and superimposed Normal distribution curves on them to show whether the Normal distribution fitted the data. For small samples, it is very difficult to judge from a histogram whether the Normal distribution is a good fit. Figure 4 shows a histogram for the MAGS score before treatment.
We cannot really say whether the distribution and the data have the same shape. There is a better graphical method to examine the fit of a Normal distribution to a set of data, the Normal quantile plot or Normal plot for short. A Normal plot is a plot of the observed data against the values which we would expect if the data actually followed a Normal distribution. Table 3 shows the results of the calculation. First we put our observations into ascending order. There are nine of them, and we ask what would be the expected value of the smallest observation from a sample of nine from a Normal distribution. For the Standard Normal distribution this is –1.28. (As usual, we skip the formulae because the computer program will do all this for us.) We expect the next up to be –0.84, the next to be –0.52, etc. The middle value is expected to be zero, the mean and median of the Standard Normal distribution. We now convert these to a Normal distribution with the same mean and variance as the data by multiplying the Standard Normal value by the sample standard deviation and adding the sample mean. Thus we would expect the smallest of nine observations from a Normal distribution with mean 41.7 and standard deviation 12.5 to be –1.28 × 12.5 + 41.7 = 25.6 (calculated from the unrounded values). Compare this to the observed smallest value, which is 20. Inspecting Table 3 will show you that most of the observed MAGS scores and the MAGS scores we would expect if we had a Normal distribution are quite close.
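The expected values for a Normal plot can be computed as sketched below. The lecture skips the formula, so the plotting position i/(n + 1) used here is an assumption; it was chosen because it reproduces the Standard Normal values quoted above (–1.28, –0.84, –0.52, and zero in the middle). Other programs may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev

# MAGS scores before treatment, as listed in the text.
mags_before = [20, 31, 34, 39, 43, 45, 49, 51, 63]

observed = sorted(mags_before)
n = len(observed)
sample_mean = mean(observed)
sample_sd = stdev(observed)

# Assumed plotting positions: the i-th smallest of n observations is matched
# to the Standard Normal quantile at probability i / (n + 1).
standard_scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Rescale to a Normal distribution with the same mean and SD as the data.
expected = [sample_mean + sample_sd * z for z in standard_scores]

for obs, z, exp in zip(observed, standard_scores, expected):
    print(f"observed {obs:>2}   Standard Normal {z:6.2f}   expected {exp:5.1f}")
```

Plotting `observed` against `expected`, together with the line of equality, would give a Normal plot of the kind described in the text.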
We can now plot the observed MAGS score against the MAGS score which would be expected if the data followed a Normal distribution. If the observed and expected are similar, observations should lie close to the line of equality, which joins points where the observed and expected would be equal, and which we also draw on the graph. Figure 5 shows the Normal plot for the MAGS data. Most of the observations are indeed close to the line, suggesting that the observations are quite close to what we would expect from a Normal distribution.
To see how Normal plots behave with distributions of different shapes, we can look at Normal plots and histograms together when we have larger samples. Figure 6 shows the Normal plot for the birth weight data of Figure 1. The distribution is negatively skew and the points deviate away from the line, falling below it at either end and rising above it in the middle. Figure 7 shows the histogram and Normal plot for term birth weights only, which fit a Normal distribution quite well. The Normal plot shows a good fit to the straight line of equality. Figure 8 shows the Normal plot for serum cholesterol in stroke patients. This is a positively skew distribution and shows the opposite curvature to the negatively skew distribution of Figure 6, curving upwards rather than down.
Figure 7. Normal plot for the birthweight data, births at or above 37 weeks gestation only. [Two panels: histogram of frequency against birth weight (g); Normal plot of birth weight (g) against inverse Normal.]
Figure 8. Normal plot for cholesterol in stroke patients. [Two panels: histogram of frequency against serum cholesterol (mmol/L); Normal plot of serum cholesterol (mmol/L) against inverse Normal.]
Table 4. MAGS score before and after treatment with topical placental extract in 9 patients with non-healing wounds (Shukla et al., 2004)

MAGS score   MAGS score   Difference,          Average of
before       after        after minus before   before and after
20           32           12                   26.0
31           47           16                   39.0
34           43            9                   38.5
39           43            4                   41.0
43           55           12                   49.0
45           52            7                   48.5
49           61           12                   55.0
51           55            4                   53.0
63           71            8                   67.0
There are several ways of drawing Normal plots. Some programs, such as SPSS, put the expected Normal values on the vertical axis and the observed data on the horizontal axis. A downward curve then indicates positive skewness, an upward curve negative skewness. Some programs use the Standard Normal expected values rather than those for a Normal distribution with the same mean and standard deviation as the data, in which case the straight line depends on the mean and standard deviation rather than being the line of equality. Some offer a Normal probability plot rather than a Normal quantile plot, but these look very similar and are interpreted in the same way.
There are also several significance tests, such as the Shapiro-Wilk, Shapiro-Francia, and the splendidly named Kolmogorov-Smirnov tests, which can be used to test the null hypothesis that the data come from a Normal distribution. Graphical methods are much more useful in practice. If the sample is small, departures from the Normal may not be significant just because there are insufficient data to detect them. If the sample is large, very small departures from the Normal may be significant, but such departures will not affect the results of analyses.
The paired t method is the version of the one sample t method usually seen in research publications. Here we have paired observations, such as the same subject before and after an intervention, the same subject receiving two different interventions as in a cross-over trial, or matched case and control in a case-control study. Table 4 shows fuller data from Shukla et al. (2004). In this trial, patients with chronic non-healing wounds were randomised to receive topical placental extract or to control. The data in Table 4 show the MAGS score before and after treatment in a group of 9 of the patients in the active treatment group. We want to know whether we have evidence that the mean MAGS score changed and what the average change might be. I have calculated the difference between the MAGS score after treatment and the MAGS score before treatment, i.e. the increase in the MAGS score.
The authors of the paper did not do any further analysis of these data, as they were all positive differences and the MAGS score clearly increases following treatment. We shall use them to estimate the mean increase in MAGS score. The mean and standard deviation of the increase in MAGS score are 9.33 and 4.03 respectively. We have 9 observations so the number of degrees of freedom for the calculation of the standard deviation is 9 – 1 = 8.
The standard error of the mean difference is 1.34. To estimate the 95% confidence interval for the mean from this small sample, we use the 5% point of the t distribution with 8 degrees of freedom. From the 8 degrees of freedom row in Table 2 this is 2.31. The 95% confidence interval is therefore the mean minus or plus 2.31 standard errors, 9.33 – 2.31 × 1.34 to 9.33 + 2.31 × 1.34, which gives us 6.2 to 12.4.
We can also test the null hypothesis that in the population the mean increase is zero. The test statistic is the mean divided by its standard error. This is 9.33/1.34 = 6.96. If we look in the 8 degrees of freedom row in Table 2, we see that this is larger than the largest number there, 5.04, which corresponds to a probability of 0.001. Hence we could say P<0.001. In practice, we would do this using a computer program, which gives us P = 0.0001. The difference is highly significant.
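The paired t calculations can be reproduced from the before and after scores in Table 4. This sketch uses only the Python standard library, takes the multiplier 2.31 from the text rather than computing it, and uses variable names that are assumptions made for illustration. Working from unrounded values gives a test statistic of about 6.95 rather than the 6.96 obtained from the rounded mean and standard error.

```python
from math import sqrt
from statistics import mean, stdev

# MAGS scores before and after treatment for the nine patients in Table 4.
mags_before = [20, 31, 34, 39, 43, 45, 49, 51, 63]
mags_after = [32, 47, 43, 43, 55, 52, 61, 55, 71]

# 5% two sided point of t with 8 degrees of freedom, quoted from Table 2.
T_POINT_8_DF = 2.31

# The paired t method is the one sample t method applied to the differences.
increases = [after - before for before, after in zip(mags_before, mags_after)]
n = len(increases)
mean_increase = mean(increases)
sd_increase = stdev(increases)              # n - 1 = 8 degrees of freedom
se_increase = sd_increase / sqrt(n)

lower = mean_increase - T_POINT_8_DF * se_increase
upper = mean_increase + T_POINT_8_DF * se_increase
t_statistic = mean_increase / se_increase   # null hypothesis: mean increase is zero

print(f"mean = {mean_increase:.2f}, SD = {sd_increase:.2f}, SE = {se_increase:.2f}")
print(f"95% CI = {lower:.1f} to {upper:.1f}, t = {t_statistic:.2f} on {n - 1} df")
```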
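There are several assumptions which we must make about the data for the paired t method to be valid: the pairs of observations must be independent, the differences must follow a Normal distribution, and the mean and variability of the differences must not be related to the magnitude of the measurement.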
The first of these, independence, depends on the design. It is met for the MAGS data, because the pairs of data come from nine different subjects. The second can be tested by a Normal plot, as shown in Figure 9. This appears to fit the straight line quite well and there is no reason to suppose that the differences do not follow a Normal distribution. The third, that the mean and the variability are not related to the magnitude, can also be investigated graphically. We do a scatter plot of the difference against the average of the two observations, as in Figure 10. We do this because the average of the two measurements is the best estimate we have of the subject’s true MAGS score over the period. Using only one of the measurements, either before or after, on the horizontal axis tends to produce spurious relationships between difference and magnitude. For the MAGS data, Figure 10 shows little evidence that either the mean difference or the variability of the differences is related to the magnitude of MAGS score for the subject.
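The points for the plot of difference against average can be prepared as sketched below, using the Table 4 data. The variable names are assumptions made for illustration, and the sketch stops at computing the coordinates; any plotting program could then draw the scatter diagram.

```python
# MAGS scores before and after treatment for the nine patients in Table 4.
mags_before = [20, 31, 34, 39, 43, 45, 49, 51, 63]
mags_after = [32, 47, 43, 43, 55, 52, 61, 55, 71]

# Vertical axis: the difference (after minus before) for each subject.
differences = [after - before for before, after in zip(mags_before, mags_after)]

# Horizontal axis: the average of the two measurements, the best estimate
# of the subject's true score over the period.
averages = [(before + after) / 2 for before, after in zip(mags_before, mags_after)]

plot_points = list(zip(averages, differences))
for average, difference in plot_points:
    print(f"average {average:5.1f}   difference {difference:>2}")
```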
The two sample t method is also called the unpaired t method or unpaired t test, the two group t method, or Student’s two sample t test. It enables us to estimate the difference between means or test the null hypothesis of no difference in the population, even when the samples are small.
Our example is a comparison of capillary density between patients with diabetic foot ulcers and a group of non-ulcerated controls (Table 5). The data are shown graphically in Figure 11. The samples are small, only 23 ulcer patients and 19 controls, so we cannot use the large sample Normal method. The standard error will not be sufficiently well estimated.
For the two-sample t method, we must make three assumptions about the data:
If the distributions in the two populations have the same variance, we need only one estimate of variance. We call this the common or pooled variance estimate. It is a weighted average of the two sample variances, weighted by the degrees of freedom. The degrees of freedom for this common variance estimate are the number of observations minus 2. We then use this common estimate of variance to estimate the standard error of the difference between the means.
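The pooled variance estimate described above can be sketched as follows. The function name `pooled_standard_error` and the two small groups of numbers are hypothetical, chosen only so the arithmetic is easy to follow; they are not the capillary density data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_standard_error(group_1, group_2):
    """Difference in means, pooled variance, its SE, and degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    degrees_of_freedom = n1 + n2 - 2
    # Weighted average of the two sample variances, weighted by their
    # degrees of freedom (n1 - 1 and n2 - 1).
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / degrees_of_freedom
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    difference = mean(group_1) - mean(group_2)
    return difference, pooled_variance, standard_error, degrees_of_freedom


# Hypothetical illustrative data, not taken from the lecture.
group_a = [4.0, 6.0, 8.0]
group_b = [1.0, 2.0, 3.0, 6.0]

difference, pooled_variance, standard_error, df = pooled_standard_error(group_a, group_b)
print(f"difference = {difference:.1f}, pooled variance = {pooled_variance:.2f}")
print(f"SE of difference = {standard_error:.2f} on {df} degrees of freedom")
```

The confidence interval and test would then use the t distribution with these degrees of freedom, in the same way as the one sample method.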
Table 5. Capillary density (per mm^2) in the feet of ulcerated patients and a healthy control group (data supplied by Marc Lamah)

          Controls   Ulcerated patients
          17.5       9.
          27.5       11.
          27.0       12.
          29.5       18.
          27.0       18.
          29.0       18.
          34.5       18.
          31.0       20.
          35.5       20.
          33.5       22.
          35.5       22.
          34.0       22.
          36.5       23.
          38.0       23.
          40.0       24.
          39.5       26.
          40.0       26.
          40.0       27.
          52.0       27.

Number    19         23
Mean      34.08      22.
SD        7.29       7.
Figure 12. Histograms of capillary density in two groups of patients. [Two panels, Controls and Ulcers: frequency against capillary density.]
Figure 13. Distribution of residual capillary density, with corresponding Normal distribution curve. [Histogram: frequency against residual.]
Figure 14. Distribution of residual capillary density, with corresponding Normal distribution curve, and Normal plot. [Two panels: histogram of frequency against residual; Normal plot of residual against inverse Normal.]
If we cannot assume uniform variance, the effect is usually small if the two populations are from a Normal Distribution. However, unequal variance is often associated with skewness in the data. When distributions are positively skew, the variability usually increases with increasing mean. We shall see an example in Week
If distributions are Normal, we can use the Satterthwaite correction to the degrees of freedom, often called the two sample t method for unequal or unpooled variance.
If variances are unequal, we cannot estimate a common variance. Instead we use the large sample form of the standard error of the difference between means. We replace the t value for confidence intervals and significance tests by t with fewer degrees of freedom. The Satterthwaite degrees of freedom depend on the relative sizes of the variances. The larger variance dominates and if one is much larger than the other the degrees of freedom for that group are the only degrees of freedom.
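The lecture does not give the formula for the Satterthwaite degrees of freedom. The sketch below uses the usual Welch-Satterthwaite approximation, which is an assumption about the formula intended; the function name `satterthwaite_df` and the input values are hypothetical and chosen to show the two behaviours described above.

```python
def satterthwaite_df(variance_1, n1, variance_2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    # Squared standard error of each mean.
    a = variance_1 / n1
    b = variance_2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Equal variances and equal group sizes: the same degrees of freedom as the
# pooled method, n1 + n2 - 2.
equal_case = satterthwaite_df(4.0, 10, 4.0, 10)

# One variance much larger than the other: the degrees of freedom approach
# those of the group with the larger variance, n1 - 1.
dominated_case = satterthwaite_df(1000.0, 10, 0.001, 10)

print(f"equal variances: {equal_case:.2f} degrees of freedom")
print(f"one variance dominant: {dominated_case:.2f} degrees of freedom")
```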
For the capillary density example, the pooled variance method has 19 + 23 – 2 = 40 degrees of freedom. The unpooled standard error, found as for the comparison of two large sample means, is about 2.26 capillaries/mm^2, and Satterthwaite's degrees of freedom are 38.56. The degrees of freedom are almost unchanged because the variances here are almost the same. We round this down to 38 to use the t table. For this example, the t test for equal variances gives P<0.0001, and the test for unequal variances also gives P<0.0001. The Satterthwaite 95% confidence interval is 6.91 to 16.07 capillaries/mm^2, compared to 6.92 to 16.07 capillaries/mm^2 using the pooled variance method. It is very similar. This is because the two sample t method is very robust to small departures from its assumptions, especially when the groups are of similar size, as here.
N.B. Satterthwaite’s method is an approximation for use in unusual circumstances. The equal variance method is the standard t test.
Brooke OG, Anderson HR, Bland JM, Peacock JL, Stewart CM. (1989) Effects on birth weight of smoking, alcohol, caffeine, socioeconomic factors, and psychosocial stress. British Medical Journal, 298, 795-801.
Christensen H, Griffiths KM, Jorm AF. (2004) Delivering interventions for depression by using the internet: randomised controlled trial. British Medical Journal, 328, 265-268.
Prentice AM, Black AE, Coward WA, Davies HL, Goldberg GR, Murgatroyd PR,
Shukla VK, Rasheed MA, Kumar M, Gupta SK, Pandey SS. (2004) A trial to determine the role of placental extract in the treatment of chronic non-healing wounds. Journal of Wound Care, 13, 177-9.
Steenmoorle P, Julina GN. (2004) Can laboratory investigation help us to decide when to discontinue larval therapy? Journal of Wound Care, 13, 38-40.