





We mentioned in Chapter 1 that parameters are simply numbers that characterize the scores of populations. The mean (μ) and variance (σ²) are two of the most important parameters associated with statistical analysis in psychology. In subsequent chapters, we will use statistics to estimate these important parameters. Although μ and σ² will be our primary concern, skew and kurtosis are parameters in the same way as μ and σ². Skew and kurtosis can be computed from the scores in populations in a manner very similar to the computation of the mean and variance. Therefore, we will take a moment to comment on how skew and kurtosis are computed in populations and samples.
At the beginning of Chapter 3, we noted that the mean and variance had very similar definitions. The mean of a population is the sum of all scores divided by N. Statisticians sometimes call this the first raw moment of the distribution. The variance in a population is the sum of squared deviations from the mean, divided by N. Statisticians call this the second central moment of the distribution. Central moments are computed by subtracting the mean from all scores and then raising these deviation scores to some power. (Raw moments do not subtract the mean from the scores.) The following expressions define the second, third, and fourth central moments:
$$\theta_2 = \frac{\sum (y - \mu)^2}{N}$$
$$\theta_3 = \frac{\sum (y - \mu)^3}{N}$$
$$\theta_4 = \frac{\sum (y - \mu)^4}{N}$$
The symbol θ is pronounced "theta." Therefore, θ₁, θ₂, θ₃, and θ₄ represent the first, second, third, and fourth central moments of a distribution, respectively. All are computed in the same way; they differ only in the power to which the differences between y and μ are raised. You may not be familiar with exponents other than 2 (squaring), but they are really not complicated, as the following examples show:
$$y^1 = y \qquad y^2 = y \times y \qquad y^3 = y \times y \times y \qquad y^4 = y \times y \times y \times y$$
So, the exponents simply tell us how many times to multiply a number by itself.
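The moment definitions above translate almost word for word into code. Below is a minimal Python sketch, assuming the whole population is available as a list of numbers; the helper names `raw_moment` and `central_moment` and the five scores are hypothetical, chosen only for illustration.

```python
def raw_moment(scores, k):
    """k-th raw moment: the average of y**k (the mean is NOT subtracted)."""
    return sum(y ** k for y in scores) / len(scores)


def central_moment(scores, k):
    """k-th central moment: the average of (y - mu)**k, dividing by N as for a population."""
    mu = raw_moment(scores, 1)  # the first raw moment is the mean
    return sum((y - mu) ** k for y in scores) / len(scores)


population = [2, 4, 4, 5, 10]  # hypothetical population of N = 5 scores

print(raw_moment(population, 1))  # 5.0 (the mean, mu)
for k in (1, 2, 3, 4):
    print(k, central_moment(population, k))
# 1 0.0
# 2 7.2
# 3 19.2
# 4 141.6
```

The first central moment always comes out to 0, because the positive and negative deviations from the mean cancel; the second central moment (7.2 here) is the population variance σ².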
The second, third, and fourth central moments are related to skew and kurtosis. In a population, skew is defined as
$$\text{skew} = \frac{\theta_3}{\sigma^3} \tag{3.A3.1}$$
where σ is the population standard deviation (or the square root of the second central moment, θ₂). Skew, as defined in equation 3.A3.1, can take on positive and negative values. Symmetrical distributions (such as those in Figures 3.3a and 3.4a) have zero skew. Distributions that are skewed to the right yield positive skew values, and those that are skewed to the left yield negative skew values. The right-skewed distribution in Figure 3.3 has a skew of about 3.6, and the left-skewed distribution has a skew of about −3.6. Kurtosis in a population is defined as
$$\text{kurtosis} = \frac{\theta_4}{\sigma^4} \tag{3.A3.2}$$
Kurtosis, as defined in equation 3.A3.2, can take on only positive values. The larger the value, the more leptokurtic the distribution. However, when statisticians talk about kurtosis, they often mean excess kurtosis. When kurtosis is defined using equation 3.A3.2, a normal distribution has a kurtosis of 3, so statisticians treat 3 as normal kurtosis. Excess kurtosis is the difference between kurtosis and normal kurtosis. Therefore, excess kurtosis is defined as follows:
$$\text{excess kurtosis} = \frac{\theta_4}{\sigma^4} - 3 \tag{3.A3.3}$$
A normal distribution has an excess kurtosis of 0. The leptokurtic distribution in Figure 3.4b has an
2 Statistics for Research in Psychology
excess kurtosis of 3, and the platykurtic distribution in Figure 3.4c has an excess kurtosis of −1. A classic example of a flat (platykurtic) distribution is the uniform, or rectangular, distribution. A uniform distribution is one for which the densities are equal for all possible numbers between some minimum and maximum. For example, a uniform distribution may have min = 0 and max = 1, with all values between min and max equally likely. Uniform distributions have an excess kurtosis of −1.2. (Excess kurtosis can go lower still: its smallest possible value is −2, which occurs when half of the scores sit at one value and the other half at another.)
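The population definitions of skew (equation 3.A3.1), kurtosis (equation 3.A3.2), and excess kurtosis (equation 3.A3.3) can be checked numerically. The Python sketch below is one way to do it, under a few stated assumptions: the function names and the small score lists are hypothetical, and long lists of evenly spaced values (for the uniform case) and of evenly spaced standard-normal quantiles from `statistics.NormalDist` (for the normal case) stand in for those continuous distributions, so the results come out close to, not exactly, −1.2 and 0.

```python
import math
from statistics import NormalDist


def _central_moments(scores):
    """Return (theta2, theta3, theta4) for a population of scores (dividing by N)."""
    n = len(scores)
    mu = sum(scores) / n
    theta2 = sum((y - mu) ** 2 for y in scores) / n
    theta3 = sum((y - mu) ** 3 for y in scores) / n
    theta4 = sum((y - mu) ** 4 for y in scores) / n
    return theta2, theta3, theta4


def population_skew(scores):
    """Equation 3.A3.1: theta_3 / sigma**3."""
    theta2, theta3, _ = _central_moments(scores)
    sigma = math.sqrt(theta2)
    return theta3 / sigma ** 3


def population_kurtosis(scores):
    """Equation 3.A3.2: theta_4 / sigma**4 (sigma**4 equals theta_2 squared)."""
    theta2, _, theta4 = _central_moments(scores)
    return theta4 / theta2 ** 2


def population_excess_kurtosis(scores):
    """Equation 3.A3.3: kurtosis minus 3, so a normal distribution scores about 0."""
    return population_kurtosis(scores) - 3


# Hypothetical populations, chosen only for illustration
symmetric = [1, 2, 3, 4, 5]
right_skewed = [2, 4, 4, 5, 10]
left_skewed = [0, 5, 6, 6, 8]  # right_skewed mirrored (10 - y)

print(population_skew(symmetric))               # 0.0
print(round(population_skew(right_skewed), 2))  # 0.99
print(round(population_skew(left_skewed), 2))   # -0.99

N = 10_000
uniform_like = [i / (N - 1) for i in range(N)]  # evenly spaced values on [0, 1]
normal_like = [NormalDist().inv_cdf((i + 0.5) / N) for i in range(N)]  # normal quantiles

print(round(population_excess_kurtosis(uniform_like), 2))  # -1.2
print(f"{population_excess_kurtosis(normal_like):.3f}")    # a value very close to 0
```

Mirroring a population flips the sign of its skew but leaves its kurtosis unchanged, because odd powers of the deviations change sign and even powers do not.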
The definitions of skew and excess kurtosis in equations 3.A3.1 and 3.A3.3 define parameters. These formulas should not be applied to samples to estimate the population parameters. Just as the definition of the sample variance differs from that of the population variance, the definitions of the sample skew and sample kurtosis differ from those of the population skew and kurtosis. In all cases, the differences in the formulas have to do with making the statistics good estimators of the parameters. Once again, I promise this will be explained in Chapter 5. The formula for skew for a sample is
$$\text{skew} = \frac{\hat{\theta}_3}{s^3} \cdot \frac{n^2}{(n-1)(n-2)} \tag{3.A3.4}$$
where n is sample size, s is the sample standard deviation, and θ̂₃ is computed exactly like θ₃ but from the scores in a sample rather than scores in a population; i.e., θ̂₃ = Σ(y − m)³/n, where m is the sample mean. Equation 3.A3.4 looks horrible. (If I'd seen something like this in my first statistics course, I would have had an anxiety attack. Sorry.) Notice, however, that the first term in equation 3.A3.4, θ̂₃/s³, looks exactly like equation 3.A3.1, except that θ̂₃ and s are computed from scores in the sample. That's not so bad. The second term, n²/[(n − 1)(n − 2)], is called a correction factor, which we already discussed when talking about the sample variance. If you play around with the correction factor, you'll notice that it gets closer to 1 as n (sample size) gets larger, because n − 1 and n − 2 get closer and closer to n, making (n − 1)(n − 2) closer and closer to n². This means the correction matters less and less as sample size increases. In statistics, we like large samples! The formula for excess kurtosis for a sample is
$$\text{excess kurtosis} = \frac{\hat{\theta}_4}{s^4} \cdot \frac{n^2(n+1)}{(n-1)(n-2)(n-3)} - \frac{3(n-1)^2}{(n-2)(n-3)}$$
where n is sample size, s is the sample standard deviation, and θ̂₄ is computed exactly like θ₄ but from the scores in a sample rather than scores in a population; i.e., θ̂₄ = Σ(y − m)⁴/n.
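Here is a sketch of equation 3.A3.4 and the sample excess kurtosis formula in Python, assuming the sample is a plain list of numbers. The function names and the five scores are hypothetical. As in the text, θ̂₃ and θ̂₄ divide by n, while s divides by n − 1.

```python
import math


def _sample_pieces(scores):
    """Return n, s, theta3_hat, theta4_hat for a sample of scores."""
    n = len(scores)
    m = sum(scores) / n  # sample mean
    s = math.sqrt(sum((y - m) ** 2 for y in scores) / (n - 1))  # sample SD (n - 1 denominator)
    theta3_hat = sum((y - m) ** 3 for y in scores) / n  # computed like theta_3, dividing by n
    theta4_hat = sum((y - m) ** 4 for y in scores) / n  # computed like theta_4, dividing by n
    return n, s, theta3_hat, theta4_hat


def sample_skew(scores):
    """Equation 3.A3.4; requires n >= 3."""
    n, s, theta3_hat, _ = _sample_pieces(scores)
    correction = n ** 2 / ((n - 1) * (n - 2))  # the correction factor
    return (theta3_hat / s ** 3) * correction


def sample_excess_kurtosis(scores):
    """Sample excess kurtosis formula; requires n >= 4."""
    n, s, _, theta4_hat = _sample_pieces(scores)
    first = (theta4_hat / s ** 4) * n ** 2 * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    second = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - second


sample = [2, 4, 4, 5, 10]  # hypothetical sample of n = 5 scores
print(round(sample_skew(sample), 4))             # 1.4815
print(round(sample_excess_kurtosis(sample), 4))  # 2.9259

# The skew correction factor n**2 / ((n - 1)(n - 2)) approaches 1 as n grows.
for n in (5, 50, 500):
    print(n, round(n ** 2 / ((n - 1) * (n - 2)), 3))
# 5 2.083
# 50 1.063
# 500 1.006
```

For these same five scores, the population formula (equation 3.A3.1) gives a skew of about 0.99, whereas the sample formula gives about 1.48; with n this small, the adjustments matter a lot. If SciPy is installed, `scipy.stats.skew(sample, bias=False)` and `scipy.stats.kurtosis(sample, bias=False)` compute these same bias-adjusted estimators and make a useful cross-check.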
(d) θ₃, (e) θ₄, (f) skew, (g) kurtosis, and (h) excess kurtosis.
Answers
243 / 81 = 3. (h) Excess kurtosis = kurtosis - 3 = 3 - 3 = 0.
that the maximum proportion of the distribution outside the interval is simply
$$\frac{1}{k^2}. \tag{3.A4.2}$$
For our example with k = 2, we find that the maximum proportion of the distribution outside the interval μ ± k (σ) is
$$\frac{1}{k^2} = \frac{1}{2^2} = 0.25,$$
or 25%. If the distribution is symmetric, the two tails hold equal shares, so half of this percentage (12.5%) is above μ + 2(σ) and the other half is below μ − 2(σ), as Figure 3.A4.1 shows. This means that, for a symmetric distribution, if a score is 2 standard deviations above the mean, then at most 12.5% of scores are higher than it. If a score is 2 standard deviations below the mean, then at most 12.5% of scores are lower than it. More generally, for a symmetric distribution, if a score is k standard deviations above the mean, then the maximum proportion of the distribution above it is
$$\frac{0.5}{k^2}.$$
If a score is k standard deviations below the mean, then the maximum proportion of the distribution below it is also 0.5/k². For k = 2, we can see that
$$\frac{0.5}{k^2} = \frac{0.5}{2^2} = 0.125,$$
or 12.5%, as shown in Figure 3.A4.1. Chebyshev's theorem tells us about the minimum proportion of a distribution within the interval μ ± k(σ) and the maximum proportion of a distribution outside the interval μ ± k(σ). Once again, Figure 3.A4.1 shows that for many distributions, the proportion outside the interval μ ± k(σ) will be much less than 1/k². The value of this theorem is that it gives us some guidance about what constitutes extreme or unusual scores when we know only the mean and standard deviation of a distribution. However, if we do know something about the shape of the distribution, then our judgments about extreme or unusual scores can be much more precise. In Chapter 4, we will see exactly this. If we happen to know that scores were drawn from a normal distribution, we can make far more accurate statements about scores falling k standard deviations from the mean. This will have huge practical consequences that will carry through the rest of this book.
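To see Chebyshev's bound at work, the Python sketch below compares the observed proportion of scores at least k standard deviations from the mean with the 1/k² ceiling of equation 3.A4.2. The function names and the strongly skewed population of 100 scores are hypothetical, chosen only for illustration.

```python
import math


def chebyshev_max_outside(k):
    """Equation 3.A4.2: at most 1/k**2 of any distribution lies k or more SDs from the mean (k > 1)."""
    return 1 / k ** 2


def proportion_outside(scores, k):
    """Observed proportion of a population of scores at least k SDs from the mean."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in scores) / n)  # population SD
    return sum(1 for y in scores if abs(y - mu) >= k * sigma) / n


# Hypothetical, strongly right-skewed population: 90 scores of 0 and 10 scores of 10.
scores = [0] * 90 + [10] * 10  # mu = 1, sigma = 3

for k in (2, 3):
    print(k, proportion_outside(scores, k), round(chebyshev_max_outside(k), 3))
# 2 0.1 0.25
# 3 0.1 0.111
```

At k = 3 the observed 10% comes close to the 11.1% ceiling, which shows that for some skewed distributions Chebyshev's bound is nearly reached; for a normal distribution, the proportion beyond 3 standard deviations would be far smaller (about 0.3%).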
LEARNING CHECK 1
Let’s say we know the mean and standard deviation of a distribution but not its shape. Answer the following questions and round your answers to two decimal places.
Answers