Moments Skew and Kurtosis, Study notes of Statistics

Typology: Study notes

2022/2023

Uploaded on 03/01/2023

ammla

APPENDIX 3.3: CALCULATING SKEW AND KURTOSIS

We mentioned in Chapter 1 that parameters are simply numbers that characterize the scores of populations. The mean (μ) and variance (σ^2 ) are two of the most important parameters associated with statistical analysis in psychology. In subsequent chapters, we will use statistics to estimate these important parameters. Although μ and σ^2 will be our primary concern, skew and kurtosis are parameters in the same way as μ and σ^2. Skew and kurtosis can be computed from the scores in populations in a manner very similar to the computation of the mean and variance. Therefore, we will take a moment to comment on how skew and kurtosis are computed in populations and samples.

Moments

At the beginning of Chapter 3, we noted that the mean and variance had very similar definitions. The mean of a population is the sum of all scores divided by N. Statisticians sometimes call this the first raw moment of the distribution. The variance in a population is the sum of squared deviations from the mean, divided by N. Statisticians call this the second central moment of a distribution. Central moments are computed by subtracting the mean from all scores and then raising these deviation scores to some power. (Raw moments do not subtract the mean from all scores.) The following expressions define the second, third, and fourth central moments:

$$\theta_2 = \frac{\sum (y - \mu)^2}{N}, \qquad \theta_3 = \frac{\sum (y - \mu)^3}{N}, \qquad \theta_4 = \frac{\sum (y - \mu)^4}{N}.$$

The symbol θ is pronounced theta. Therefore, θ_1, θ_2, θ_3, and θ_4 represent the first, second, third, and fourth central moments of a distribution, respectively. All of these moments are computed in the same way, and they differ only in the power to which the differences between y and μ are raised. You may not be familiar with exponents other than 2 (squaring), but they are really not complicated, as shown in the following examples:

$$y^1 = y, \qquad y^2 = y \times y, \qquad y^3 = y \times y \times y, \qquad y^4 = y \times y \times y \times y.$$

So, the exponents simply tell us how many times to multiply a number by itself.
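The central-moment definition, θ_k = Σ(y − μ)^k / N, can be sketched in a few lines of Python. This is only an illustration: the function name `central_moment` is a hypothetical helper, not something from the text, and the scores are the small population used in the Learning Check later in this appendix.

```python
def central_moment(scores, power):
    """Return the central moment of the given power for a population of scores.

    `central_moment` is a hypothetical helper name chosen for this sketch.
    """
    n = len(scores)
    mu = sum(scores) / n  # first raw moment (the population mean)
    # Average of the deviations from the mean, each raised to `power`.
    return sum((y - mu) ** power for y in scores) / n


population = [2, 2, 5, 5, 5, 11]
theta_2 = central_moment(population, 2)  # second central moment (variance)
theta_3 = central_moment(population, 3)  # third central moment
theta_4 = central_moment(population, 4)  # fourth central moment
print(theta_2, theta_3, theta_4)  # prints: 9.0 27.0 243.0
```

Only the exponent changes from one moment to the next, which is the point made in the text.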

Skew and Kurtosis

The second, third, and fourth central moments are related to skew and kurtosis. In a population, skew is defined as

$$\text{skew} = \frac{\theta_3}{\sigma^3} \tag{3.A3.1}$$

where σ is the population standard deviation (or the square root of the second central moment, θ_2). Skew, as defined in equation 3.A3.1, can take on positive and negative values. Symmetrical distributions (such as in Figures 3.3a and 3.4a) have zero skew. Distributions that are skewed to the right yield positive skew values, and those that are skewed to the left yield negative skew values. The right-skewed distribution in Figure 3.3 has a skew of about 3.6, and the left-skewed distribution has a skew of about −3.6.

Kurtosis in a population is defined as

$$\text{kurtosis} = \frac{\theta_4}{\sigma^4}. \tag{3.A3.2}$$

Kurtosis, as defined in equation 3.A3.2, can take on only positive values. The larger the values, the more leptokurtic the distribution. However, when statisticians talk about kurtosis, they often mean excess kurtosis. A normal distribution has a kurtosis of 3, when defined using equation 3.A3.2. Statisticians define 3 as normal kurtosis. Excess kurtosis is the difference between kurtosis and normal kurtosis. Therefore, excess kurtosis is defined as follows:

$$\text{excess kurtosis} = \frac{\theta_4}{\sigma^4} - 3. \tag{3.A3.3}$$

A normal distribution has an excess kurtosis of 0. The leptokurtic distribution in Figure 3.4b has an excess kurtosis of 3, and the platykurtic distribution in Figure 3.4c has an excess kurtosis of −1. The flattest possible distribution is the uniform, or rectangular, distribution, which is strongly platykurtic. A uniform distribution is one for which the densities are equal for all possible numbers between some minimum and maximum. For example, a uniform distribution may have min = 0 and max = 1, and all values between min and max are equally probable. Uniform distributions have excess kurtosis of −1.2.

Statistics for Research in Psychology
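As a rough check on the population definitions (skew = θ_3/σ^3, kurtosis = θ_4/σ^4, excess kurtosis = kurtosis − 3), the sketch below applies them to a list of scores. The function names are hypothetical, and the evenly spaced grid is only an assumed stand-in for a continuous uniform distribution on 0 to 1, so the result approximates the −1.2 quoted above rather than reproducing it exactly.

```python
import math


def central_moment(scores, power):
    """Central moment of a population: mean of (y - mu) ** power."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((y - mu) ** power for y in scores) / n


def population_skew(scores):
    """Equation 3.A3.1: third central moment divided by sigma cubed."""
    sigma = math.sqrt(central_moment(scores, 2))
    return central_moment(scores, 3) / sigma ** 3


def population_kurtosis(scores):
    """Equation 3.A3.2: fourth central moment divided by sigma ** 4.

    sigma ** 4 is the same as the squared second central moment.
    """
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2


def population_excess_kurtosis(scores):
    """Equation 3.A3.3: kurtosis minus the normal kurtosis of 3."""
    return population_kurtosis(scores) - 3


# Assumption: 10,001 evenly spaced points stand in for uniform(0, 1).
grid = [i / 10000 for i in range(10001)]
print(round(population_excess_kurtosis(grid), 3))  # prints: -1.2
print(abs(population_skew(grid)) < 1e-6)           # prints: True
```

The grid is symmetric about 0.5, so its skew is essentially zero, as the text says of symmetrical distributions.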

Estimating Skew and Kurtosis

The definitions of skew and excess kurtosis in equations 3.A3.1 and 3.A3.3 define parameters. These formulas should not be applied to samples to estimate the population parameters. Just as the definition of the sample variance differs from that of the population variance, the definitions of the sample skew and sample kurtosis differ from those of the population skew and kurtosis. In all cases, the differences in the formulas have to do with making the statistics good estimators of the parameters. Once again, I promise this will be explained in Chapter 5. The formula for skew for a sample is

$$\text{skew} = \frac{\hat{\theta}_3}{s^3} \cdot \frac{n^2}{(n-1)(n-2)} \tag{3.A3.4}$$

where n is sample size, s is the sample standard deviation, and θ̂_3 is computed exactly like θ_3 but from the scores in a sample rather than scores in a population; i.e., θ̂_3 = Σ(y − m)^3 / n. Equation 3.A3.4 looks horrible. (If I’d seen something like this in my first statistics course, I would have had an anxiety attack. Sorry.) Notice, however, that the term θ̂_3/s^3 (printed in black in the original) looks exactly like equation 3.A3.1, except that θ̂_3 and s are computed from scores in the sample. That’s not so bad. The remaining term, n^2/[(n − 1)(n − 2)] (printed in blue in the original), is called a correction factor, which we already discussed when talking about the sample variance. If you play around with the correction factor, you’ll notice that it gets closer to 1 as n (sample size) gets larger, because n − 1 and n − 2 get closer and closer to n, making (n − 1)(n − 2) closer and closer to n^2. This means that θ̂_3/s^3 needs less correction as sample size increases. In statistics, we like large samples! The formula for excess kurtosis for a sample is

$$\text{excess kurtosis} = \frac{\hat{\theta}_4}{s^4} \cdot \frac{n^2(n+1)}{(n-1)(n-2)(n-3)} - 3 \cdot \frac{(n-1)^2}{(n-2)(n-3)} \tag{3.A3.5}$$

where n is sample size, s is the sample standard deviation, and θ̂_4 is computed exactly like θ_4 but from the scores in a sample rather than scores in a population; i.e., θ̂_4 = Σ(y − m)^4 / n. If equation 3.A3.4 looks horrible, then equation 3.A3.5 looks positively ghastly. But notice again that the terms θ̂_4/s^4 and 3 (printed in black in the original) look exactly like equation 3.A3.3, except that θ̂_4 and s are computed from scores in the sample. The correction factors (printed in blue in the original) behave the same way as the correction factor in the sample variance and the sample skew. As sample size increases, the correction factors get closer to 1.

We are rarely interested in estimating skew and kurtosis for their own sake. Rather, we use skew and kurtosis computed from a sample to assess the normality of the population from which the sample was drawn. When a sample is drawn from a normal population, then skew and excess kurtosis should be close to 0. Large departures from 0 (positive or negative) would suggest that our sample was not drawn from a normal population. Just how large a departure from 0 would be cause for concern is something we will discuss in later chapters.
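The sample estimators can be sketched as follows. Two assumptions are worth flagging: the function names are hypothetical, and the excess-kurtosis formula is taken to be the widely used bias-adjusted form with correction factors n^2(n + 1)/[(n − 1)(n − 2)(n − 3)] and (n − 1)^2/[(n − 2)(n − 3)], which fits the description above of two correction factors that approach 1. The skew estimator needs at least 3 scores and the kurtosis estimator at least 4.

```python
import math


def sample_sd(scores):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((y - m) ** 2 for y in scores) / (n - 1))


def sample_central_moment(scores, power):
    """theta-hat: like a population central moment, but around the sample mean m."""
    n = len(scores)
    m = sum(scores) / n
    return sum((y - m) ** power for y in scores) / n


def skew_correction(n):
    """Correction factor in equation 3.A3.4; approaches 1 as n grows."""
    return n ** 2 / ((n - 1) * (n - 2))


def sample_skew(scores):
    """Equation 3.A3.4: (theta-hat_3 / s^3) times the correction factor."""
    n = len(scores)
    s = sample_sd(scores)
    return sample_central_moment(scores, 3) / s ** 3 * skew_correction(n)


def sample_excess_kurtosis(scores):
    """Assumed bias-adjusted form of equation 3.A3.5 (see the note above)."""
    n = len(scores)
    s = sample_sd(scores)
    ratio = sample_central_moment(scores, 4) / s ** 4
    factor_ratio = n ** 2 * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    factor_three = (n - 1) ** 2 / ((n - 2) * (n - 3))
    return ratio * factor_ratio - 3 * factor_three


# The correction factor shrinks toward 1 as the sample size increases.
for n in (5, 10, 100, 1000):
    print(n, skew_correction(n))
```

Treating the Learning Check scores {2, 2, 5, 5, 5, 11} as a sample rather than a population gives a different skew than the population value, which illustrates why the two sets of formulas should not be interchanged.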

LEARNING CHECK 1

  1. If y = {2, 2, 5, 5, 5, 11} is a small population of scores, calculate the following: (a) μ, (b) σ, (c) θ_2, (d) θ_3, (e) θ_4, (f) skew, (g) kurtosis, and (h) excess kurtosis.

Answers

  1. (a) μ = 5. (b) σ = 3. (c) θ_2 = 9. (d) θ_3 = 27. (e) θ_4 = 243. (f) Skew = θ_3/σ^3 = 27/27 = 1. (g) Kurtosis = θ_4/σ^4 = 243/81 = 3. (h) Excess kurtosis = kurtosis − 3 = 3 − 3 = 0.
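The arithmetic behind these answers can be laid out step by step. This worked version simply applies the definitions of the central moments to the six scores; the intermediate sums are not printed in the original answers.

```latex
\begin{align*}
\mu &= \frac{2 + 2 + 5 + 5 + 5 + 11}{6} = \frac{30}{6} = 5 \\
y - \mu &:\; -3,\; -3,\; 0,\; 0,\; 0,\; 6 \\
\theta_2 &= \frac{9 + 9 + 0 + 0 + 0 + 36}{6} = \frac{54}{6} = 9,
  \qquad \sigma = \sqrt{9} = 3 \\
\theta_3 &= \frac{-27 - 27 + 0 + 0 + 0 + 216}{6} = \frac{162}{6} = 27 \\
\theta_4 &= \frac{81 + 81 + 0 + 0 + 0 + 1296}{6} = \frac{1458}{6} = 243
\end{align*}
```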

that the maximum proportion of the distribution outside the interval is simply

$$\frac{1}{k^2}. \tag{3.A4.2}$$

For our example with k = 2, we find that the maximum proportion of the distribution outside the interval μ ± k (σ) is

$$\frac{1}{k^2} = \frac{1}{2^2} = 0.25,$$

or 25%. Figure 3.A4.1 shows that half of this percentage (12.5%) is above μ + 2(σ) and the other half is below μ - 2(σ). This means that if a score is 2 standard deviations above the mean, then at most 12.5% of scores are higher than it. If a score is 2 standard deviations below the mean, then at most 12.5% of scores are lower than it. More generally, if a score is k standard deviations above the mean, then the maximum proportion of the distribution above it is

$$\frac{1}{2}\left(\frac{1}{k^2}\right) = \frac{0.5}{k^2}. \tag{3.A4.3}$$

If a score is k standard deviations below the mean, then the maximum proportion of the distribution below it is also 0.5/ k^2. For k = 2, we can see that

$$\frac{0.5}{k^2} = \frac{0.5}{2^2} = 0.125,$$

or 12.5%, as shown in Figure 3.A4.1. Chebyshev’s theorem tells us about the minimum proportion of a distribution within the interval μ ± k (σ) and the maximum proportion of a distribution outside the interval μ ± k (σ). Once again, Figure 3.A4.1 shows that for many distributions, the proportion outside the interval μ ± k (σ) will be much less than 1/ k^2. The value of this theorem is that it gives us some guidance about what constitutes extreme or unusual scores when we know only the mean and standard deviation of a distribution. However, if we do know something about the shape of the distribution, then our judgments about extreme or unusual scores can be much more precise. In Chapter 4, we will see exactly this. If we happen to know that scores were drawn from a normal distribution, we can make far more accurate statements about scores falling k standard deviations from the mean. This will have huge practical consequences that will carry through the rest of this book.
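The two-sided bounds from Chebyshev's theorem (at least 1 − 1/k^2 inside μ ± k(σ), at most 1/k^2 outside) are easy to compare with an actual set of scores. The sketch below is illustrative only: the function names are hypothetical, and the right-skewed exponential scores are an arbitrary choice of a non-normal population, not data from the text.

```python
import random


def chebyshev_max_outside(k):
    """Maximum proportion outside mu +/- k*sigma (equation 3.A4.2)."""
    return 1 / k ** 2


def chebyshev_min_inside(k):
    """Minimum proportion inside mu +/- k*sigma."""
    return 1 - 1 / k ** 2


def proportion_outside(scores, k):
    """Observed proportion of a population of scores beyond mu +/- k*sigma."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = (sum((y - mu) ** 2 for y in scores) / n) ** 0.5
    return sum(1 for y in scores if abs(y - mu) > k * sigma) / n


# Assumption: 10,000 exponential scores serve as a skewed example population.
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(10_000)]

for k in (1.25, 2, 3):
    bound = chebyshev_max_outside(k)
    observed = proportion_outside(skewed, k)
    print(k, round(bound, 3), round(observed, 3))
```

For each k, the observed proportion should sit at or below the bound, usually well below it, which matches the remark that for many distributions the proportion outside the interval is much less than 1/k^2.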

LEARNING CHECK 1

Let’s say we know the mean and standard deviation of a distribution but not its shape. Answer the following questions and round your answers to two decimal places.

  1. What is the minimum proportion of the distribution falling in the intervals (a) μ ± 2(σ), (b) μ ± 3(σ), (c) μ ± 1.25(σ), and (d) μ ± 1.4142(σ)?
  2. What is the maximum proportion of the distribution falling (a) above μ + 1.25(σ) and (b) below μ − 1.25(σ)?
  3. What is the maximum proportion of the distribution falling (a) above μ + 1.4142(σ) and (b) below μ − 1.4142(σ)?

Answers

  1. (a) 1 − 1/2^2 = .75. (b) 1 − 1/3^2 = .89. (c) 1 − 1/1.25^2 = .36. (d) 1 − 1/1.4142^2 = .50.
  2. (a) 0.5/1.25^2 = .32. (b) 0.5/1.25^2 = .32.
  3. (a) 0.5/1.4142^2 = .25. (b) 0.5/1.4142^2 = .25.