


















Author -DR ARIMEYAW This document gives you a deep understanding of statistics it includes notes and examples that will give you a better understanding
Typology: Lecture notes



















These measures describe the position of an observation within a dataset. Quantiles are
values below or above which a stated percentage of the observations in a dataset fall.
Examples of quantiles are the quartiles, the deciles, the percentiles, and other values obtained
by equal subdivisions of the data. Like the median, these measures all partition the data
according to rankings and fixed percentages.
The Quartiles
Quartiles are measures that divide an ordered dataset into four equal parts, such that each
part contains 25% of the data. There are three quartiles: the first (lower) quartile
($Q_1$), the second (middle) quartile ($Q_2$), and the third (upper) quartile ($Q_3$). The diagram shown
below illustrates the nature of quartiles.
$Q_1$ is the value such that 25% of the observations are smaller than $Q_1$ and 75% of the
observations are larger than $Q_1$. $Q_2$ is the median; that is, 50% of the observations are smaller
and 50% are larger than $Q_2$. $Q_3$ is the value such that 75% of the observations are smaller and
25% of the observations are larger.
Quartiles for ungrouped data
First, arrange the values in the data in ascending order. Then, determine
\[ Q_1 = \tfrac{1}{4}(n + 1)\text{th observation;} \]
\[ Q_2 = \tfrac{2}{4}(n + 1)\text{th observation; and} \]
\[ Q_3 = \tfrac{3}{4}(n + 1)\text{th observation.} \]
Example
For the data: 4, 8, 4, 6, 5, 8, 8, 15, 6, 15. Find the three quartiles.
Solution
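Arranging the scores in ascending order: 4, 4, 5, 6, 6, 8, 8, 8, 15, 15. With $n = 10$,
\[ Q_1 = \tfrac{1}{4}(10 + 1)\text{th} = 2.75\text{th observation}. \]
Strictly speaking, this is not an observation but a position between two observations. It indicates that the value for $Q_1$ is that of
the 2nd observation plus 75% of the difference between the values of the 2nd and 3rd
observations. That is,
\[ Q_1 = 4 + (5 - 4)(0.75) = 4.75. \]
Similarly, $Q_2$ is the 5.5th observation, so
\[ Q_2 = 6 + (8 - 6)(0.5) = 7. \]
And $Q_3$ is the 8.25th observation, so
\[ Q_3 = 8 + (15 - 8)(0.25) = 9.75. \]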
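The position rule used for the quartiles above (take the $\tfrac{k}{4}(n+1)$th observation and interpolate linearly between neighbouring observations when the position is fractional) can be sketched in Python. This is only an illustrative sketch: the function name `quantile_by_position` and the clamping of out-of-range positions to the smallest or largest value are assumptions made here, not part of the notes.

```python
import math


def quantile_by_position(data, fraction):
    """Value at position fraction * (n + 1) of the sorted data,
    interpolating linearly between neighbouring observations."""
    values = sorted(data)
    n = len(values)
    position = fraction * (n + 1)      # 1-based position, may be fractional
    whole = math.floor(position)       # observation just below the position
    remainder = position - whole       # fractional part of the position
    # Assumption: positions outside 1..n fall back to the extreme values.
    if whole < 1:
        return values[0]
    if whole >= n:
        return values[-1]
    below = values[whole - 1]
    above = values[whole]
    return below + remainder * (above - below)


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
q1 = quantile_by_position(scores, 1 / 4)
q2 = quantile_by_position(scores, 2 / 4)
q3 = quantile_by_position(scores, 3 / 4)
print(q1, q2, q3)
```

The same function covers deciles and percentiles by passing `d / 10` or `r / 100` as the fraction.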
The Deciles
The $d$th decile, denoted by $D_d$, is the value such that $10d\%$ of the observations are less than $D_d$ and $(100 - 10d)\%$ are
greater than $D_d$. There are nine deciles, denoted $D_1, D_2, \ldots, D_9$, that partition the ordered data into ten equal parts.
Computing deciles
For ungrouped data,
\[ D_d = \tfrac{d}{10}(n + 1)\text{th observation}. \]
For instance,
\[ D_1 = \tfrac{1}{10}(n + 1)\text{th observation;} \]
\[ D_2 = \tfrac{2}{10}(n + 1)\text{th observation;} \]
and so on, up to
\[ D_9 = \tfrac{9}{10}(n + 1)\text{th observation.} \]
Example
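Find $D_1$, $D_2$ and $D_7$ for the data 4, 8, 4, 6, 5, 8, 8, 15, 6, 15.
Solution
The ordered data are 4, 4, 5, 6, 6, 8, 8, 8, 15, 15, with $n = 10$.
\[ D_1 = \tfrac{1}{10}(10 + 1)\text{th} = 1.1\text{th observation} = 4 + (4 - 4)(0.1) = 4. \]
Similarly, $D_2$ is the 2.2th observation, so $D_2 = 4 + (5 - 4)(0.2) = 4.2$, and $D_7$ is the 7.7th observation, so $D_7 = 8 + (8 - 8)(0.7) = 8$.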
Deciles for grouped data
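The $d$th decile for grouped data is given by
\[ D_d = L_b + w\left(\frac{\frac{d}{10}\sum_{i=1}^{n} f_i - F}{f_d}\right), \]
where $L_b$ is the lower class boundary of the class containing $D_d$, $w$ is the class width, $f_d$ is the frequency of that class, and $F$ is the cumulative frequency of the classes before it.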
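A minimal sketch of the grouped-data decile calculation, assuming the standard interpolation within the class that contains the decile. The frequency table below is hypothetical (it is not the income data used in these notes), and the function name `grouped_quantile` is an illustrative choice.

```python
def grouped_quantile(lower_boundaries, width, frequencies, fraction):
    """Quantile for grouped data: L_b + w * (fraction * N - F) / f,
    where F is the cumulative frequency before the quantile class."""
    total = sum(frequencies)
    target = fraction * total          # e.g. (d / 10) * sum of frequencies
    cumulative = 0                     # F: cumulative frequency so far
    for lower, freq in zip(lower_boundaries, frequencies):
        # The quantile class is the first one whose cumulative
        # frequency reaches the target position.
        if freq > 0 and cumulative + freq >= target:
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")


# Hypothetical frequency table, invented for illustration only.
lower_boundaries = [9.5, 19.5, 29.5, 39.5]
frequencies = [5, 10, 20, 15]
width = 10

d1 = grouped_quantile(lower_boundaries, width, frequencies, 1 / 10)
d5 = grouped_quantile(lower_boundaries, width, frequencies, 5 / 10)
d9 = grouped_quantile(lower_boundaries, width, frequencies, 9 / 10)
print(d1, d5, d9)
```

Passing `r / 100` as the fraction gives the corresponding percentile for grouped data.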
Example
Solution
The first decile: the position of $D_1$ is $\tfrac{1}{10}\sum_{i=1}^{n} f_i$.
Class containing $D_1$: 140 − 159.
Similarly, $D_5 = 193.8$ and $D_9 = 232.8$.
Compute the 60th percentile.
Solution
Arrange the data in numerical order.
For the 60th percentile,
\[ P_{60} = \tfrac{60}{100}(n + 1)\text{th observation} = 15.6\text{th observation}. \]
This means that sixty percent of the students received exam scores less than 105.6 and the
remaining forty percent received scores above it (that is, 60% of the students obtained 105.6 marks
or less).
Exercise
Refer to the preceding Example and calculate the 70th percentile for the data.
Percentiles for grouped data
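The $r$th percentile for grouped data is given by
\[ P_r = L_b + w\left(\frac{\frac{r}{100}\sum_{i=1}^{n} f_i - F}{f_r}\right), \]
where $L_b$ is the lower class boundary of the class containing $P_r$, $w$ is the class width, $f_r$ is the frequency of that class, and $F$ is the cumulative frequency of the classes before it.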
Example
The position of $P_{10}$ is $\tfrac{10}{100}\sum_{i=1}^{n} f_i$.
Class containing $P_{10}$: 140 − 159.
Hence, $P_{10} = 159.5$.
This means that 10% of employees earn GH₵159,500 or less.
Similarly,
Note: Like the median, quantiles can also be read off from the ogive and the histogram.
Computing the percentile rank
Sometimes we are given a value from a dataset, and wish to compute the percentile rank
corresponding to that value. The following is a simple procedure for doing this.
Step 1: Arrange the data in increasing order.
Step 2: Let x be the value whose percentile rank is to be computed. Use the following formula
to compute the percentile rank:
\[ r = \frac{B + 0.5f}{n} \times 100 \]
where,
B = the number of values less than x ;
f = the number of times the value x has occurred;
n = the number of values in the distribution.
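The two steps above can be sketched in Python as follows. The function name `percentile_rank` is an illustrative choice, and the ten values from the quartile example are reused here only to have something concrete to compute.

```python
def percentile_rank(data, x):
    """Percentile rank r = (B + 0.5 * f) / n * 100 for the value x."""
    values = sorted(data)                        # Step 1: increasing order
    n = len(values)                              # number of values
    below = sum(1 for v in values if v < x)      # B: values less than x
    equal = sum(1 for v in values if v == x)     # f: occurrences of x
    return 100 * (below + 0.5 * equal) / n       # Step 2: apply the formula


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
rank_of_6 = percentile_rank(scores, 6)
print(rank_of_6)
```

For the value 6 there are three smaller values and two occurrences of 6 itself, so the rank is $(3 + 1)/10 \times 100 = 40$.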
of scatter.
Absolute measures of dispersion are expressed in the same units as the original data. A relative measure of dispersion is the
ratio, or percentage, of an absolute measure of dispersion to an appropriate average.
The Range
The range of any given data is the difference between the largest and the smallest values in
that data. The range is often referred to as a crude measure of dispersion since it uses only two
values from the data in its computation. The range is interpreted as the number of points the
data is spread over.
Range = Max. value – Min. value
Example
Refer to the example on examination scores. What is the range for the data?
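For grouped data with $k$ classes, the range may be computed from any of the following:
Range = $L_k - L_1$ (lower class boundaries)
Range = $U_k - U_1$ (upper class boundaries)
Range = $C_k - C_1$ (class limits)
Range = $X_k - X_1$ (class midpoints)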
Example
With reference to the annual income data, what is the range?
The Interquartile Range
The interquartile range (IQR) is the difference between the third quartile and the first quartile. That
is,
\[ IQR = Q_3 - Q_1. \]
Example
Refer to the examination scores data and find the IQR.
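The Quartile Deviation
The quartile deviation (QD), also called the semi-interquartile range, is half the interquartile range. That is,
\[ QD = \frac{Q_3 - Q_1}{2}. \]
Example
Refer to the examination scores data and find the QD.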
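As a rough illustration of the distance-based measures, the sketch below computes the range, the interquartile range and the quartile deviation for the ten values used in the quartile example (the examination scores data is not reproduced in this section). It assumes the $\tfrac{k}{4}(n+1)$ position rule for quartiles and takes the quartile deviation to be half the interquartile range, which is the usual definition; the helper name `quartile` is hypothetical.

```python
import math


def quartile(data, k):
    """k-th quartile via the k * (n + 1) / 4 position rule,
    with linear interpolation between neighbouring observations."""
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n + 1) / 4
    whole = math.floor(position)
    remainder = position - whole
    # Assumption: clamp positions that fall outside the data.
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    return ordered[whole - 1] + remainder * (ordered[whole] - ordered[whole - 1])


values = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]

data_range = max(values) - min(values)         # largest minus smallest value
iqr = quartile(values, 3) - quartile(values, 1)
qd = iqr / 2                                   # half the interquartile range
print(data_range, iqr, qd)
```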
Note: The measures considered so far use ‘Distance Approach’ for measuring dispersion. The
measures that follow use ‘Deviation Approach’.
The Mean Absolute Deviation
The mean absolute deviation of a dataset is calculated by first finding the absolute difference
between each observation and the mean of the data, and then averaging these absolute
deviations, that is, adding all of them and dividing by the number of deviations. The mean absolute
deviation (MAD) is given by
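\[ MAD = \frac{1}{n}\sum_{i=1}^{n} \left| X_i - \bar{X} \right| \]
Or, for data in which the value $X_i$ occurs with frequency $f_i$,
\[ MAD = \frac{\sum_{i} f_i \left| X_i - \bar{X} \right|}{\sum_{i} f_i}. \]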
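As a quick numerical check of the mean absolute deviation for ungrouped data, here is a short Python sketch. The five values are chosen purely for illustration, and the function name `mean_absolute_deviation` is an assumption of this sketch.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


values = [25, 18, 15, 27, 30]
mad = mean_absolute_deviation(values)
print(mad)
```

Here the mean is 23, the absolute deviations are 2, 5, 8, 4 and 7, and their average is $26/5 = 5.2$.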
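Population variance ($\sigma^2$)
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2 \]
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)^2 \]
Or
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \mu^2 \]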
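Example: Calculate $\sigma^2$ for 25, 18, 15, 27 and 30.
Solution
Here $N = 5$, $\sum X_i = 115$ and $\sum X_i^2 = 2803$, so
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} X_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)^2 = \frac{1}{5}(2803) - \left(\frac{1}{5}(115)\right)^2 = 560.6 - 529 = 31.6. \]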
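The arithmetic in this example can be checked with a few lines of Python. The sketch compares the definitional form of the population variance with the computational form and with the standard library's `statistics.pvariance`; the variable names are illustrative.

```python
import statistics

data = [25, 18, 15, 27, 30]
N = len(data)
mu = sum(data) / N                                     # population mean

# Definitional form: average squared deviation from the mean.
var_definition = sum((x - mu) ** 2 for x in data) / N

# Computational form: mean of the squares minus the square of the mean.
var_shortcut = sum(x ** 2 for x in data) / N - mu ** 2

# Standard-library population variance, for comparison.
var_library = statistics.pvariance(data)

print(sum(data), sum(x ** 2 for x in data))
print(var_definition, var_shortcut, var_library)
```

All three agree up to floating-point rounding, which is why the computational form is convenient for hand calculation.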
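For data in which the value (or class midpoint) $X_i$ occurs with frequency $f_i$,
\[ \sigma^2 = \frac{\sum_{i=1}^{N} f_i X_i^2}{\sum_{i=1}^{N} f_i} - \left(\frac{\sum_{i=1}^{N} f_i X_i}{\sum_{i=1}^{N} f_i}\right)^2 \]
Using coded deviations $d_i$ and class width $w$,
\[ \sigma^2 = w^2\left[\frac{\sum_{i=1}^{N} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{N} f_i d_i}{N}\right)^2\right] \]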
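Sample variance ($s^2$)
\[ s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \]
For ease of computation,
\[ s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right] \]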
Example: Calculate the variance of the sample 3, 1, 7, 4, 2, 5 and 6.
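One way to check this example is the short Python sketch below, which evaluates the sample variance from its definition, from the computational form, and with the standard library's `statistics.variance` (which also divides by $n - 1$). The variable names are illustrative.

```python
import statistics

sample = [3, 1, 7, 4, 2, 5, 6]
n = len(sample)
mean = sum(sample) / n                                  # sample mean

# Definitional form: squared deviations divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Computational form: [sum of squares - (sum)^2 / n] / (n - 1).
s2_shortcut = (sum(x ** 2 for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# Standard-library sample variance, for comparison.
s2_library = statistics.variance(sample)

print(s2_definition, s2_shortcut, s2_library)
```

With $\sum X_i = 28$ and $\sum X_i^2 = 140$, both forms reduce to $28/6 \approx 4.67$.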
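For data with frequencies $f_i$, where $n = \sum_{i} f_i$,
\[ s^2 = \frac{\sum_{i} f_i (X_i - \bar{X})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i} f_i X_i^2 - \frac{1}{n}\left(\sum_{i} f_i X_i\right)^2\right] \]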
Solution
We summarise the calculations of the mean as below.
$i$    $x_i$    $f_i$    $f_i x_i$
The mean is
\[ \bar{x} = \frac{\sum_{i} f_i x_i}{\sum_{i} f_i}. \]
Now, we can summarise the information needed to calculate the variance as shown.
$i$    $x_i$    $f_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$
1      10       2        −20.5              420.25                 840.50
Therefore, the variance is given by
\[ s^2 = \frac{\sum_{i} f_i (x_i - \bar{x})^2}{n - 1} = 152.37, \]
and the standard deviation is given by
\[ s = \sqrt{152.37} = 12.34. \]
The Coefficient of Variation
Sometimes, a direct comparison of the variation within two or more datasets using absolute measures of
dispersion may lead to incorrect conclusions. For example, a standard deviation of 5 cm
associated with the lengths of playing fields in some selected schools indicates more
precise measurements than the same value associated with the lengths of pencils. Also, if
one dataset is, for example, in Ghana cedis and the other consists of the weights of persons
in kilogrammes, their dispersions cannot be compared directly. Therefore, it is better to use a
relative measure of dispersion when either (1) the means of the distributions being compared
are far apart, or (2) the data are in different units.
The coefficient of variation (CV), which is a relative measure of dispersion, is the ratio of the
standard deviation, $s$, of a dataset to the mean, $\bar{x}$, of the same dataset. That is, the coefficient of variation
is given by
\[ CV = \frac{s}{\bar{x}} \]
Or, expressed as a percentage,
\[ CV = \frac{s}{\bar{x}} \times 100\%. \]
Example
The mean and standard deviation of a certain dataset are, respectively, 5 kg and 2 kg. Those for a
second dataset are, respectively, 30 °C and 3 °C. Which dataset has more variation? Justify your
answer.
Solution
Since the two datasets are in different units of measurement, we can compare their dispersions
better by calculating their CVs. For the data in kg, the coefficient of variation is given by
\[ CV = \frac{2}{5} \times 100\% = 40\%, \]
and for the data in °C,
\[ CV = \frac{3}{30} \times 100\% = 10\%. \]
The data in kg therefore show more relative variation.
In a fairly normal distribution,
When describing data, it is important to be able to recognize the shape of the
distribution of the values. The shape of a distribution determines the appropriate statistical methods
used to analyze the data. A distribution can have many shapes, and one method of analyzing a
distribution is to draw a histogram or frequency polygon for the distribution. Distributions are
most often not perfectly shaped, so it is not necessary to have an exact shape but rather to
identify an overall pattern.
Measures of shape describe the shape of the distribution of a given dataset. There exist
several methods for measuring the shape of the distribution of a dataset. Two common methods
are measures of skewness and measures of kurtosis.
Measures of Skewness
Dispersion indicates the absolute and relative differences between the individual values
in the dataset. It is concerned with the amount of variation, rather than with its direction. Also,
it does not show the extent to which deviations cluster above or below the average. In such
situations, we use measures of skewness. Measures of skewness show the extent to which
distributions are pulled away from the symmetrical shape. If a distribution is not symmetrical,
it is asymmetrical or skewed. Skewness is the lack of symmetry in a distribution.
Types of skewness
Frequency distributions can assume many shapes. The three most important shapes are
symmetric, positively skewed, and negatively skewed.