Introduction to statistics, Lecture notes of Statistics

Author: Dr Arimeyaw. This document gives a deeper understanding of statistics. It includes notes and examples that support that understanding.

Typology: Lecture notes

2023/2024

Uploaded on 07/02/2025

jedidiah-arhin


MEASURES OF RELATIVE POSITION (QUANTILES)

These measures describe the position of an observation within a dataset. Quantiles are values below or above which a given percentage of the observations in a dataset fall. Examples of quantiles are the quartiles, the deciles, the percentiles, and other values obtained by equal subdivisions of the data. Like the median, these measures all partition the data according to rankings and fixed percentages.

The Quartiles

Quartiles are measures that divide an ordered dataset into four equal parts, such that each part contains 25% of the data. In practice, there are three quartiles: the first (lower) quartile ($Q_1$), the second (middle) quartile ($Q_2$), and the third (upper) quartile ($Q_3$). The diagram shown below illustrates the nature of quartiles.

$Q_1$ is the value such that 25% of the observations are smaller than $Q_1$ and 75% of the observations are larger than $Q_1$. $Q_2$ is the median; that is, 50% of the observations are smaller and 50% are larger than $Q_2$. $Q_3$ is the value such that 75% of the observations are smaller and 25% of the observations are larger.

Quartiles for ungrouped data

First, arrange the values in the data in ascending order. Then determine

$Q_1 = \frac{1}{4}(n + 1)$th observation;

$Q_2 = \frac{2}{4}(n + 1)$th observation; and

$Q_3 = \frac{3}{4}(n + 1)$th observation.

Example

Find the three quartiles for the data 4, 8, 4, 6, 5, 8, 8, 15, 6, 15.

Solution

Arranging the scores in ascending order: 4, 4, 5, 6, 6, 8, 8, 8, 15, 15.

$n = 10$

$Q_1 = \frac{1}{4}(10 + 1)$th observation $= 2.75$th observation

This is not an actual observation but a position between two observations. It indicates that the value of $Q_1$ is that of the 2nd observation plus 75% of the difference between the values of the 2nd and 3rd observations. That is,

$Q_1 = 4 + (5 - 4)(0.75) = 4.75$

Similarly,

$Q_2 = \frac{2}{4}(10 + 1)$th observation $= 5.5$th observation $= 6 + (8 - 6)(0.5) = 7$

And,

$Q_3 = \frac{3}{4}(10 + 1)$th observation $= 8.25$th observation $= 8 + (15 - 8)(0.25) = 9.75$
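The positional rule used in this example can be sketched in Python. This is a minimal illustration, not part of the original notes: the function names `value_at_position` and `quartile` are hypothetical, and the handling of positions that fall outside the data (clamping to the first or last observation) is an assumption the notes do not address.

```python
import math


def value_at_position(sorted_data, position):
    """Return the value at a 1-based, possibly fractional, position.

    A position such as 2.75 means: the 2nd observation plus 75% of the
    gap between the 2nd and 3rd observations.
    """
    n = len(sorted_data)
    whole = math.floor(position)
    fraction = position - whole
    if whole < 1:       # assumption: clamp positions before the first value
        return sorted_data[0]
    if whole >= n:      # assumption: clamp positions at or past the last value
        return sorted_data[-1]
    lower = sorted_data[whole - 1]
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)


def quartile(data, k):
    """k-th quartile (k = 1, 2, 3) using the (k/4)(n + 1)th-observation rule."""
    ordered = sorted(data)
    position = k * (len(ordered) + 1) / 4
    return value_at_position(ordered, position)


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
q1, q2, q3 = (quartile(scores, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 4.75 7.0 9.75
```

Other textbooks and software packages use slightly different interpolation rules, so quartiles computed elsewhere may differ a little from these values.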

The Deciles

Deciles are measures that split the ranked data into ten equal parts. The $d$th decile, denoted by $D_d$, is the value such that $10d\%$ of the observations are less than $D_d$ and $(100 - 10d)\%$ are greater than $D_d$. There are nine deciles, denoted $D_1, D_2, \ldots, D_9$, that partition the ordered data into ten groups with 10% of the data in each group, as shown below.

Computing deciles

For ungrouped data,

$D_d = \frac{d}{10}(n + 1)$th observation

For instance,

$D_1 = \frac{1}{10}(n + 1)$th observation;

$D_2 = \frac{2}{10}(n + 1)$th observation;

$D_3 = \frac{3}{10}(n + 1)$th observation;

...

$D_9 = \frac{9}{10}(n + 1)$th observation

Example

Find $D_1$, $D_2$ and $D_7$ for the data 4, 8, 4, 6, 5, 8, 8, 15, 6, 15.

Solution

$D_1 = \frac{1}{10}(10 + 1)$th observation $= 1.1$th observation $= 4 + (4 - 4)(0.1) = 4$

$D_2 = ?$ $D_9 = ?$
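One way to check decile calculations like these is with Python's standard library. The sketch below assumes that `statistics.quantiles` with `method="exclusive"` follows the same $(n + 1)$ positional rule with linear interpolation as these notes. That matches its documented behaviour as far as can be told, but the two should be compared on a hand-worked case such as $D_1$ before relying on it.

```python
from statistics import quantiles

scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]

# n=10 asks for the nine cut points that split the data into ten parts.
cut_points = quantiles(scores, n=10, method="exclusive")

# Map decile number d (1..9) to its value for easier lookup.
deciles = {d: cut_points[d - 1] for d in range(1, 10)}

print(deciles[1])  # 4.0, agreeing with the hand calculation of D1
```

The remaining deciles can be read from the `deciles` dictionary in the same way.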

Deciles for grouped data

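The $d$th decile for grouped data is given by

$$D_d = L_b + \left(\frac{\frac{d}{10}\sum_{i=1}^{n} f_i - CF}{f_d}\right) w$$

where $L_b$ is the lower class boundary of the class containing $D_d$, $CF$ is the cumulative frequency of the classes before that class, $f_d$ is the frequency of that class, and $w$ is the class width.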

Example

Calculate the 1st, 5th and 9th deciles of the annual income data.

Solution

The first decile:

$\frac{1}{10}\sum_{i=1}^{n} f_i = \ldots$

Class containing $D_1$: 140 − 159

Therefore, $D_1 = \ldots$

Similarly, $D_5 = 193.8$ and $D_9 = 232.8$.
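The grouped-data interpolation can be sketched in Python as follows. The annual income table is not reproduced in this part of the notes, so the frequency table below is purely hypothetical, as are the function name `grouped_quantile` and its parameters. The same function covers percentiles by passing `parts=100`.

```python
def grouped_quantile(lower_boundaries, width, frequencies, k, parts):
    """k-th of `parts` quantiles for grouped data by linear interpolation.

    Implements L + w * (k*N/parts - CF) / f, where L is the lower boundary
    of the class containing the quantile, CF is the cumulative frequency
    before that class, f is its frequency, and N is the total frequency.
    """
    total = sum(frequencies)
    target = k * total / parts
    cumulative = 0  # cumulative frequency of the classes already passed
    for lower, freq in zip(lower_boundaries, frequencies):
        if freq > 0 and cumulative + freq >= target:
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    raise ValueError("k must satisfy 0 < k <= parts")


# Hypothetical table: classes 10-19, 20-29, 30-39, 40-49 (boundaries 9.5, ...).
lower_boundaries = [9.5, 19.5, 29.5, 39.5]
frequencies = [2, 5, 8, 5]
width = 10

d1 = grouped_quantile(lower_boundaries, width, frequencies, k=1, parts=10)
d5 = grouped_quantile(lower_boundaries, width, frequencies, k=5, parts=10)
d9 = grouped_quantile(lower_boundaries, width, frequencies, k=9, parts=10)
print(d1, d5, d9)  # 19.5 33.25 45.5
```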

Compute the 60th percentile.

Solution

Arrange the data in numerical order.

For the 60th percentile,

$P_{60} = \frac{60}{100}(n + 1)$th observation $= 15.6$th observation

This means that sixty percent of the students received exam scores less than 105.6 and the remaining forty percent received scores above it (60% of the students obtained 105.6 marks or less).

Exercise

Referring to the preceding example, calculate the 70th percentile for the data.

Percentiles for grouped data

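The $r$th percentile is found by using

$$P_r = L_b + \left(\frac{\frac{r}{100}\sum_{i=1}^{n} f_i - CF}{f_r}\right) w$$

where $L_b$ is the lower class boundary of the class containing $P_r$, $CF$ is the cumulative frequency of the classes before that class, $f_r$ is the frequency of that class, and $w$ is the class width.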

Example

For the annual income data, the 10th percentile is determined as follows.

$\frac{10}{100}\sum_{i=1}^{n} f_i = \ldots$

Class containing $P_{10}$: 140 − 159.

Hence, $P_{10} = \ldots$

This means that 10% of employees earn GH₵159,500 or less.

Similarly, $P_{90} = 232.8$.

Note: Like the median, quantiles can also be read off from the ogive and the histogram.

Computing the percentile rank

Sometimes we are given a value from a dataset and wish to compute the percentile rank corresponding to that value. The following is a simple procedure for doing this.

Step 1: Arrange the data in increasing order.

Step 2: Let $x$ be the value whose percentile rank is to be computed. Use the following formula to compute the percentile rank:

$$r = \frac{B + 0.5f}{n} \times 100$$

where

$B$ = the number of values less than $x$;

$f$ = the number of times the value $x$ occurs;

$n$ = the number of values in the distribution.
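The two steps above can be sketched in Python. The function name `percentile_rank` is hypothetical, and the data are the ten scores from the quartile example earlier in these notes, reused here only for illustration.

```python
def percentile_rank(data, x):
    """Percentile rank of x: 100 * (B + 0.5 * f) / n.

    B counts the values strictly below x, f counts the occurrences of x,
    and n is the number of values.
    """
    ordered = sorted(data)                    # Step 1: arrange in increasing order
    below = sum(1 for value in ordered if value < x)
    equal = ordered.count(x)
    n = len(ordered)
    return 100 * (below + 0.5 * equal) / n    # Step 2: apply the formula


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
rank_of_8 = percentile_rank(scores, 8)
print(rank_of_8)  # 65.0, since B = 5, f = 3 and n = 10
```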

MEASURES OF DISPERSION

  • Synonyms: measures of spread, measures of variation, measures of variability, measures of scatter.

  • A measure of dispersion is either absolute or relative. Absolute measures of dispersion are expressed in the same unit as the original data. A relative measure of dispersion is the ratio or percentage of a measure of absolute dispersion to an appropriate average.

The Range

The range of a dataset is the difference between the largest and the smallest values in the data. The range is often referred to as a crude measure of dispersion since it uses only two values from the data in its computation. The range is interpreted as the number of points the data is spread over.

  • Ungrouped data:

Range = Max. value – Min. value

Example

Refer to the example on examination scores. What is the range for the data?

  • Grouped data

Range = $L_k - L_1$ (lower class boundaries)

Range = $U_k - U_1$ (upper class boundaries)

Range = $C_k - C_1$ (class limits)

Range = $X_k - X_1$ (class midpoints)

Example

With reference to the annual income data, what is the range?

The Interquartile Range

The interquartile range is the difference between the third quartile and the first quartile. That is,

$$IQR = Q_3 - Q_1$$

Example

Refer to the examination scores data and find the IQR.

The Quartile Deviation

  • Also known as the semi-interquartile range, it is given by

$$QD = \frac{Q_3 - Q_1}{2}$$

Example

Refer to the examination scores data and find the QD.
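The three distance-based measures (range, IQR and QD) can be computed together, as in the sketch below. The examination scores data are not reproduced in this part of the notes, so the ten scores from the quartile example are used instead. The quartiles come from `statistics.quantiles` with `method="exclusive"`, which is assumed here to follow the same $(n + 1)$ positional rule as the notes.

```python
from statistics import quantiles

scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]

# Range: largest value minus smallest value.
data_range = max(scores) - min(scores)

# Quartiles by the (n + 1) positional rule.
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")

iqr = q3 - q1   # interquartile range
qd = iqr / 2    # quartile deviation (semi-interquartile range)

print(data_range, iqr, qd)  # 11 5.0 2.5
```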

Note: The measures considered so far use the 'distance approach' to measuring dispersion. The measures that follow use the 'deviation approach'.

The Mean Absolute Deviation

The mean absolute deviation of a dataset is calculated by first finding the absolute difference between each observation and the mean of the data, and then averaging these absolute deviations: add them all and divide by the number of deviations. The mean absolute deviation (MAD) is given by

$$\text{MAD} = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n}$$
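A short Python sketch of the MAD calculation follows. The function name `mean_absolute_deviation` is hypothetical, and the five values are borrowed from the population variance example later in these notes purely for illustration.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


values = [25, 18, 15, 27, 30]
mad = mean_absolute_deviation(values)
print(mad)  # 5.2: the mean is 23 and the absolute deviations are 2, 5, 8, 4, 7
```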

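Population variance ($\sigma^2$)

  • Ungrouped data:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

  • Computationally convenient formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2$$

Or

$$\sigma^2 = \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - N\mu^2\right)$$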
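The three forms of the population variance are algebraically equivalent, which can be checked numerically. The sketch below uses the values 25, 18, 15, 27 and 30 from the example that follows in these notes. The variable names are illustrative, and `statistics.pvariance` is included only as an independent cross-check.

```python
from statistics import pvariance

values = [25, 18, 15, 27, 30]
N = len(values)
mu = sum(values) / N                      # population mean
sum_of_squares = sum(x * x for x in values)

# Definitional form: mean squared deviation from mu.
definitional = sum((x - mu) ** 2 for x in values) / N

# Computationally convenient form: mean of squares minus square of mean.
computational = sum_of_squares / N - (sum(values) / N) ** 2

# Alternative form: (sum of squares - N * mu^2) / N.
alternative = (sum_of_squares - N * mu ** 2) / N

print(definitional, computational, alternative, pvariance(values))
```

All four printed values should agree up to floating-point rounding.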

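Example: Calculate $\sigma^2$ for 25, 18, 15, 27 and 30.

Solution

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2 = \frac{2803}{5} - \left(\frac{115}{5}\right)^2 = 560.6 - 529 = 31.6$$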

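  • Grouped data:

$$\sigma^2 = \frac{\sum_{i=1}^{N} f_i X_i^2}{\sum_{i=1}^{N} f_i} - \left(\frac{\sum_{i=1}^{N} f_i X_i}{\sum_{i=1}^{N} f_i}\right)^2$$

  • Coding method:

$$\sigma^2 = w^2\left[\frac{\sum_{i=1}^{N} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{N} f_i d_i}{N}\right)^2\right]$$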

Sample variance ($s^2$)

  • Ungrouped data:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

For ease of computation,

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right]$$

Example: Calculate the variance of the sample 3, 1, 7, 4, 2, 5 and 6.
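This example can be worked through in Python, as a sketch rather than the notes' own solution. It uses both the definitional form and the computational shortcut, each with the $n - 1$ divisor, and `statistics.variance` as an independent cross-check.

```python
from statistics import variance

sample = [3, 1, 7, 4, 2, 5, 6]
n = len(sample)
mean = sum(sample) / n                    # 28 / 7 = 4.0

# Definitional form: squared deviations from the mean, divided by n - 1.
definitional = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Shortcut form: [sum of squares - (sum)^2 / n] / (n - 1).
shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

print(definitional, shortcut, variance(sample))
```

Each expression evaluates to 28/6, roughly 4.67, so the sample standard deviation is about 2.16.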

  • Grouped data:

$$s^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1} = \frac{1}{n - 1}\left[\sum_{i=1}^{n} f_i X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} f_i X_i\right)^2\right]$$
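The grouped-data formula can be illustrated with a short Python sketch. The frequency table below is hypothetical, not a table from these notes. It assumes that $X_i$ denotes the class midpoints and that $n$ in the divisor is the total frequency $\sum f_i$.

```python
# Hypothetical grouped data: class midpoints and their frequencies.
midpoints = [14.5, 24.5, 34.5, 44.5]
frequencies = [2, 5, 8, 5]

n = sum(frequencies)                                        # total frequency
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / n

# Definitional form: frequency-weighted squared deviations over n - 1.
definitional = sum(
    f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)
) / (n - 1)

# Shortcut form: [sum f*x^2 - (sum f*x)^2 / n] / (n - 1).
shortcut = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

print(mean, definitional, shortcut)
```

For this table the mean is 32.5 and both forms give 1720/19, roughly 90.53.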

Solution

We summarise the calculations of the mean as below.

$i$    $x_i$    $f_i$    $f_i x_i$

The mean is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

Now, we can summarise the information needed to calculate the variance as shown.

$i$    $x_i$    $f_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$    $f_i (x_i - \bar{x})^2$
1      10       2        −20.5              420.25                 840.5

Therefore, the variance is given by

$$s^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1} = \frac{\ldots}{20 - 1} = 152.37,$$

and the standard deviation is given by

$$s = \sqrt{152.37} = 12.34.$$

The Coefficient of Variation

Sometimes, a direct comparison of variation within two or more datasets using measures of dispersion may lead to incorrect conclusions. For example, a standard deviation of 5 cm associated with the lengths of playing fields in some selected schools indicates more precise measurements than the same value associated with the lengths of pencils. Also, if one dataset consists of amounts in Ghana Cedis and the other of weights of persons in kilogrammes, their dispersions cannot be compared directly. Therefore, it is better to use a relative measure of dispersion when either (1) the means of the distributions being compared are far apart, or (2) the data are in different units.

The coefficient of variation (CV), which is a relative measure of dispersion, is the ratio of the standard deviation, $s$, of a dataset to the mean, $\bar{x}$, of the same dataset. That is, the coefficient of variation is given by

$$CV = \frac{\sigma}{\mu} \times 100$$

Or

$$CV = \frac{S}{\bar{X}} \times 100$$

Example

The mean and standard deviation of a certain dataset are, respectively, 5 kg and 2 kg. Those for a second dataset are, respectively, 30 °C and 3 °C. Which dataset has more variation? Justify your answer.

Solution

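Since the two datasets are in different units of measurement, we can compare their dispersions better by calculating their CVs. For the data in kg, the coefficient of variation is given by

$$CV = \frac{2}{5} \times 100 = 40\%$$

For the data in °C,

$$CV = \frac{3}{30} \times 100 = 10\%$$

The data in kg therefore have more relative variation.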
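The comparison in this example can be reproduced with a small helper. This is a sketch: the function name `coefficient_of_variation` is hypothetical, and it expresses the CV as a percentage of the mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean


cv_kg = coefficient_of_variation(std_dev=2, mean=5)        # dataset in kg
cv_celsius = coefficient_of_variation(std_dev=3, mean=30)  # dataset in °C

print(cv_kg, cv_celsius)  # 40.0 10.0
print("kg" if cv_kg > cv_celsius else "°C", "data vary more relative to their mean")
```

Because the CV is unit-free, the two percentages can be compared directly even though the original measurements are in different units.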

  1. Standard Deviation and Mean Absolute Deviation

In a fairly normal distribution,

MAD

MEASURES OF SHAPE

When describing data, it is important to be able to recognize the shape of the distribution of values. The shape of a distribution determines the appropriate statistical methods used to analyze the data. A distribution can have many shapes, and one method of analyzing a distribution is to draw a histogram or frequency polygon for it. Distributions are most often not perfectly shaped, so it is not necessary to have an exact shape but rather to identify an overall pattern.

Measures of shape describe the shape of the distribution of a given dataset. There exist several methods for measuring the shape of the distribution of a dataset. Two common methods are measures of skewness and measures of kurtosis.

Measures of Skewness

Dispersion indicates the absolute and relative differences between the individual values in the dataset. It is concerned with the amount of variation, rather than with its direction. Also, it does not show the extent to which deviations cluster above or below the average. In such situations, we use measures of skewness. Measures of skewness show the extent to which distributions are pulled away from the symmetrical shape. If a distribution is not symmetrical, it is asymmetrical or skewed. Skewness is the lack of symmetry in a distribution.

Types of skewness

Frequency distributions can assume many shapes. The three most important shapes are symmetric, positively skewed, and negatively skewed.

  1. Symmetric distribution (No skewness)