Introduction to statistics, Lecture notes of Statistics

University of Cape Coast Statistics

Author: DR ARIMEYAW. This document gives a deep understanding of statistics; it includes notes and worked examples.

Typology: Lecture notes

2023/2024

Uploaded on 07/02/2025

jedidiah-arhin 🇬🇭



MEASURES OF RELATIVE POSITION (QUANTILES)

These measures describe the position of an observation within a dataset. Quantiles are values below which, or above which, a stated percentage of the observations in a dataset fall. Examples of quantiles are the quartiles, the deciles, the percentiles, and other values obtained by equal subdivisions of the data. Like the median, these measures all partition the data according to rankings and fixed percentages.

The Quartiles

Quartiles are measures that divide an ordered dataset into four equal parts, such that each part contains 25% of the data. In practice, there are three quartiles: the first (lower) quartile ($Q_1$), the second (middle) quartile ($Q_2$), and the third (upper) quartile ($Q_3$). The diagram shown below illustrates the nature of quartiles.

$Q_1$ is the value such that 25% of the observations are smaller than $Q_1$ and 75% of the observations are larger than $Q_1$. $Q_2$ is the median; that is, 50% of the observations are smaller and 50% are larger than $Q_2$. $Q_3$ is the value such that 75% of the observations are smaller and 25% of the observations are larger.

Quartiles for ungrouped data

First, arrange the values in the data in ascending order. Then determine

$Q_1 = \frac{1}{4}(n+1)$th observation;

$Q_2 = \frac{2}{4}(n+1)$th observation; and

$Q_3 = \frac{3}{4}(n+1)$th observation.

Example

For the data 4, 8, 4, 6, 5, 8, 8, 15, 6, 15, find the three quartiles.

Solution

Arranging the scores in ascending order: 4, 4, 5, 6, 6, 8, 8, 8, 15, 15.

$n = 10$

$Q_1 = \frac{1}{4}(10+1) = 2.75$th observation

Strictly, this is not an observation but a position between two observations. It indicates that the value of $Q_1$ is that of the 2nd observation plus 75% of the difference between the values of the 2nd and 3rd observations. That is,

$Q_1 = 4 + (5 - 4)(0.75) = 4.75$

Similarly,

$Q_2 = \frac{2}{4}(10+1) = 5.5$th observation

$Q_2 = 6 + (8 - 6)(0.5) = 7$

And,

$Q_3 = \frac{3}{4}(10+1) = 8.25$th observation

$Q_3 = 8 + (15 - 8)(0.25) = 9.75$
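The $(n+1)$ position rule used in this example can be sketched in Python. The function name `quantile_by_position` is an illustrative choice, not something from the notes, and the clamping of positions to the range 1 to $n$ is an added assumption for very small datasets.

```python
def quantile_by_position(data, fraction):
    """Return the value at position fraction * (n + 1) in the sorted data.

    Positions are 1-based; a fractional position is resolved by linear
    interpolation between the two neighbouring observations.
    """
    values = sorted(data)
    n = len(values)
    position = fraction * (n + 1)
    # Assumption: keep the position inside 1..n so both neighbours exist.
    position = min(max(position, 1), n)
    lower_rank = int(position)          # rank of the lower neighbour
    weight = position - lower_rank      # how far to move towards the next value
    if lower_rank == n:
        return float(values[-1])
    lower_value = values[lower_rank - 1]
    upper_value = values[lower_rank]
    return lower_value + weight * (upper_value - lower_value)


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
q1 = quantile_by_position(scores, 1 / 4)
q2 = quantile_by_position(scores, 2 / 4)
q3 = quantile_by_position(scores, 3 / 4)
print(q1, q2, q3)
```

Passing `d / 10` or `r / 100` as the fraction gives the corresponding decile or percentile under the same rule.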

The Deciles

Deciles are measures that split the ranked data into ten equal parts. The $d$th decile, denoted by $D_d$, is the value such that $10d\%$ of the observations are less than $D_d$ and $(100 - 10d)\%$ are greater than $D_d$. There are nine deciles, denoted $D_1, D_2, \ldots, D_9$, that partition the ordered data into ten groups with 10% of the data in each group, as shown below.

Computing deciles

For ungrouped data,

$D_d = \frac{d}{10}(n+1)$th observation

For instance,

$D_1 = \frac{1}{10}(n+1)$th observation;

$D_2 = \frac{2}{10}(n+1)$th observation;

$D_3 = \frac{3}{10}(n+1)$th observation;

$D_9 = \frac{9}{10}(n+1)$th observation

Example

Find $D_1$, $D_2$ and $D_7$ for the data 4, 8, 4, 6, 5, 8, 8, 15, 6, 15.

Solution

$D_1 = \frac{1}{10}(10+1) = 1.1$th observation

$D_1 = 4 + (4 - 4)(0.1) = 4$

$D_2 = ?$ $D_9 = ?$

Deciles for grouped data

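The $d$th decile for grouped data is given by

$$D_d = L_d + w\left(\frac{\frac{d}{10}\sum_{i=1}^{n} f_i - CF_b}{f_d}\right)$$

where $L_d$ is the lower boundary of the class containing $D_d$, $w$ is the class width, $f_d$ is the frequency of that class, and $CF_b$ is the cumulative frequency of the classes before it.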

Example

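Calculate the 1st, 5th and 9th deciles of the annual income data.

Solution

The first decile: its position is $\frac{1}{10}\sum_{i=1}^{n} f_i$.

Class containing $D_1$: 140 − 159

Therefore, $D_1$ is obtained by substituting the values for this class into the formula.

Similarly, $D_5 = 193.8$ and $D_9 = 232.8$.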
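The grouped-data decile calculation can be sketched as a small Python function. This is a minimal sketch: the function name `grouped_quantile` and the frequency table below are hypothetical (the annual income table is not reproduced here), and it assumes the usual interpolation form, lower boundary plus class width times (target cumulative frequency minus the cumulative frequency before the class) divided by the class frequency.

```python
def grouped_quantile(lower_boundaries, frequencies, class_width, fraction):
    """Quantile of grouped data: L + w * (fraction * N - CF_b) / f."""
    total = sum(frequencies)
    target = fraction * total            # e.g. (d / 10) * sum of f_i
    cumulative_before = 0                # CF_b for the class being examined
    for lower, freq in zip(lower_boundaries, frequencies):
        if freq > 0 and cumulative_before + freq >= target:
            return lower + class_width * (target - cumulative_before) / freq
        cumulative_before += freq
    raise ValueError("fraction must lie between 0 and 1")


# Hypothetical frequency table (NOT the annual income data from the notes).
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [4, 6, 10, 6, 4]
d1 = grouped_quantile(lower_boundaries, frequencies, 10, 1 / 10)
d5 = grouped_quantile(lower_boundaries, frequencies, 10, 5 / 10)
d9 = grouped_quantile(lower_boundaries, frequencies, 10, 9 / 10)
print(d1, d5, d9)
```

The same function covers quartiles and percentiles by passing `k / 4` or `r / 100` as the fraction.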

Compute the 60th percentile.

Solution

Arrange the data in numerical order.

For the 60th percentile,

$P_{60} = \frac{60}{100}(n+1)$th observation = 15.6th observation

This means that sixty percent of the students received exam scores of 105.6 or less, and the remaining forty percent received scores above it.

Exercise

Referring to the preceding example, calculate the 70th percentile for the data.

Percentiles for grouped data

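The $r$th percentile is found by using

$$P_r = L_r + w\left(\frac{\frac{r}{100}\sum_{i=1}^{n} f_i - CF_b}{f_r}\right)$$

where $L_r$ is the lower boundary of the class containing $P_r$, $w$ is the class width, $f_r$ is the frequency of that class, and $CF_b$ is the cumulative frequency of the classes before it.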

Example

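For the annual income data, the 10th percentile is determined as follows.

Its position is $\frac{10}{100}\sum_{i=1}^{n} f_i$.

Class containing $P_{10}$: 140 − 159.

Hence, substituting the values for this class into the formula gives $P_{10}$, which means that 10% of employees earn GH₵159,500 or less.

Similarly,

$P_{90} = 232.8$, which agrees with $D_9$.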

Note: Like the median, quantiles can also be read off from the ogive and the histogram.

Computing the percentile rank

Sometimes we are given a value from a dataset and wish to compute the percentile rank corresponding to that value. The following is a simple procedure for doing this.

Step 1: Arrange the data in increasing order.

Step 2: Let $x$ be the value whose percentile rank is to be computed. Use the following formula to compute the percentile rank:

$$r = \frac{B + 0.5f}{n} \times 100$$

where,

$B$ = the number of values less than $x$;

$f$ = the number of times the value $x$ has occurred;

$n$ = the number of values in the distribution.
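The percentile-rank procedure can be sketched in Python as follows. The function name `percentile_rank` is illustrative, and the ten scores are reused from the quartile example purely as sample input. Because the code counts values directly, it does not need the data to be sorted first.

```python
def percentile_rank(data, x):
    """Percentile rank of x: 100 * (B + 0.5 * f) / n."""
    below = sum(1 for value in data if value < x)    # B
    equal = sum(1 for value in data if value == x)   # f
    n = len(data)
    return 100 * (below + 0.5 * equal) / n


scores = [4, 8, 4, 6, 5, 8, 8, 15, 6, 15]
rank_of_8 = percentile_rank(scores, 8)
print(rank_of_8)
```

Here five values lie below 8 and the value 8 occurs three times, so the rank is $100 \times (5 + 1.5) / 10 = 65$.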

MEASURES OF DISPERSION

Synonyms: measures of spread, measures of variation, measures of variability, measures of scatter.

A measure of dispersion is either absolute or relative. Absolute measures of dispersion are expressed in the same unit as the original data. A relative measure of dispersion is the ratio, or percentage, of an absolute measure of dispersion to an appropriate average.

The Range

The range of a dataset is the difference between the largest and the smallest values in that data. The range is often referred to as a crude measure of dispersion, since it uses only two values from the data in its computation. The range is interpreted as the number of points the data are spread over.

Ungrouped data:

Range = Max. value – Min. value

Example

Refer to the example on examination scores. What is the range for the data?

Grouped data

Range = $L_k - L_1$ (lower class boundaries)

Range = $U_k - U_1$ (upper class boundaries)

Range = $C_k - C_1$ (class limits)

Range = $X_k - X_1$ (class midpoints)

Example

With reference to the annual income data, what is the range?

The Interquartile Range

The interquartile range is the difference between the third quartile and the first quartile. That is,

$IQR = Q_3 - Q_1$

Example

Referring to the examination scores data, find the IQR.

The Quartile Deviation

Also known as the semi-interquartile range, it is given by

$$QD = \frac{Q_3 - Q_1}{2}$$
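As a worked illustration (not one of the original exercises), take the ten values 4, 8, 4, 6, 5, 8, 8, 15, 6, 15 from the quartile example. Applying the $(n+1)$ rule to them gives $Q_1 = 4.75$ and $Q_3 = 9.75$, so the two distance-based measures work out as:

```latex
\begin{align*}
IQR &= Q_3 - Q_1 = 9.75 - 4.75 = 5 \\
QD  &= \frac{Q_3 - Q_1}{2} = \frac{5}{2} = 2.5
\end{align*}
```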

Example

Referring to the examination scores data, find the QD.

Note: The measures considered so far use the ‘distance approach’ for measuring dispersion. The measures that follow use the ‘deviation approach’.

The Mean Absolute Deviation

The mean absolute deviation of a dataset is calculated by first finding the absolute difference between each observation and the mean of the data, and then averaging these absolute deviations: add them all and divide by the number of deviations. The mean absolute deviation (MAD) is given by

$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} \left| X_i - \bar{X} \right|$$
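A minimal Python sketch of the mean absolute deviation follows. The function name is illustrative, and the five values are borrowed from the population variance example in these notes simply to have concrete input.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


values = [25, 18, 15, 27, 30]
mad = mean_absolute_deviation(values)
print(mad)
```

For these values the mean is 23, the absolute deviations are 2, 5, 8, 4 and 7, and their average is $26 / 5 = 5.2$.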

Population variance ($\sigma^2$)

Ungrouped Data:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$$

Computationally convenient formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2$$

Or

$$\sigma^2 = \frac{1}{N}\left(\sum_{i=1}^{N} X_i^2 - N\mu^2\right)$$

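Example: Calculate $\sigma^2$ for 25, 18, 15, 27 and 30.

Solution

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \left(\frac{\sum_{i=1}^{N} X_i}{N}\right)^2 = \frac{2803}{5} - \left(\frac{115}{5}\right)^2 = 560.6 - 529 = 31.6$$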
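The population variance of 25, 18, 15, 27 and 30 can be checked in Python. The sketch below computes it from the definition and from the computationally convenient formula, then cross-checks both against `statistics.pvariance` from the standard library. The variable names are illustrative.

```python
import statistics

values = [25, 18, 15, 27, 30]
N = len(values)
mu = sum(values) / N

# Definitional formula: average squared deviation from the mean.
var_definition = sum((x - mu) ** 2 for x in values) / N

# Computational formula: mean of squares minus square of the mean.
var_shortcut = sum(x * x for x in values) / N - (sum(values) / N) ** 2

# Cross-check against the standard library (divides by N).
var_library = statistics.pvariance(values)

print(var_definition, var_shortcut, var_library)
```

All three agree up to floating-point rounding, since the formulas are algebraically equivalent.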

Grouped Data:

$$\sigma^2 = \frac{\sum_{i=1}^{N} f_i X_i^2}{\sum_{i=1}^{N} f_i} - \left(\frac{\sum_{i=1}^{N} f_i X_i}{\sum_{i=1}^{N} f_i}\right)^2$$

Coding method:

$$\sigma^2 = w^2\left[\frac{\sum_{i=1}^{N} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{N} f_i d_i}{N}\right)^2\right]$$
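The coding method can be compared with the direct grouped formula in a short Python sketch. Several things here are assumptions rather than content from the notes: the coded deviations are taken as $d_i = (X_i - A)/w$ for an arbitrary origin $A$, the divisor $N$ is taken as the total frequency, and the frequency table is hypothetical.

```python
midpoints = [10, 20, 30, 40, 50]     # hypothetical class midpoints X_i
frequencies = [2, 5, 6, 4, 3]        # hypothetical frequencies f_i
width = 10                           # class width w
assumed_mean = 30                    # arbitrary origin A (an assumption)

N = sum(frequencies)                 # total frequency
coded = [(x - assumed_mean) / width for x in midpoints]   # d_i

# Coding method: work with the small coded values, then rescale by w^2.
sum_fd = sum(f * d for f, d in zip(frequencies, coded))
sum_fd2 = sum(f * d * d for f, d in zip(frequencies, coded))
var_coded = width ** 2 * (sum_fd2 / N - (sum_fd / N) ** 2)

# Direct grouped formula for comparison.
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
var_direct = sum_fx2 / N - (sum_fx / N) ** 2

print(var_coded, var_direct)
```

The choice of origin does not affect the result, because shifting every value by a constant leaves the variance unchanged; only the scaling by $w$ has to be undone.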

Sample variance ($s^2$)

Ungrouped data:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

For ease of computation,

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right]$$

Example: Calculate the variance of the sample 3, 1, 7, 4, 2, 5 and 6.
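One way to check this example is the Python sketch below. It computes the sample variance from the definition and from the shortcut formula, both with the $n - 1$ divisor, and compares them with `statistics.variance` from the standard library. The variable names are illustrative.

```python
import statistics

sample = [3, 1, 7, 4, 2, 5, 6]
n = len(sample)
mean = sum(sample) / n

# Definitional formula with the n - 1 divisor.
s2_definition = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Shortcut formula: [sum of squares - (sum)^2 / n] / (n - 1).
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# statistics.variance also uses the n - 1 divisor.
s2_library = statistics.variance(sample)

print(s2_definition, s2_shortcut, s2_library)
```

The mean is 4 and the squared deviations sum to 28, so the sample variance is $28 / 6 \approx 4.67$.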

Grouped data:

$$s^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} f_i X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} f_i X_i\right)^2\right]$$

Solution

We summarise the calculations of the mean in a table with the following columns.

i    x_i    f_i    f_i x_i

The mean is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

Now, we can summarise the information needed to calculate the variance as shown.

i    x_i    f_i    x_i − x̄    (x_i − x̄)^2    f_i (x_i − x̄)^2
1    10     2      −20.5      420.25         840.5

Therefore, the variance is given by

$$s^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1} = 152.37,$$

where $n - 1 = 20 - 1$, and the standard deviation is given by

$$s = \sqrt{152.37} = 12.34.$$
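The grouped sample variance calculation can be sketched in Python. The full frequency table is not available here, so the table below is an assumption: only its first row ($x_1 = 10$, $f_1 = 2$) comes from the notes, and the remaining rows are hypothetical values chosen to be consistent with the surviving figures ($n = 20$, a first deviation of −20.5, $s^2 = 152.37$, $s = 12.34$). It should not be read as the original data.

```python
import math

midpoints = [10, 20, 30, 40, 50]   # x_i: only the first value is from the notes
frequencies = [2, 5, 6, 4, 3]      # f_i: only the first value is from the notes

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Sum of f_i * (x_i - mean)^2, then divide by n - 1 for the sample variance.
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

print(n, mean, round(variance, 2), round(std_dev, 2))
```

With this table the mean is 30.5, so the first deviation is $10 - 30.5 = -20.5$ and the first contribution is $2 \times 420.25 = 840.5$, matching the row shown in the solution.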

The Coefficient of Variation

Sometimes, a direct comparison of variation within two or more datasets using measures of dispersion may lead to incorrect conclusions. For example, a standard deviation of 5 cm associated with the lengths of playing fields in some selected schools indicates more precise measurements than the same value associated with the lengths of pencils. Also, if one dataset records amounts in Ghana Cedis and the other records weights of persons in kilogrammes, their dispersions cannot be compared directly. Therefore, it is better to use a relative measure of dispersion when either (1) the means of the distributions being compared are far apart, or (2) the data are in different units.

The coefficient of variation (CV), which is a relative measure of dispersion, is the ratio of the standard deviation, $s$, of a dataset to the mean, $\bar{x}$, of the same dataset. That is, the coefficient of variation is given by

$$CV = \frac{\sigma}{\mu} \times 100$$

Or

$$CV = \frac{S}{\bar{X}} \times 100$$

Example

The mean and standard deviation of a certain dataset are, respectively, 5 kg and 2 kg. Those for a second dataset are, respectively, 30 °C and 3 °C. Which dataset has more variation? Justify your answer.

Solution

Since the two datasets are in different units of measurement, we can compare their dispersions better by calculating their CVs. For the data in kg, the coefficient of variation is given by

$$CV = \frac{2}{5} \times 100 = 40\%$$
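The comparison in this example can be carried through with a short Python sketch. The function name `coefficient_of_variation` is illustrative, and the CV is expressed as a percentage of the mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_kg = coefficient_of_variation(2, 5)        # first dataset, in kg
cv_celsius = coefficient_of_variation(3, 30)  # second dataset, in degrees Celsius
more_variable = "kg data" if cv_kg > cv_celsius else "temperature data"
print(cv_kg, cv_celsius, more_variable)
```

The data in kg have a CV of 40% against 10% for the temperature data, so the data in kg show more variation relative to their mean, even though their standard deviation is the smaller of the two.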

Standard Deviation and Mean Absolute Deviation

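In a fairly normal distribution, the mean absolute deviation is roughly four-fifths of the standard deviation:

$$\text{MAD} \approx \frac{4}{5}\,s$$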

MEASURES OF SHAPE

When describing data, it is important to be able to recognize the shape of the distribution of the values. The shape of a distribution determines the appropriate statistical methods for analyzing the data. A distribution can have many shapes, and one way of examining a distribution is to draw a histogram or frequency polygon for it. Distributions are most often not perfectly shaped, so the aim is not to find an exact shape but rather to identify an overall pattern.

Measures of shape describe the shape of the distribution of a given dataset. There are several methods for measuring the shape of the distribution of a dataset. Two common ones are measures of skewness and measures of kurtosis.

Measures of Skewness

Dispersion indicates the absolute and relative differences between the individual values in the dataset. It is concerned with the amount of variation rather than with its direction, and it does not show the extent to which deviations cluster above or below the average. For that purpose, we use measures of skewness. Measures of skewness show the extent to which distributions are pulled away from the symmetrical shape. If a distribution is not symmetrical, it is asymmetrical or skewed. Skewness is the lack of symmetry in a distribution.

Types of skewness

Frequency distributions can assume many shapes. The three most important shapes are symmetric, positively skewed, and negatively skewed.

Symmetric distribution (No skewness)
