Summarizing Data: Averages and Measures of Dispersion, Exams of Mathematics

A concise overview of summarizing data, focusing on measures of central tendency and dispersion. It covers averages such as mode, median, and mean, detailing how to calculate them for both discrete and grouped data. Additionally, it explains measures of dispersion like range, interquartile range (iqr), interpercentile range (ipr), interdecile range, and standard deviation. The document also discusses transforming data and how changes to data affect averages, offering advantages and disadvantages of each measure. It is designed to help students understand and apply these statistical concepts effectively, providing clear formulas and step-by-step instructions for calculations. This guide is useful for high school and early university students studying introductory statistics.

Typology: Exams

2024/2025

Uploaded on 06/10/2025

fergfre-tgrhj
Chapter 3 – Summarising Data

Averages

A measure of central tendency (represents the ‘centre’ of a set of data). Includes mode, median and mean.

Mode

The one that appears the most (remember the Mo in mode and Mo in most) – the most common value.

Modal Class – the class with the highest frequency. The modal class is the class itself (the entry in the column/row next to the frequency), not the frequency value.

Median

The middle value.

Discrete Data:

  1. Put the numbers in order from smallest to largest.
  2. The median is the ½(n+1)th value – this means find your total frequency, add 1 to it and then divide by 2. The answer is not your median but the position your median will be in the list of data.

Example: If total frequency is 23, the median position will be ½ (23+1) = 12th number in the list after being put in order.

  3. Find the value at the median position in the list of values. This is your median value.

If the median position is a decimal value such as 7.5 then you would find the 7th and 8th values in the list, add them together and then divide by 2.

If data is in a frequency table, add the frequency values (like you would do with cumulative frequency) until you reach the first row whose running total reaches or passes the median position. The median is the category/class that contains the ½(n+1)th value.
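The position rule for discrete data can be sketched in Python. This is a minimal illustration, not a definitive implementation; the function name `median_discrete` is an assumption made for this example.

```python
def median_discrete(values):
    """Median of a list of numbers using the (n + 1) / 2 position rule."""
    ordered = sorted(values)                 # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    position = (n + 1) / 2                   # step 2: 1-based median position
    lower = ordered[int(position) - 1]       # value at the whole-number position
    if position.is_integer():
        return lower                         # step 3: the value at that position
    upper = ordered[int(position)]           # next value along, e.g. the 8th for 7.5
    return (lower + upper) / 2               # add the two values, then divide by 2


print(median_discrete([3, 1, 2]))            # position 2
print(median_discrete([4, 1, 3, 2]))         # position 2.5, between 2nd and 3rd values
```

With 23 values the position is 12, which matches the example in the notes.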

Grouped Data:

For grouped continuous data (has classes with the inequality symbols), the median is the ½nth value.

The median class is the class interval which contains the median position.

Sometimes you may be asked to work out an estimate for the median value rather than the median class.

For grouped data your median will always be an estimate as you do not know exact values.

Estimate Median using Linear Interpolation:

  1. Use ½ n to find the median position.
  2. Find the Cumulative Frequency (CF) of the frequency column until you reach the class interval that contains the ½ nth value. This is the group that contains the median.
  3. Find the median’s position in the group and see how many more values you need in that class to get to the median. Do this by subtracting the CF of the previous class (the row above in the table) from your ½ nth value.
  4. Divide this number by the frequency for the median class.
  5. Multiply your answer by the class width.
  6. Add your answer to the lower bound for the class interval. This is your estimate for the median value.
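The six interpolation steps can be sketched as a short Python function. The name `interpolated_median` and the `(lower_bound, upper_bound, frequency)` layout of each class are assumptions made for this illustration, and the class values are made up.

```python
def interpolated_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_bound, upper_bound, frequency), in ascending order.
    """
    n = sum(freq for _, _, freq in classes)
    position = n / 2                              # step 1: the (1/2)n-th value
    cumulative = 0                                # CF of the classes before this one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:   # step 2: median class found
            needed = position - cumulative        # step 3: values still needed
            fraction = needed / freq              # step 4: divide by class frequency
            return lower + fraction * (upper - lower)    # steps 5 and 6
        cumulative += freq
    raise ValueError("the table contains no data")


# Hypothetical table: 0-10 (f = 5), 10-20 (f = 10), 20-30 (f = 5)
print(interpolated_median([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))
```

Here n = 20, so the median position is 10. The CF before the 10-20 class is 5, so 5 more values are needed out of 10, giving 10 + 0.5 × 10 = 15.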

Mean (also called Arithmetic Mean)

The sum of all the values divided by the number of values.

Discrete Data:

  1. Add all the values.
  2. Divide by the number of values.

Formula for Mean: $\bar{x} = \frac{\sum x}{n}$

Where $\bar{x}$ = mean

∑ = Greek letter sigma which is the symbol for ‘sum of’

𝑥 = data values

𝑛 = number of data values

So ∑𝑥 means sum of all values

Frequency Table (not grouped):

  1. Add an extra column to the table and label it 𝑓 × 𝑥.
  2. Multiply the values in the first two columns for each row and write the answer in the new third column.
  3. Add up the third column – this gives you the total (∑𝑓𝑥).
  4. Add up the frequency column (∑𝑓).
  5. Divide the answer to step 3 by the answer to step 4.

Formula: $\bar{x} = \frac{\sum fx}{\sum f}$, where f stands for frequency

∑𝑓𝑥 = total of 3rd column

∑𝑓 = Total Frequency
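As a rough sketch, the frequency-table method maps onto a few lines of Python. The function name `mean_from_frequency_table` and the `(x, f)` pair layout are assumptions for this example, and the table values are made up.

```python
def mean_from_frequency_table(table):
    """Mean of an ungrouped frequency table given as (x, f) pairs."""
    total_fx = sum(x * f for x, f in table)   # total of the f × x column
    total_f = sum(f for _, f in table)        # total frequency
    return total_fx / total_f                 # sum of fx divided by sum of f


# Hypothetical table: value 1 occurs twice, 2 three times, 3 five times
table = [(1, 2), (2, 3), (3, 5)]
print(mean_from_frequency_table(table))       # (2 + 6 + 15) / 10
```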

Frequency Table (grouped):

  1. Add 2 extra columns to the table and label them midpoint and 𝑓 × 𝑚𝑖𝑑𝑝𝑜𝑖𝑛𝑡.
  2. Calculate the midpoint of the class intervals.
  3. Multiply the midpoint and frequency values for each row and write the answers in the last (𝑓 × 𝑚𝑖𝑑𝑝𝑜𝑖𝑛𝑡) column.
  4. Add up the 𝑓 × 𝑚𝑖𝑑𝑝𝑜𝑖𝑛𝑡 column – this gives you the total (∑(𝑓 × 𝑚𝑖𝑑𝑝𝑜𝑖𝑛𝑡)).
  5. Add up the frequency column (∑𝑓).
  6. Divide the answer to step 4 by the answer to step 5.

Formula: $\bar{x} = \frac{\sum (f \times \text{midpoint})}{\sum f}$
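The grouped version differs only in using each class midpoint in place of x. The sketch below assumes each class is given as `(lower_bound, upper_bound, frequency)`; that layout, the function name and the sample classes are illustrative only.

```python
def estimated_mean_grouped(classes):
    """Estimate the mean of grouped data from class midpoints."""
    total_f_mid = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2        # step 2: midpoint of the class interval
        total_f_mid += freq * midpoint        # steps 3 and 4: running total of f × midpoint
        total_f += freq                       # step 5: running total of f
    return total_f_mid / total_f              # step 6


# Hypothetical table: 0-10 (f = 5), 10-20 (f = 10), 20-30 (f = 5)
print(estimated_mean_grouped([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))
```

For this table the f × midpoint column is 25, 150 and 125, so the estimate is 300 / 20 = 15.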

How Changes to your Data affect the Averages

Mode – could change only if the new value changes which value appears the most. Could also make the data bimodal if there are now two values that appear the same amount.

Median – If you add a value that is greater than the median, the median might increase.

If you add a value that is smaller than the median, the median might decrease.

If you remove a value that is greater than the median, the median might decrease.

If you remove a value that is smaller than the median, the median might increase.

If you add/remove one value that is greater and one that is smaller than the median, the median stays the same.

Mean – If you add a value that is greater than the mean, the mean increases.

If you take away a value that is less than the mean, the mean increases.

If you add a value that is less than the mean, the mean decreases.

If you take away a value that is greater than the mean, the mean decreases.

If you replace a value in your data with another number that is greater/smaller than the original, the mean will also change.
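These rules can be checked quickly with Python's standard `statistics` module. The data values below are made up purely for illustration.

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]          # illustrative values: mean 6, median 6

added_high = data + [12]         # add a value greater than the mean
added_low = data + [0]           # add a value less than the mean
removed_high = [2, 4, 6, 8]      # take away 10 (greater than the mean)
removed_low = [4, 6, 8, 10]      # take away 2 (less than the mean)

print(mean(data), mean(added_high), mean(added_low))
print(mean(removed_high), mean(removed_low))

# The median only "might" move: here 2 is still the middle value.
clustered = [1, 2, 2, 2, 3]
print(median(clustered), median(clustered + [10]))

# Adding one value on each side leaves the median unchanged.
print(median(data), median(data + [0, 12]))
```

The mean moves every time, while the median stays put in the last two cases, which is why the notes say "might" for the median.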

Deciding which Average to Use

Mode

Advantages:

• Easy to use
• Always a value in the data
• Unaffected by extreme values
• Can be used with quantitative and qualitative data

Disadvantages:

• There may not be a mode or may be more than one mode.
• Cannot be used to calculate measures of spread.
• Not always representative of the data – can include extreme values and can be a misleading value far from the mean.

Median

Advantages:

• Easy to find when data is in order
• Unaffected by outliers/extreme values
• Best to use with skewed data
• Can be used to calculate quartiles, IQR and skew.

Disadvantages:

• May not be a data value
• Not always representative of the data.

Mean

Advantages:

• Uses all the data
• Can be used to calculate standard deviation and skew.

Disadvantages:

• May not be a data value
• Always affected by extreme values or outliers.
• Can be distorted by open-ended classes.

Measures of Dispersion

Range

How spread out the data is.

The difference between the biggest and smallest values.

For data from tables the largest value is the biggest number from the first column and the smallest value is the smallest number from the first column.

Interquartile Range (IQR)

“Between Quartiles”

The middle 50% of the data when in order.

Lower Quartile (LQ) – The value ¼ of the way through the data. 25% of the data is less than the LQ.

Upper Quartile (UQ) – The value ¾ of the way through the data. 25% of the data is above the UQ.

Discrete Data

LQ = ¼ (n+1)th value

UQ = ¾ (n+1)th value

  1. Put the data in order, smallest to largest.
  2. Work out the lower and upper quartiles using the above formulae.

If your LQ is the 2.75th value, divide the interval between the 2nd and 3rd values into quarters and use this to work out what the LQ value will be – find the value ¾ of the way between the 2nd and 3rd values.

If 2nd value = 2 and 3rd value = 4, then 2.75th value = ((4-2)/4)*3 + 2nd value = 0.5*3 + 2 = 1.5 + 2 = 3.5.

  3. IQR = UQ – LQ
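The quartile positions and the "fraction of the way between two values" step can be sketched as follows. The helper names are hypothetical, and clamping positions that fall outside the list (which can happen for very small data sets) is an assumption of this sketch, not something the notes specify.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in an ordered list."""
    # Assumption: positions before the first or after the last value are clamped.
    position = min(max(position, 1), len(ordered))
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)   # e.g. 3/4 of the way between them


def quartiles(values):
    """Return (LQ, UQ, IQR) using the 1/4 (n+1) and 3/4 (n+1) positions."""
    ordered = sorted(values)
    n = len(ordered)
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, uq, uq - lq


# The 2.75th value when the 2nd value is 2 and the 3rd value is 4
print(value_at_position([1, 2, 4, 5], 2.75))
print(quartiles([1, 2, 3, 4, 5, 6, 7]))
```

The first call reproduces the worked example: 2 + 0.75 × (4 − 2) = 3.5.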

Grouped Data

LQ = ¼ nth value

UQ = ¾ nth value

  1. Draw your CF curve.
  2. Use the above formulae to find the positions for LQ (25%) and UQ (75%).
  3. Draw lines across from the 25% and 75% marks on the y-axis to the curve, then down to the x-axis. The corresponding x-axis values give you your LQ and UQ values.
  4. IQR = UQ – LQ

Frequency Table (not grouped)

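Formulae for standard deviation: $\sigma = \sqrt{\frac{\sum f(x-\bar{x})^2}{\sum f}}$ OR $\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$

$\sum f = n$ (total frequency)

$\frac{\sum fx}{\sum f} = \bar{x}$ (the mean)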

Using the first formula:

  1. Calculate the mean.
  2. Create a new column for 𝒙 − 𝒙̅. Subtract mean from each value in the first column.
  3. Square each answer to step 2 – create new column.
  4. Multiply each answer in step 3 by corresponding frequency – create new column.
  5. Add answers to step 4 – add the last column.
  6. Divide answer to step 5 by total of frequency column.
  7. Square root.

Using the second formula:

  1. Add three columns: fx, x² and fx², and calculate these values. Remember to add up (total) these columns.
  2. Calculate the mean.
  3. Substitute your values into the formula and work out the answer.
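Both routes to the standard deviation of a frequency table can be sketched in Python and compared. The function names and the `(x, f)` pair layout are assumptions for illustration. The `max(..., 0.0)` guard is also an addition of this sketch, to stop floating-point rounding producing a tiny negative number under the square root.

```python
from math import sqrt, isclose


def sd_from_deviations(table):
    """First formula: square root of sum f(x - mean)^2 over sum f."""
    total_f = sum(f for _, f in table)
    mean = sum(f * x for x, f in table) / total_f
    total_sq_dev = sum(f * (x - mean) ** 2 for x, f in table)
    return sqrt(total_sq_dev / total_f)


def sd_from_squares(table):
    """Second formula: square root of (sum fx^2 / sum f) minus the mean squared."""
    total_f = sum(f for _, f in table)
    total_fx = sum(f * x for x, f in table)
    total_fx2 = sum(f * x ** 2 for x, f in table)
    variance = total_fx2 / total_f - (total_fx / total_f) ** 2
    return sqrt(max(variance, 0.0))          # guard against rounding error


# Hypothetical table: 1 occurs once, 2 occurs twice, 3 occurs once
table = [(1, 1), (2, 2), (3, 1)]
print(sd_from_deviations(table), sd_from_squares(table))
print(isclose(sd_from_deviations(table), sd_from_squares(table)))
```

For this table the mean is 2 and both formulas give a variance of 0.5, so the two results agree.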

Grouped

For grouped frequency tables, follow the same steps as for a frequency table but use the midpoint for x.

You may need to add an extra column to your table for the midpoint before carrying out the above steps.

Box Plots

Divide the data into sections that each contain approximately 25% of the data in that set.

Represents important features of the data and gives a summary of the spread/skew of the data.

Box Plots include 5 pieces of information about the data:

  1. Minimum Value – the lowest score, shown at the far left of the diagram
  2. Lower Quartile (LQ) – 25% of data is below this
  3. Median – marks the middle of the data – 50% of the data is above/below this value
  4. Upper Quartile (UQ) – 25% of data is above this value/75% of data is below it.
  5. Maximum Value – the highest score, shown at the far right of the diagram

The total length of the box plot represents the range.

The box represents the middle 50% and the IQR.
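The five values a box plot needs can be gathered with Python's standard `statistics` module. This is a sketch under assumptions: `five_number_summary` is a name chosen for this example, and `method="exclusive"` is used because it interpolates at positions based on (n + 1), which should agree with the ¼(n+1) and ¾(n+1) rule for discrete data. `quantiles` needs at least two data points on most Python versions.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (minimum, LQ, median, UQ, maximum) for a list of numbers."""
    ordered = sorted(values)
    # Cut points for quarters; the middle one is the median, so it is ignored here.
    lq, _, uq = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], lq, median(ordered), uq, ordered[-1]


minimum, lq, med, uq, maximum = five_number_summary([7, 1, 3, 5, 2, 6, 4])
print(minimum, lq, med, uq, maximum)
print("range:", maximum - minimum, "IQR:", uq - lq)
```

The range is the full length of the plot (maximum minus minimum) and the IQR is the length of the box (UQ minus LQ).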

Drawing Box Plots:

  1. Calculate your LQ, UQ, median and identify your minimum and maximum value.
  2. Mark these 5 points on your diagram – the minimum and maximum values with small lines and the other three with bigger lines.
  3. Draw a box around the three big lines.
  4. Connect the box to the min/max points using horizontal lines.

Outliers

Values that are far from the rest of your data and don’t fit the general pattern.

Can show errors in the data

Including outliers may misrepresent your data but not including them could falsify your data.

They distort the data so you need to identify them.

Outliers are more than 1.5 X IQR above UQ or below LQ.

Outliers are values > UQ + (1.5 × IQR) or < LQ − (1.5 × IQR)

  1. Work out IQR
  2. Find 1.5 x IQR
  3. Subtract this value from LQ and add to UQ.
  4. These values are now your new min/max points for your box plot. Any values in your data outside of this range are outliers.
  5. Mark outliers with an X on your box plot.
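The 1.5 × IQR rule can be sketched in a few lines of Python. The function names and the sample values are assumptions made for illustration, and the quartiles are passed in rather than calculated.

```python
def outlier_boundaries(lq, uq):
    """Return the (lower, upper) boundaries beyond which values are outliers."""
    iqr = uq - lq                    # step 1
    step = 1.5 * iqr                 # step 2
    return lq - step, uq + step      # step 3


def find_outliers(values, lq, uq):
    """List the values lying outside the 1.5 × IQR boundaries."""
    low, high = outlier_boundaries(lq, uq)
    return [v for v in values if v < low or v > high]   # step 4


# Hypothetical quartiles LQ = 10 and UQ = 20 give boundaries of -5 and 35
print(outlier_boundaries(10, 20))
print(find_outliers([-6, 0, 12, 35, 40], 10, 20))
```

A value exactly on a boundary (35 here) is not counted as an outlier, because the rule uses strict inequalities.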

Outliers can also be found using the mean and standard deviation – they are values more than 3 SD away from the mean.

Interpreting box plots – Compare median for measure of average and range or IQR for measure of spread.

Remember to compare in context of the question for full marks.

Compare skewness of both box plots.

Comparing Data Sets

Compare using a measure of average (mean/median/mode) and spread (range/IQR/SD) or skewness.

Always make reference to individual values and mention clearly which data set is larger/smaller than the other.

Always interpret in context – link back to the scenario in the question and labels on axes.

Example Comparisons and Interpretations of Data

Replace ‘data sets’ and ‘results’ with the appropriate keywords from the question.

• Comparing Averages:

Mean/median/mode for data set A is larger than mean/median/mode for data set B so on average data set A is more … than data set B.

• Comparing spread:

Range/IQR/SD for data set A is larger than that of data set B so the ‘results’ of data set A are more spread out/less consistent than those of data set B.

Data set A has a smaller range/IQR/SD than data set B which means the ‘results’ for ‘data set A’ are more consistent.

Remember lower SD means values are closer to the mean and therefore similar.

• Comparing Skew:

Box plot for data set A is positively skewed so the majority of ‘results’ were low with few higher ‘results’.

Box plot for data set A is negatively skewed so the majority of ‘results’ were high with few lower ‘results’.

When comparing data make sure to pair the appropriate values of average and spread.

Average    Measure of Spread

Mode       Range

Median     Range/IQR

Mean       Range/SD