






A concise overview of summarizing data, focusing on measures of central tendency and dispersion. It covers averages such as mode, median, and mean, detailing how to calculate them for both discrete and grouped data. It also explains measures of dispersion: range, interquartile range (IQR), interpercentile range (IPR), interdecile range, and standard deviation. The document discusses transforming data and how changes to data affect averages, and lists the advantages and disadvantages of each measure. It gives formulas and step-by-step instructions for the calculations, and is aimed at high school and early university students studying introductory statistics.







A measure of central tendency (represents the ‘centre’ of a set of data). Includes mode, median and mean.
The one that appears the most (remember the Mo in mode and Mo in most) – the most common value.
Modal Class – the class with the highest frequency (the mode is not the frequency value itself but the class in the column/row next to it).
The middle value.
Discrete Data:
Median position = ½(n + 1)th value. This means find your total frequency, add 1 to it and then divide by 2. The answer is not your median but the position your median will be in the list of data.
Example: If the total frequency is 23, the median position will be ½(23 + 1) = 12, so the median is the 12th number in the list after being put in order.
If the median position is a decimal value such as 7.5, find the 7th and 8th values in the list, add them together and divide by 2.
If data is in a frequency table, add up the frequency values (as you would for cumulative frequency) until you reach the row whose running total first reaches or passes the median position. The median is the category/class that contains the ½(n + 1)th value.
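The position rule above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function names and the sample data are made up for the example, and the table is assumed to be a list of (value, frequency) pairs.

```python
import math


def median_from_list(values):
    """Median of raw discrete data using the ½(n + 1)th position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position, may end in .5
    # For a whole-number position both lookups hit the same value;
    # for a .5 position they hit the two neighbouring values.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


def value_at(rows, k):
    """Value in the k-th position, found by running a cumulative frequency."""
    cumulative = 0
    for value, freq in rows:
        cumulative += freq
        if cumulative >= k:
            return value
    raise ValueError("position is beyond the total frequency")


def median_from_frequency_table(table):
    """Median from (value, frequency) pairs (hypothetical input layout)."""
    rows = sorted(table)
    n = sum(freq for _, freq in rows)
    position = (n + 1) / 2
    low = value_at(rows, math.floor(position))
    high = value_at(rows, math.ceil(position))
    return (low + high) / 2


# Made-up data: 10 values, so the median position is 5.5.
scores = [(0, 3), (1, 5), (2, 2)]
print(median_from_frequency_table(scores))
```

With a total frequency of 23, `(23 + 1) / 2` gives position 12, matching the example above.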
Grouped Data:
For grouped continuous data (has classes with the inequality symbols), the median is the ½nth value.
The median class is the class interval which contains the median position.
Sometimes you may be asked to work out an estimate for the median value rather than the median class.
For grouped data your median will always be an estimate as you do not know exact values.
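Estimate Median using Linear Interpolation:
Use the cumulative frequency (CF) to find the group that contains the ½nth value. This is the group that contains the median.
Work out how far into this group you need to go to get to the median. Do this by subtracting the CF of the group above (the previous row in the table) from your ½nth value.
Divide this by the frequency of the median group, multiply by the class width and add the result to the lower boundary of the group to get the estimated median value.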
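Linear interpolation for the median of grouped data can be sketched as below. It assumes each class is given as a (lower boundary, upper boundary, frequency) tuple in order; that layout, the function name and the sample table are illustrative assumptions rather than anything fixed by the notes.

```python
def estimate_median(classes):
    """Estimate the median of grouped data by linear interpolation.

    classes: list of (lower_boundary, upper_boundary, frequency), in order.
    """
    n = sum(freq for _, _, freq in classes)
    target = n / 2  # the ½nth value
    cumulative = 0  # CF of the groups before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            into_group = target - cumulative  # how far into this group
            class_width = upper - lower
            return lower + (into_group / freq) * class_width
        cumulative += freq
    raise ValueError("no class with a positive frequency")


# Made-up table: 0-10 (f = 5), 10-20 (f = 10), 20-30 (f = 5).
heights = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
print(estimate_median(heights))  # 15.0
```

Here n = 20, so the target is the 10th value; it lies 5 values into the 10-20 class, which is half of that class's frequency, so the estimate sits halfway across the class.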
The sum of all the values divided by the number of values.
Discrete Data:
Formula: $\bar{x} = \frac{\sum x}{n}$
Where $\bar{x}$ = mean
$\sum$ = Greek letter sigma, which is the symbol for ‘sum of’
$x$ = data values
$n$ = number of data values
So $\sum x$ means the sum of all the values.
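Frequency Table (not grouped):
Add a third column, fx, by multiplying each value x by its frequency f, then find the total of this column.
Formula: $\bar{x} = \frac{\sum fx}{\sum f}$, where f stands for frequency
$\sum fx$ = total of the 3rd column
$\sum f$ = total frequency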
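Frequency Table (grouped):
Use the midpoint of each class as x, multiply it by the frequency f to make an fx column, then find the total of this column.
Formula: $\bar{x} = \frac{\sum fx}{\sum f}$, where x is the class midpoint. This gives an estimate of the mean because the exact values are not known.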
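The mean from a frequency table is the total of the fx column divided by the total frequency. The sketch below shows this for both kinds of table, assuming (value, frequency) pairs for an ungrouped table and (lower, upper, frequency) tuples for a grouped one; the names and data are made up for illustration.

```python
def mean_from_frequency_table(table):
    """Mean from (value, frequency) pairs: sum of fx divided by sum of f."""
    total_fx = sum(x * f for x, f in table)
    total_f = sum(f for _, f in table)
    return total_fx / total_f


def mean_from_grouped_table(classes):
    """Estimated mean from (lower, upper, frequency) classes using midpoints."""
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    return mean_from_frequency_table(midpoints)


# Made-up ungrouped table: value 1 twice, 2 three times, 3 five times.
print(mean_from_frequency_table([(1, 2), (2, 3), (3, 5)]))  # 2.3

# Made-up grouped table with midpoints 5, 15 and 25.
print(mean_from_grouped_table([(0, 10, 2), (10, 20, 6), (20, 30, 2)]))  # 15.0
```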
Mode – could change only if the new value changes which value appears the most. Could also make the
data bimodal if there are now two values that appear the same amount.
Median – If you add a value that is greater than the median, the median might increase.
If you add a value that is smaller than the median, the median might decrease.
If you remove a value that is greater than the median, the median might decrease.
If you remove a value that is smaller than the median, the median might increase.
If you add/remove one value that is greater and one that is smaller than the median, the
median stays the same.
Mean – If you add a value that is greater than the mean, the mean increases.
If you take away a value that is less than the mean, the mean increases.
If you add a value that is less than the mean, the mean decreases.
If you take away a value that is greater than the mean, the mean decreases.
If you replace a value in your data with another number that is greater/smaller than the
original, the mean will also change.
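These rules can be checked on small data sets. The sketch below uses Python's standard `statistics` module; the `compare` helper and the data are made up for the example. It shows why the notes say the median "might" change while the mean always does.

```python
from statistics import mean, median


def compare(before, after):
    """Return how far the mean and median moved between two data sets."""
    return {
        "mean_change": mean(after) - mean(before),
        "median_change": median(after) - median(before),
    }


# Adding 11 (above both averages) raises the mean and the median.
print(compare([2, 4, 4, 5, 10], [2, 4, 4, 5, 10, 11]))

# Adding 10 raises the mean, but the median stays at 4
# because the middle of the list is still surrounded by 4s.
print(compare([1, 4, 4, 4, 9], [1, 4, 4, 4, 9, 10]))

# Removing 2 (below the mean) raises the mean.
print(compare([2, 4, 4, 5, 10], [4, 4, 5, 10]))
```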
Advantages and Disadvantages
Mode
Advantages: Easy to use. Always a value in the data. Unaffected by extreme values. Can be used with quantitative and qualitative data.
Disadvantages: There may not be a mode, or there may be more than one mode. Cannot be used to calculate measures of spread. Not always representative of the data: it can be an extreme value and can be a misleading value far from the mean.
Median
Advantages: Easy to find when data is in order. Unaffected by outliers/extreme values. Best to use with skewed data. Can be used to calculate quartiles, IQR and skew.
Disadvantages: May not be a data value. Not always representative of the data.
Mean
Advantages: Uses all the data. Can be used to calculate standard deviation and skew.
Disadvantages: May not be a data value. Always affected by extreme values or outliers. Can be distorted by open-ended classes.
How spread out the data is.
The difference between the biggest and smallest values.
For data in a table, the largest value is the biggest number in the first column and the smallest value is the smallest number in the first column (the first one if the table is in order).
“Between Quartiles”
The middle 50% of the data when in order.
Lower Quartile (LQ) – The value ¼ of the way through the data. 25% of the data is less than the LQ.
Upper Quartile (UQ) – The value ¾ of the way through the data. 25% of the data is above the UQ.
Discrete Data
LQ = ¼ (n+1)th value
UQ = ¾ (n+1)th value
If your LQ is the 2.75th value, divide the interval between the 2nd and 3rd values into quarters and use this to work out what the LQ value will be: find the value ¾ of the way between the 2nd and 3rd values.
If 2nd value = 2 and 3rd value = 4, then 2.75th value = ((4 − 2)/4) × 3 + 2nd value = 0.5 × 3 + 2 = 1.5 + 2 = 3.5
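The ¼(n + 1) and ¾(n + 1) position rules, including the step of moving part of the way between two neighbouring values, can be sketched as follows. The function names and data are illustrative, and the sketch assumes at least three data values so that both positions fall inside the list.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in an ordered list."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Move the given fraction of the way from the lower to the upper value.
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (LQ, UQ) using the ¼(n + 1)th and ¾(n + 1)th positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least three values")
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, uq


# Made-up data with n = 10, so the LQ is the 2.75th value.
# The 2nd value is 2 and the 3rd value is 4, giving an LQ of 3.5.
data = [1, 2, 4, 5, 6, 7, 8, 9, 10, 12]
lq, uq = quartiles(data)
print(lq, uq, uq - lq)  # LQ, UQ and IQR
```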
Grouped Data
LQ = ¼ nth value
UQ = ¾ nth value
Use linear interpolation, as for the median, to estimate your LQ and UQ values.
Frequency Table (not grouped)
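Formulae: $\sigma = \sqrt{\frac{\sum f(x-\bar{x})^2}{\sum f}}$ or $\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
$\sum f = n$ (total frequency)
$\frac{\sum fx}{\sum f} = \bar{x}$ (mean)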
Using the first formula:
Using the second formula:
Create extra columns for x² and fx² and calculate these values. Remember to add up these columns.
Grouped
For grouped frequency tables, follow the same steps as for an ungrouped frequency table but use the midpoint of each class for x.
You may need to add an extra column to your table for the midpoint before carrying out the above steps.
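Both versions of the standard deviation formula give the same answer, which can be checked with a short sketch. It assumes a table of (value, frequency) pairs, with grouped classes converted to midpoints first; the function names and data are made up for the example.

```python
from math import sqrt


def sd_first_formula(table):
    """SD as the square root of sum f(x - mean)^2 divided by sum f."""
    total_f = sum(f for _, f in table)
    mean = sum(f * x for x, f in table) / total_f
    return sqrt(sum(f * (x - mean) ** 2 for x, f in table) / total_f)


def sd_second_formula(table):
    """SD as the square root of (sum fx^2 / sum f) minus the mean squared."""
    total_f = sum(f for _, f in table)
    total_fx = sum(f * x for x, f in table)
    total_fx2 = sum(f * x ** 2 for x, f in table)
    return sqrt(total_fx2 / total_f - (total_fx / total_f) ** 2)


def to_midpoints(classes):
    """Turn (lower, upper, frequency) classes into (midpoint, frequency)."""
    return [((lower + upper) / 2, f) for lower, upper, f in classes]


# Made-up table for the values 2, 4, 4, 4, 5, 5, 7, 9 (mean 5).
table = [(2, 1), (4, 3), (5, 2), (7, 1), (9, 1)]
print(sd_first_formula(table))   # 2.0
print(sd_second_formula(table))  # 2.0
```

The second formula needs only the column totals, so it is usually quicker by hand; the first makes the "distance from the mean" idea more visible.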
Divide the data into sections that each contain approximately 25% of the data in that set.
Represents important features of the data and gives a summary of the spread/skew of the data.
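Box plots include 5 pieces of information about the data:
Lower quartile (LQ) – 25% of the data is below this value.
Median – 50% of the data is above/below this value.
Upper quartile (UQ) – 75% of the data is below it.
Lowest and highest values – the two ends of the diagram.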
The total length of the box plot represents the range.
The box represents the middle 50% and the IQR.
Drawing Box Plots:
Mark the lowest and highest values with small lines and the other three (LQ, median and UQ) with bigger lines.
Outliers
Values that are far from the rest of your data and don’t fit the general pattern.
Can show errors in the data
Including outliers may misrepresent your data but not including them could falsify your data.
They distort the data so you need to identify them.
Outliers are more than 1.5 × IQR above the UQ or below the LQ.
Work out LQ − 1.5 × IQR and UQ + 1.5 × IQR. Any values outside of this range are outliers.
Outliers can also be found using the mean and standard deviation – they are values more than 3 SD away
from the mean.
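Both outlier rules can be sketched in a few lines. The quartiles, mean and standard deviation are passed in as already-calculated values; the function names and the sample numbers are assumptions for illustration only.

```python
def iqr_outliers(values, lq, uq):
    """Values more than 1.5 x IQR below the LQ or above the UQ."""
    iqr = uq - lq
    lower_boundary = lq - 1.5 * iqr
    upper_boundary = uq + 1.5 * iqr
    return [v for v in values if v < lower_boundary or v > upper_boundary]


def sd_outliers(values, mean, sd):
    """Values more than 3 standard deviations away from the mean."""
    return [v for v in values if abs(v - mean) > 3 * sd]


# Made-up data with LQ = 10 and UQ = 14, so IQR = 4.
# The boundaries are 10 - 6 = 4 and 14 + 6 = 20.
data = [1, 10, 11, 12, 13, 14, 30]
print(iqr_outliers(data, lq=10, uq=14))  # [1, 30]

# Made-up summary values: mean 6 and SD 1, so anything beyond 3 to 9 is flagged.
print(sd_outliers([5, 6, 7, 50], mean=6, sd=1))  # [50]
```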
Interpreting box plots – Compare median for measure of average and range or IQR for measure of spread.
Remember to compare in context of the question for full marks.
Compare skewness of both box plots.
Compare using a measure of average (mean/median/mode) and spread (range/IQR/SD) or skewness.
Always make reference to individual values and mention which data set is larger/smaller than the other
clearly.
Always interpret in context – link back to the scenario in the question and labels on axes.
Example Comparisons and Interpretations of Data
Replace ‘data sets’ and ‘results’ with appropriate keyword from the question.
Comparing Averages:
Mean/median/mode for data set A is larger than mean/median/mode data set B so on average data set A
is more … than data set B.
Comparing spread:
Range/IQR/SD for data set A is larger than that of data set B so the ‘results’ of data set A are more spread
out/less consistent than those of data set B.
Data set A has a smaller range/IQR/SD than data set B, which means the ‘results’ for data set A are more consistent.
Remember a lower SD means the values are closer to the mean and therefore more similar to each other.
Comparing Skew:
Box plot for data set A is positively skewed, so the majority of ‘results’ were low with a few higher ‘results’.
Box plot for data set A is negatively skewed, so the majority of ‘results’ were high with a few lower ‘results’.
When comparing data make sure to pair the appropriate values of average and spread.
Average    Measure of Spread
Mode       Range
Median     Range/IQR
Mean       Range/SD