Understanding and Comparing Distributions
Chapter 4
Objectives:
- Boxplot
- Calculate Outliers
- Comparing Distributions
- Timeplot
The Big Picture (cont.)
- The distribution is unimodal and skewed to the right.
- The high value may be an outlier
Comparing distributions can be much more interesting than just describing a single distribution.
The Five-Number Summary of a distribution reports its median, quartiles, and extremes (maximum and minimum).
The five-number summary for the daily wind speed is:
Max 8. Q3 2. Median 1. Q1 1. Min 0.
Daily Wind Speed: Making Boxplots
- A boxplot is a graphical display of the five-number summary.
- Boxplots are useful when comparing groups.
- Boxplots are particularly good at pointing out outliers.
Boxplot
- A graph of the Five-Number Summary.
- Can be drawn either horizontally or vertically.
- The box represents the IQR (middle 50%) of the data.
- Boxplots show less detail than histograms or stemplots, so they are best used for side-by-side comparison of more than one distribution.
Example Boxplot
- Data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
- Use TI-83/
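As a cross-check on calculator output, the five-number summary for this data can be sketched in Python. This is a minimal sketch, not the calculator's own procedure: the function name `five_number_summary` is made up for illustration, and it assumes the quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when the count is odd.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; with an odd count the middle value is left out
    of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # values before the median position
    upper = data[n - half:]    # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
print(five_number_summary(data))  # (20, 27, 34, 50, 86)
```

The box of the boxplot would then run from 27 to 50, with the median line at 34.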
What About Outliers?
- Recall that an outlier is an extremely small or extremely large data value when compared with the rest of the data values.
- What should we do about outliers?
- Try to understand them in the context of the data: an outlier may be a data error, or it may reflect a special nature of the data.
Testing for Outliers
- IQR is used to determine if extreme values are actually outliers
- An observation is an outlier if it falls more than 1.5 times the IQR below Q1 or above Q3.
- To test for outliers
- Construct an upper and lower fence
- Upper Fence = Q3 + (1.5)IQR
- Lower Fence = Q1 – (1.5)IQR
- If an observation falls outside the fences (i.e., greater than the upper fence or less than the lower fence), then it is an outlier.
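The fence test above can be sketched in a few lines of Python. The helper names `fences` and `find_outliers` are hypothetical, chosen only for this illustration; the quartiles are passed in directly, so the sketch does not depend on any particular rule for computing them. The sample values are the chapter's 15-value data set with Q1 = 27 and Q3 = 50.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for quartiles q1 and q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
print(fences(27, 50))               # (-7.5, 84.5)
print(find_outliers(data, 27, 50))  # [85, 86]
```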
More Outliers
- Far Outlier – Data values farther than 3 IQRs from the quartiles.
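To see how the 3-IQR rule sits alongside the 1.5-IQR rule, here is a small Python sketch. The function name `classify` and its labels are assumptions made for illustration, and the value 120 is a made-up test point rather than part of any data set in this chapter; the quartiles Q1 = 27 and Q3 = 50 are those of the 15-value example.

```python
def classify(value, q1, q3):
    """Label a value as 'far outlier', 'outlier', or 'not an outlier'."""
    iqr = q3 - q1
    # More than 3 IQRs beyond a quartile: far outlier
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far outlier"
    # More than 1.5 IQRs beyond a quartile: ordinary outlier
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "not an outlier"

for x in (59, 86, 120):  # 120 is hypothetical
    print(x, classify(x, 27, 50))
```

With these quartiles the far-outlier cutoffs are 27 – 69 = –42 and 50 + 69 = 119, so 86 is an outlier but not a far outlier.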
Example 1: Odd number data set
- Data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
- Find Q1, M, Q3, IQR and any outliers.
- Sort data: 20 25 25 27 28 31 33 | 34 | 36 37 44 50 59 85 86 (lower half | median | upper half)
- Q1 = 27 (median of the lower half), M = 34, Q3 = 50 (median of the upper half)
- IQR = 50 – 27 = 23
- Upper Fence = Q3 + (1.5)IQR = 50 + 34.5 = 84.5
- Lower Fence = Q1 – (1.5)IQR = 27 – 34.5 = –7.5
- Outliers: 85 and 86 (greater than the upper fence)
Example 2: Even number data set
- Data: 5, 7, 10, 14, 18, 19, 25, 29, 31, 33
- Find Q1, M, Q3, IQR and any outliers.
- Sorted data: 5 7 10 14 18 | 19 25 29 31 33 (lower half | upper half); the median falls between 18 and 19
- Q1 = 10, M = (18 + 19)/2 = 18.5, Q3 = 29
- IQR = 29 – 10 = 19
- Upper Fence = 29 + (1.5)IQR = 29 + 28.5 = 57.5
- Lower Fence = 10 – (1.5)IQR = 10 – 28.5 = –18.5
- No Outliers
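Software does not always compute quartiles as medians of the two halves, so it may report slightly different values for Q1 and Q3 than the hand calculation. As one illustration, Python's standard-library `statistics.quantiles` interpolates between neighbouring data values by default. The sketch below applies it to the Example 2 data; the fences move a little, but the conclusion is the same.

```python
from statistics import quantiles

data = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

# The default method ("exclusive") interpolates between neighbouring
# values, so Q1 and Q3 need not be actual data points.
q1, m, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(q1, m, q3)      # 9.25 18.5 29.5
print(lower, upper)   # -21.125 59.875
print([x for x in data if x < lower or x > upper])  # []
```

When comparing answers with a calculator or a textbook, it helps to check which quartile convention each one uses.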
Solution
- The IQR = 39 – 17 = 22.
- Lower Fence, Q1 – 1.5IQR = 17 – (1.5 × 22) = –16.
- And Upper Fence, Q3 + 1.5IQR = 39 + (1.5 × 22) = 72.
- Since 101 > 72, the value of 101 is an outlier relative to the rest of the values in the data set.
- That is, the number of medals won by the United States is an outlier relative to the numbers won by the other 19 countries for the 1996 Atlanta Olympic Games.
Solution (cont.)
- Pictorial representation of the outlier for the number of Olympic medals won by the United States in the 1996 Atlanta Games: lower fence at –16, upper fence at 72, and 101 marked as an outlier.