




























































In this chapter, you will learn various methods of statistical analysis and assess their strengths and weaknesses.





























































Key Terms: mean, median, mode, outlier, percentile, quartile, range, interquartile range (IQR), variance, standard deviation, z-score, multiple bar graph, split bar graph, relative split bar graph, reliable data, valid data
Literacy Strategy
median
Career Link
248 MHR • Chapter 6
Chapter Problem
Chapter 6 • MHR 249
Circle Graph
[Figure: circle graph titled "Favourite Type of Movie". Action: 5 (25%), Romance: 6 (30%), Drama: 1 (5%), Sci-fi: 4 (20%), Comedy: 4 (20%).]
Example: To determine the central angle, multiply the percent of the data for each category by 360 degrees. For example, 25% of those surveyed preferred action movies. 0.25 × 360° = 90°
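The central-angle rule can be checked for every category at once. The following is a minimal Python sketch, assuming the survey counts shown for the "Favourite Type of Movie" graph; the names `counts` and `central_angles` are chosen here for illustration and do not come from the text.

```python
# Survey counts from the "Favourite Type of Movie" circle graph.
counts = {"Action": 5, "Romance": 6, "Drama": 1, "Sci-fi": 4, "Comedy": 4}

def central_angles(counts):
    """Return the central angle, in degrees, for each category."""
    total = sum(counts.values())
    # (count / total) is the fraction of the data; multiply by 360 degrees.
    return {category: count * 360 / total for category, count in counts.items()}

angles = central_angles(counts)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles should always add to 360°, which is a quick way to catch an arithmetic slip.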
Histogram
[Figure: histogram of Number of Students (vertical axis, 0 to 40) against Score on Final Exam (%) (horizontal axis, 20 to 100).]
Stem and Leaf Plot
Example: Create a stem and leaf plot for the following data:
170 173 173 175 179 179 179 181 181 182 183 187 188 188 189
Represent the hundreds and tens digits as the stem. Represent the ones digits as the leaves. Enter all values in numerical order.
Stem | Leaf
17   | 0 3 3 5 9 9 9
18   | 1 1 2 3 7 8 8 9
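The same plot can be produced programmatically. This is a small Python sketch, assuming whole-number data in which the last digit is the leaf and the remaining digits form the stem; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):      # sorting puts the leaves in numerical order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [170, 173, 173, 175, 179, 179, 179, 181,
        181, 182, 183, 187, 188, 188, 189]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

For data with decimals, such as 63.2, one option is to multiply by 10 first so that the tenths digit becomes the leaf.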
4. Match each description with the appropriate type of graph. Some graph types can be used more than once.
Graph: Bar graph, Histogram, Circle graph, Stem and leaf plot, Pictograph
Description:
a) Displays discrete data in separate columns.
b) Organizes data by representing part of each number as a stem and the other part as a leaf.
c) Uses pictures or symbols to represent data.
d) Displays data as a percent of the whole.
e) Uses proportional areas of the bars to show frequencies of the values of the variables.
f) Represents data using a circle that has been divided into sectors.
g) Represents nominal data, such as days of the week.
5. A survey asked grade 12 students to state their first choice for the university they would like to attend from the list given.
University                       Frequency
Queen’s University               33
University of Waterloo           32
University of Toronto            41
York University                  35
University of Ottawa             40
McMaster University              31
University of Western Ontario    33
Illustrate the data using two graph types of your choice. Justify your choices.
6. Create a stem and leaf plot for each list of student averages. Compare with a histogram.
a) 78, 83, 77, 73, 61, 99, 65, 80, 55, 67, 52, 79, 43, 59
b) 63.2, 71.4, 79.5, 50.0, 93.7, 44.5, 87.6, 65.7, 54.9, 92
Prerequisite Skills • MHR 251
6.1 Measures of Central Tendency
Learning Goals
I am learning to
It is important to collect and organize data in a way that helps you understand where the majority of the values are found. The values that describe this clustering are referred to as measures of central tendency. The three most commonly used measures of central tendency are mean, median, and mode.
Think about the following situations:
mean
median
mode
In statistics, you can find the mean of a population and the mean of a sample of that population. A sample mean will approximate the actual mean of the population.
Population Mean
\[ \mu = \frac{x_1 + x_2 + \dots + x_N}{N} \]
Sample Mean
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
where N is the size of the population and n is the sample size.
Although the calculations are the same, different symbols are used to indicate whether the mean represents a population or a sample.
The Greek letter μ, pronounced “mu,” is used to represent the population mean.
\(\bar{x}\), read as “x-bar,” is used to represent the sample mean.
The measures of central tendency of a data set can be affected by the presence of outliers.
[Figure: three frequency distributions showing the relative positions of the mode, median, and mean: one skewed in the negative direction, one perfectly symmetric, and one skewed in the positive direction.]
Example 1
Evaluating Measures of Central Tendency
You are interviewing for an internship at a risk assessment firm to gain experience for your post-secondary program. The interviewer tells you that the average annual income of the 15 employees at the company is $73 518.27. The chart shows the actual incomes of the 15 employees.
$34 983    $18 980    $12 500    $48 980    $478 320
$17 305    $36 540    $12 500    $250 921   $32 654
$45 678    $33 855    $25 676    $33 450    $20 432
a) Determine the mean, median, and mode of the incomes.
b) Use the measures of central tendency to decide whether the interviewer’s statement is accurate.
c) What is the effect of the outliers on the measures of central tendency?
d) Which measure of central tendency best represents the “average” income of the employees?
outlier
[Figure: frequency distribution (Frequency against Data Value) with an outlier labelled.]
Recall that you studied uniform distributions in section 4.2. Why are the mean, median, and mode all equal in a uniform distribution?
a) Method 1: Use Paper and Pencil
Determine the mean of the 15 income values.
\[ \mu = \frac{\sum x}{n} = \frac{34\,983 + 18\,980 + 12\,500 + 48\,980 + \dots + 20\,432}{15} = \frac{1\,102\,774}{15} \approx 73\,518.27 \]
The mean income is $73 518.27.
There is an odd number of data points. To determine the median, place the values in order from least to greatest. Then, locate the 8th term because it is the middle of the 15 data points.
$12 500 $12 500 $17 305 $18 980 $20 432 $25 676 $32 654 $33 450 $33 855 $34 983 $36 540 $45 678 $48 980 $250 921 $478 320
The median is $33 450.
The mode is $12 500 because it is the only value that occurs more than once in the ordered list (it occurs twice).
Method 2: Use a Graphing Calculator
The mean income, \(\bar{x}\), is $73 518.27, and the median income, Med, is $33 450. You can see the mode of $12 500 by inspecting the table.
Recall that the Greek letter Σ, pronounced “sigma,” is used to represent the sum of a series of numbers. \(\sum x\) means \(x_1 + x_2 + x_3 + \dots + x_n\).
6.1 Measures of Central Tendency • MHR 255
d) The mean is significantly affected by the outliers, the mode is the smallest income of all the employees, and the median is the middle income of all the employees. The median would be the best measure to represent the average income. Looking at the data, the median is fairly close to many data points and describes the data without being distorted by the outliers or the repeated values.
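The measures in Example 1 can also be checked with software other than a graphing calculator. The following Python sketch uses the standard `statistics` module on the 15 incomes. Treating the two largest incomes as the outliers is an assumption made here for illustration; the variable names are not from the text.

```python
import statistics

incomes = [34983, 18980, 12500, 48980, 478320, 17305, 36540, 12500,
           250921, 32654, 45678, 33855, 25676, 33450, 20432]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
mode_income = statistics.mode(incomes)

print(f"mean   = {mean_income:.2f}")
print(f"median = {median_income}")
print(f"mode   = {mode_income}")

# Assumption: the two largest incomes are the outliers. Drop them to see
# which measures of central tendency move the most.
trimmed = sorted(incomes)[:-2]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)
print(f"mean without outliers   = {trimmed_mean:.2f}")
print(f"median without outliers = {trimmed_median}")
```

Comparing the two pairs of results shows the mean changing far more than the median, which supports choosing the median in part d).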
Before heading on vacation to Mexico, you observe the actual high temperatures for seven days. The table shows the temperatures.
Day    Temperature (°C)
1      27
2      29
3      32
4      29
5      45
6      29
7      31
a) Determine the mean, median, and mode of the temperatures.
b) The weather report predicts that based on the previous seven-day forecast, the temperature on the day of your arrival should be 36 °C. Use the measures of central tendency in part a) to determine whether the weather report is accurate.
c) Is there an outlier in the data? How does it affect the measures of central tendency?
d) Which measure of central tendency would best represent the temperatures in this Mexican location? Explain.
When the quantity of data is large, you can group the data into intervals to make them easier to analyse. When data are grouped into intervals, you can only approximate the centre of the data. To do this, assume that the data are evenly spaced in each interval, and use the midpoint to represent the values in each interval. Multiply the data values by their respective frequencies. Then, add these products and divide by the total frequency. You can use the following formula to approximate the mean for grouped data.
Mean for Grouped Data
\[ \bar{x} = \frac{\sum f_i m_i}{\sum f_i} \]
where \(m_i\) is the midpoint of each interval and \(f_i\) is the frequency of each interval.
You can use a frequency distribution table to help organize your data.
Example 2
Using Grouped Data
The time taken to complete a chess game was recorded, to the nearest minute. The frequency table shows the data.
Time (min)    10–15    15–20    20–25    25–30    30–35
Frequency     2        20       18       10       5
a) Calculate the estimated mean, median, and mode times, in minutes, to complete a chess game.
b) Describe potential issues with finding the measures of central tendency of grouped data.
c) Graph the data using a histogram. Mark the measures of central tendency on the graph.
d) Discuss any skewing of the data with respect to the measures of central tendency.
a) Method 1: Use Paper and Pencil
Use the following table.
Number of Minutes    Midpoint, m_i    Number of Games, f_i    m_i f_i    Cumulative Frequency
10–15                12.5             2                       25         2
15–20                17.5             20                      350        22
20–25                22.5             18                      405        40
25–30                27.5             10                      275        50
30–35                32.5             5                       162.5      55
\[ \sum f_i = 2 + 20 + 18 + 10 + 5 = 55 \]
\[ \sum m_i f_i = 25 + 350 + 405 + 275 + 162.5 = 1217.5 \]
Calculate the grouped mean.
\[ \bar{x} = \frac{\sum m_i f_i}{\sum f_i} = \frac{1217.5}{55} \approx 22.1 \]
Therefore, the mean number of minutes is approximately 22 min per game.
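The grouped-mean calculation can be sketched in a few lines of Python. This is an illustration only, assuming each interval is stored as a (lower bound, upper bound, frequency) triple; `grouped_mean` is a name chosen here, not one from the text.

```python
# (lower bound, upper bound, frequency) for each interval in the chess table.
intervals = [(10, 15, 2), (15, 20, 20), (20, 25, 18), (25, 30, 10), (30, 35, 5)]

def grouped_mean(intervals):
    """Estimate the mean by treating every value as its interval midpoint."""
    total_frequency = sum(f for _, _, f in intervals)
    weighted_sum = sum((low + high) / 2 * f for low, high, f in intervals)
    return weighted_sum / total_frequency

estimate = grouped_mean(intervals)
print(f"estimated mean = {estimate:.1f} min")
```

Because every game in an interval is replaced by the midpoint, the result is an estimate; the true mean of the raw times could be somewhat higher or lower.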
c) Enter the data into your spreadsheet as shown. Highlight the frequency column, select the Insert tab, and then choose Column, Clustered Column from the drop-down menu.
Right click on the x-axis and choose Select Data…. Under the options for Horizontal (Category) Axis Labels, select Edit and highlight the data for your time intervals from A2:A6. Change the title of your graph to Frequency of Chess Game Length. With your graph selected, click on the Layout tab and choose Axis Titles, then choose Primary Horizontal Axis Title – Title Below Axis. Rename the axis title Time (min).
d) Since the data are positively skewed, the modal interval is the least appropriate measure of central tendency. The median and mean are very close together, so either one would be an appropriate measure.
To eliminate the gap between bars, right click on any of the bars and choose Format Data Series. Reduce the Gap Width to 0% and click Close.
A group of children were asked how many hours a day they spend playing video games. The table shows the data.
Number of Hours    Number of Children
0–2                3
2–4                11
4–6                7
6–8                2
8–10               1
a) Determine the estimated mean and median number of hours, and the modal interval, for the distribution.
b) Discuss any skewing of the data with respect to the measures of central tendency.
Certain values in a data set are sometimes of greater relative importance than others. In these cases, it is useful to calculate a weighted mean. To do this, multiply the weighting by the corresponding data value, find the sum of these products, and then divide by the total weighting.
Weighted Mean
\[ \mu = \frac{\sum x_i w_i}{\sum w_i} \]
where \(x_i\) represents each data value in the data set and \(w_i\) represents its weight or frequency.
Example 3
Using a Weighted Mean
A teacher is calculating the marks for the students in her Data Management class. She assigns the following values to each category:
Knowledge: 25%
Thinking: 10%
Application: 20%
Culminating Project: 15%
Communication: 15%
Final Exam: 15%
Kyle has not yet written his final exam, but his marks in the first five categories are 90, 79, 82, 70, and 85.
a) Determine the weighted mean for Kyle before the final exam.
b) How does this weighted mean differ from the unweighted mean?
c) What mark must Kyle receive on the final exam to finish the course with 84%?
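a) Calculate the weighted mean, pairing each mark with its category weight in the order listed:
\[ \bar{x}_w = \frac{90(25) + 79(10) + 82(20) + 70(15) + 85(15)}{25 + 10 + 20 + 15 + 15} = \frac{7005}{85} \approx 82.4 \]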
b) Calculate the unweighted mean:
\[ \bar{x} = \frac{90 + 79 + 82 + 70 + 85}{5} = \frac{406}{5} = 81.2 \]
Without weighting, Kyle would receive a lower average mark, because the categories in which he excels carry more weight in the weighted mean.
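The weighted-mean formula can be tried on Kyle's marks with a short Python sketch. Two assumptions are made here: the five marks are paired with the category weights in the order the categories are listed, and part c) is solved by setting the final weighted mean (all six categories, total weight 100) equal to 84. The function and variable names are illustrative only.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Marks and weights (in percent) for the five completed categories,
# paired in the order listed in Example 3 (an assumption).
marks = [90, 79, 82, 70, 85]
weights = [25, 10, 20, 15, 15]
exam_weight = 15

before_exam = weighted_mean(marks, weights)
unweighted = sum(marks) / len(marks)

# Part c): solve (earned + exam * exam_weight) / total_weight = target for exam.
target = 84
earned = sum(x * w for x, w in zip(marks, weights))
total_weight = sum(weights) + exam_weight
exam_needed = (target * total_weight - earned) / exam_weight

print(f"weighted mean before exam = {before_exam:.1f}")
print(f"unweighted mean           = {unweighted:.1f}")
print(f"exam mark needed for {target}% = {exam_needed:.0f}")
```

Dividing by the sum of the weights, rather than by 100, is what lets the same function work before the exam has been written.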
Reflect R1. Which measure of central tendency is most affected by extreme values? Explain using specific examples to justify your answer.
R2. Describe a situation in which it would be necessary for you to use
a) the mean
b) the weighted mean
c) grouped data
R3. Which measure of central tendency is being used in each situation? Explain.
a) The average person has two hands, two eyes, two ears, and two legs.
b) The average time it takes to get to school is 38 min.
c) Johnny is an above average student.
Practise
1. Determine the mean, median, and mode for each set of data.
a) 4 6 9 12 15 7 13 4 7 10 3 8 15
b) 9 8 20 23 12 12 9 9 12 9 20 21 9
c) 110 152 112 124 110 134 138 127 118 110 114 162
2. Nina runs the 400-m race for Mustang High School. Her times in the last six track meets were 1.45 min, 1.50 min, 1.42 min, 1.41 min, 1.42 min, and 1.48 min.
a) What are the mean and median for her running times?
b) Which measure of central tendency best describes Nina’s average time? Explain.
3. The observation that occurs most frequently in a data sample is the
A mean
B weighted mean
C mode
D median
4. What is the median of the sample 5, 5, 11, 9, 8, 5, 8?
A 9
B 6
C 5
D 8
Apply
5. The mean of Daniel’s marks on five tests was 77.4. His marks on the first four tests were 88, 77, 70, and 72. Calculate Daniel’s mark on the fifth test.
6. The average daily snowfall for the first week of December was 2.5 cm. In each of the first two days, 2.5 cm fell. In each of the next four days, 2 cm fell. What was the snowfall for the last day of the week?
7. Communication Determine whether the argument is valid for each situation. Explain your thinking.
a) An advertising company has a mean monthly sales record of $16 235. Therefore, half the team members sold more than $16 235.
b) A survey shows that 78% of all salaries are below the mean. Therefore, there must be a mistake.
c) The mean mark of one class is 71, while the mean mark of another class is 76. Therefore, the mean of the two classes is 73.5.
d) My median monthly expenses total $850. Therefore, my total expenses for the year must be $10 200.
8. Communication Which measure of central tendency would be best suited for each situation? Explain why you chose the measure that you did.
a) a summary of a class’s report card marks
b) an award for the most popular movie of the year
c) an employer budgeting for the average salary of its employees
d) a potential employee looking for the typical salary among current employees
9. Thinking Create a data set of at least seven values that satisfies each of the conditions. Use the context of marks, salaries, sports statistics, or choose a context of your own.
a) The mean, median, and mode are all 15.
b) The median is 7.5 and the mean is greater than 15.
c) The mean is 7.5 and the median is greater than 15.
d) Explain why the mean is more affected by outliers than the median.
10. Michael surveyed the grade 12 students at his school to research the number of hours of sleep they got. He asked them how many hours of sleep they got last night. The table shows his results.
Time (h)     4–5    5–6    6–7    7–8    8–
Frequency    32     50     125    67     108
a) Make a histogram of these data.
b) Estimate the mean, median, and modal interval for the hours of sleep by grade 12 students.
c) Mark the measures of central tendency on your histogram.
d) Discuss any skewing of the data and how it relates to the measures of central tendency.
11. Application Your teacher will provide you with a file called Nobel Winners.csv, listing the Canadian or Canadian-born Nobel Prize winners up to 2013. Use appropriate technology to answer the questions.
a) Make a histogram of the winners’ ages.
b) Calculate the mean, median, and mode ages.
c) Describe the “average” age of a Canadian Nobel Prize winner. Explain why this age would not be younger.
12. Application Your teacher will provide you with a file called Olympics 2014.csv, listing the medal counts and populations of winning countries in the 2014 Sochi Winter Olympics. The file ranks countries by number of gold medals, followed by silver and bronze. Use appropriate technology to answer the questions.
a) Re-rank the countries by total medals relative to the population.
b) Re-rank the countries using a weighted mean, with each type of medal having a different weighting.
c) Which system do you prefer? Write a paragraph supporting your choice.
13. Karen’s term mark is 82%. The term counts for 70% of the final mark. What mark must Karen achieve on the exam to earn a final mark of
a) 80%?
b) 85%?
c) at least 75%?
d) Can Karen achieve 88%? Explain.
14. Thinking Using the data provided in the frequency table, describe a context that the mean, median, and mode could represent.
Age in Years    20–30    30–40    40–50    50–60    60–
Frequency       14       15       28       19       5
6.2 Measures of Spread
Learning Goals
I am learning to
The fuel economy of various cars in a given class will vary significantly. Cars can be rated anywhere from 4 L/100 km to 9 L/100 km. Understanding where most of the data occur using the measures of central tendency is sometimes not enough to make a broad generalization of the data. Why is it also important to know the distance between each data value and the centre of the data? The measure of how widely the data vary around their central location is referred to as the measure of spread or dispersion.
An automotive publication ranked the following 2013 subcompact cars based on their fuel economy in the city. The chart shows the cars’ fuel economy in the city and on the highway. Your teacher will provide you with the data in a file called Fuel Economy.csv.
Make/Model            City (L/100 km)    Hwy (L/100 km)
TOYOTA PRIUS C        3.5                4.
SCION iQ              5.5                4.
SMART FORTWO          5.8                4.
CHEVROLET SPARK       6.3                5.
FIAT 500              6.4                4.
TOYOTA YARIS          6.7                5.
MINI COOPER           6.8                5.
KIA RIO               6.8                5.
FORD FIESTA           6.9                5.
HONDA FIT             7.1                5.
MAZDA 2               7.1                5.
HYUNDAI ACCENT        7.2                5.
SCION xD              7.4                5.
NISSAN VERSA          7.4                5.
CHEVROLET SONIC       7.7                5.
VOLKSWAGEN BEETLE     9.5                7.
SUBARU BRZ            9.6                6.
Source: “Fuel Consumption Ratings,” Natural Resources Canada, February 5, 2014.
1. a) Identify the car with the median fuel economy in city driving.
b) The median is also called the 50th percentile. Why would that be?
2. The 80th percentile would be the car whose fuel economy in city driving is greater than 80% of all others. Which car would that be?
3. What percent of the cars have a better fuel economy than a Fiat?
4. a) Identify the car with the best fuel economy in the city and the car with the best fuel economy on the highway. What percent of the cars have a worse fuel economy?
b) Identify the car with the worst fuel economy in the city and the car with the worst fuel economy on the highway. What percent of the cars have a worse fuel economy?
5. Reflect Describe how you can use percentiles to compare data.
6. Extend Your Understanding If you were comparing your marks to those of your classmates, would you rather be in the 95th percentile or the 5th percentile? Explain.
To help analyse the spread of data, you may need to identify the percentile rank or calculate percentiles.
Percentile Rank
\[ R = \frac{p}{100}(n + 1) \]
where p is the percentile, n is the size of the population, and R is the whole number rank of the data point. If R is not a whole number, round R down.
Percentile
\[ p = 100 \cdot \frac{L + 0.5E}{n} \]
where p is the percentile, L is the number of data less than the data point, E is the number of data equal to the data point, and n is the size of the population.
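Both calculations can be tried on the city fuel economy values from the table. The following Python sketch assumes the percentile formula takes the common form p = 100(L + 0.5E)/n, with L and E as defined above, and rounds the rank down as described. The function names are hypothetical and chosen for illustration.

```python
import math

# City fuel economy (L/100 km) for the 17 cars in the table, in ranked order.
city = [3.5, 5.5, 5.8, 6.3, 6.4, 6.7, 6.8, 6.8, 6.9,
        7.1, 7.1, 7.2, 7.4, 7.4, 7.7, 9.5, 9.6]

def rank_for_percentile(p, n):
    """Rank R = p/100 * (n + 1), rounded down to a whole number."""
    return math.floor(p * (n + 1) / 100)

def percentile_of(value, data):
    """Percentile p = 100 * (L + 0.5 * E) / n for one data point (assumed form)."""
    less = sum(1 for x in data if x < value)     # L: data below the point
    equal = sum(1 for x in data if x == value)   # E: data equal to the point
    return 100 * (less + 0.5 * equal) / len(data)

r = rank_for_percentile(50, len(city))
print("rank of the 50th percentile:", r)
print("value at that rank:", sorted(city)[r - 1])
print(f"percentile of 6.4 L/100 km: {percentile_of(6.4, city):.1f}")
```

Note that these percentiles are based on the numeric value in L/100 km. Because a lower value means better fuel economy, a low percentile here corresponds to a more efficient car.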
percentile
6.2 Measures of Spread • MHR 267