6 One-Variable Data Analysis, Assignments of Statistics


Typology: Assignments

2021/2022

Uploaded on 07/05/2022

paul.kc

CHAPTER 6

One-Variable Data Analysis

Athletes are often compared to one another. Team managers make generalizations about players’ abilities to decide what they are worth. What statistical calculations could help the manager

  • determine the value of a player’s contract?
  • decide which player to hire for their team?
  • decide how much money to pay a player for a five-year contract?

Popular musicians are compared with artists from the past and present. What statistical calculations could you use to compare

  • the number of days on the top 10 charts?
  • the most listened to genre of music?
  • songs that have lasted the test of time?

In this chapter, you will learn various methods of statistical analysis and assess their strengths and weaknesses.

Key Terms

mean, median, mode, outlier, percentile, quartile, range, interquartile range (IQR), variance, standard deviation, z-score, multiple bar graph, split bar graph, relative split bar graph, reliable data, valid data

Literacy Strategy

A mathematics word wall identifies words and phrases that you need to understand to develop mathematical skills and reasoning. Careful use of visuals helps support understanding of key words. How does this image convey the meaning of the word “median”?

[Image: illustration of the word “median”]

As you work through this chapter, think about some of the key words you encounter. Use visuals and letters to show the meaning of words for a classroom word wall.

Career Link

Risk Assessment Manager

In the financial services industry, many careers involve assessing business risks and taking measures to control or reduce the risk. Victoria is a risk management analyst who assesses the risks related to defaults on loans. It is her job to assess whether borrowers will be able to repay the loans they receive and whether her firm will get enough return on investment.

  • Who do you think would be a lower risk, a young employed couple buying a house, or a startup company in the information technology field?
  • How do you think risk level affects the type of loan that each could get?

248 MHR • Chapter 6

Chapter Problem

Used Car Lot Business Report

Retail sales outlets regularly use statistics to describe their products, and used car lots are no exception. The amount of time a car sits, unsold, on the lot can affect the value of the car. In this exercise, you will analyse time spent on the lot and present a report of your findings.

a) List some pros and cons of buying a used car.

b) List three things car buyers look for when purchasing a used car.

c) A typical used car may take about a month to sell. What do you think would happen to the price of a used car after it has been on the lot for three months?

d) What other statistics will affect the price of the car?


Circle Graph

Favourite Type of Movie

Type       Frequency   Percent
Action     5           25%
Romance    6           30%
Drama      1           5%
Sci-fi     4           20%
Comedy     4           20%

Example: To determine the central angle, multiply the percent of the data for each category by 360 degrees. For example, 25% of those surveyed preferred action movies. 0.25 × 360° = 90°
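The central-angle calculation can be repeated for every category with a short script. The following is a minimal Python sketch, assuming the survey counts shown for the Favourite Type of Movie graph; the function name `central_angles` is chosen here for illustration and is not from the text.

```python
# Survey counts from the "Favourite Type of Movie" circle graph.
counts = {"Action": 5, "Romance": 6, "Drama": 1, "Sci-fi": 4, "Comedy": 4}


def central_angles(frequencies):
    """Return {category: (percent, central angle in degrees)}."""
    total = sum(frequencies.values())
    result = {}
    for category, frequency in frequencies.items():
        fraction = frequency / total  # share of the whole circle
        result[category] = (100 * fraction, 360 * fraction)
    return result


angles = central_angles(counts)
for category, (percent, angle) in angles.items():
    print(f"{category:8s} {percent:5.1f}%  {angle:6.1f} degrees")
```

The angles for all categories should add to 360°, which is a quick check that no category was missed.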

Histogram

[Figure: histogram of Number of Students (vertical axis, 0 to 40) against Score on Final Exam (%) (horizontal axis, 20 to 100)]

Stem and Leaf Plot

Example: Create a stem and leaf plot for the following data:

170 173 173 175 179 179 179 181 181 182 183 187 188 188 189

Represent the hundreds and tens digits as the stem. Represent the ones digits as the leaves. Enter all values in numerical order.

Stem | Leaf
17   | 0 3 3 5 9 9 9
18   | 1 1 2 3 7 8 8 9
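The same plot can be produced programmatically. This is a minimal Python sketch, assuming whole-number data where the ones digit is the leaf and the remaining digits form the stem; the helper name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):       # sorting keeps the leaves in numerical order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


data = [170, 173, 173, 175, 179, 179, 179, 181,
        181, 182, 183, 187, 188, 188, 189]

plot = stem_and_leaf(data)
print("Stem | Leaf")
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:4d} | {leaves}")
```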

4. Match each description with the appropriate type of graph. Some graph types can be used more than once.

Description
a) Displays discrete data in separate columns.
b) Organizes data by representing part of each number as a stem and the other part as a leaf.
c) Uses pictures or symbols to represent data.
d) Displays data as a percent of the whole.
e) Uses proportional areas of the bars to show frequencies of the values of the variables.
f) Represents data using a circle that has been divided into sectors.
g) Represents nominal data, such as days of the week.

Graph
  • Bar graph
  • Histogram
  • Circle graph
  • Stem and leaf plot
  • Pictograph

5. A survey asked grade 12 students to state their first choice for the university they would like to attend from the list given.

University                       Frequency
Queen’s University               33
University of Waterloo           32
University of Toronto            41
York University                  35
University of Ottawa             40
McMaster University              31
University of Western Ontario    33

Illustrate the data using two graph types of your choice. Justify your choices.

6. Create a stem and leaf plot for each list of student averages. Compare with a histogram.
a) 78, 83, 77, 73, 61, 99, 65, 80, 55, 67, 52, 79, 43, 59
b) 63.2, 71.4, 79.5, 50.0, 93.7, 44.5, 87.6, 65.7, 54.9, 92


6.1 Measures of Central Tendency

Learning Goals

I am learning to

  • interpret the mean, median, and mode of a set of data
  • choose the measure of central tendency that best describes the data

Minds On…

It is important to collect and organize data in a way that helps you understand where the majority of the numbers are found. This clustering of data is described using measures of central tendency. The three most commonly used measures of central tendency are mean, median, and mode.

Think about the following situations:

  • A post-secondary admissions board uses students’ individual averages to decide who will be most successful in a particular field of study. How might a post-secondary admissions board analyse the cluster of applicants’ grades to help its decision-making process?
  • How might coaches of a sports team use averages to determine how well a player is performing in the season and how much money she should be paid?
  • How might economists use median household income to divide the country into equal income distribution groups?
  • How might shoe and clothing store managers use the mode to make decisions about which products to stock?

mean

  • the sum of the data entries divided by the number of entries

median

  • the middle value of all the data points when the data values are listed in order from least to greatest
  • if there is an even number of data points, then the median is the average between the two middle values

mode

  • the data value that occurs most often in the list of data points
  • it is possible to have no mode, one mode, or more than one mode
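The three definitions translate directly into code. Below is a minimal Python sketch using the standard `statistics` module. The data values are illustrative and not from the text, and the helper `modes` is a hypothetical function written here so that a list with no repeated value reports no mode, matching the definition above. It assumes a non-empty list.

```python
import statistics
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no value occurs more often than the others
    return sorted(v for v, c in counts.items() if c == highest)


marks = [3, 7, 7, 2, 9, 4]             # illustrative values, not from the text

mean_value = statistics.mean(marks)      # sum of the entries / number of entries
median_value = statistics.median(marks)  # even count: average of the two middle values
mode_values = modes(marks)

print(mean_value, median_value, mode_values)
print(modes([1, 2, 3]))                # no mode
print(modes([1, 1, 2, 2, 3]))          # more than one mode
```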

Action!

In statistics, you can find the mean of a population and the mean of a sample of that population. A sample mean will approximate the actual mean of the population.

Population Mean

\[ \mu = \frac{x_1 + x_2 + \dots + x_N}{N} \]

Sample Mean

\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]

where N is the size of the population and n is the sample size.

Although the calculations are the same, different symbols are used to indicate whether it represents a population or a sample.

The Greek letter μ, pronounced “mu,” is used to represent the population mean.

x̄, read as “x-bar,” is used to represent the sample mean.


The measures of central tendency of a data set can be affected by the presence of outliers.

  • In a symmetric distribution such as the uniform distribution, the mean, median, and mode will all be equal.
  • In a non-symmetric or skewed distribution, the mean, median, and mode will differ.
  • In a positively skewed distribution, the mode will be the lowest of the three values and the mean will be the highest.
  • In a negatively skewed distribution, the mode will be the highest of the three values and the mean will be the lowest.

[Figure: three frequency curves marking the positions of the mean, median, and mode. Panels: Negative direction, Perfectly Symmetric Distribution, Positive direction]
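The ordering of the three measures under skew can be checked on small data sets. The following Python sketch uses three illustrative lists that are not from the text: one symmetric, one with a long tail to the right (positive skew), and one with a long tail to the left (negative skew).

```python
import statistics


def centre(values):
    """Return (mean, median, mode) for a data set with a single mode."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))


# Illustrative data sets (assumptions, not from the text).
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
positive_skew = [1, 1, 1, 2, 2, 3, 4, 9]   # tail stretches to the right
negative_skew = [1, 6, 7, 8, 8, 9, 9, 9]   # tail stretches to the left

for name, values in (("symmetric", symmetric),
                     ("positive skew", positive_skew),
                     ("negative skew", negative_skew)):
    mean, median, mode = centre(values)
    print(f"{name:14s} mean={mean:.3f} median={median} mode={mode}")
```

In the positively skewed list the mode is lowest and the mean is highest; in the negatively skewed list the order reverses.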

Example 1

Evaluating Measures of Central Tendency

You are interviewing for an internship at a risk assessment firm to gain experience for your post-secondary program. The interviewer tells you that the average annual income of the 15 employees at the company is $73 518.27. The chart shows the actual incomes of the 15 employees.

$34 983    $18 980    $12 500    $48 980    $478 320
$17 305    $36 540    $12 500    $250 921   $32 654
$45 678    $33 855    $25 676    $33 450    $20 432

a) Determine the mean, median, and mode of the incomes.
b) Use the measures of central tendency to decide whether the interviewer’s statement is accurate.
c) What is the effect of the outliers on the measures of central tendency?
d) Which measure of central tendency best represents the “average” income of the employees?

outlier

  • an element of the data set that is significantly different from the rest of the data points

[Figure: histogram of Frequency against Data Value, with an isolated bar labelled Outlier]

Recall that you studied uniform distributions in section 4.2. Why are the mean, median, and mode all equal in a uniform distribution?


Solution

a) Method 1: Use Paper and Pencil

Determine the mean of the 15 income values.

\[ \mu = \frac{\sum x}{n} = \frac{34\,983 + 18\,980 + 12\,500 + 48\,980 + \dots + 20\,432}{15} = \frac{1\,102\,774}{15} \approx 73\,518.27 \]

The mean income is $73 518.27.

There is an odd number of data points. To determine the median, place the values in order from least to greatest. Then, locate the 8th term because it is the middle of the 15 data points.

$12 500  $12 500  $17 305  $18 980  $20 432  $25 676  $32 654  $33 450  $33 855  $34 983  $36 540  $45 678  $48 980  $250 921  $478 320

The median is $33 450.

The mode is $12 500 because it occurs twice in the set of data points, and every other value occurs only once.

Method 2: Use a Graphing Calculator

  • Press STAT , then 1:Edit…. Enter the salaries in list L.
  • Press 2ND QUIT.
  • Press STAT. Use the arrow keys to choose CALC , then 1:1-Var Stats. Press ENTER.
  • Scroll down to read the mean and median, along with numerous other statistics. The calculator will not identify the mode.

The mean income, x̄, is $73 518.27, and the median income, Med, is $33 450. You can see the mode of $12 500 by inspecting the table.
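The paper-and-pencil and calculator results can also be cross-checked in a few lines of code. This is a minimal Python sketch, offered as a third option alongside the two methods above rather than as part of the original solution; it uses only the standard `statistics` module and the 15 incomes from the chart.

```python
import statistics

# The 15 employee incomes from the chart, in dollars.
incomes = [34983, 18980, 12500, 48980, 478320, 17305, 36540, 12500,
           250921, 32654, 45678, 33855, 25676, 33450, 20432]

total = sum(incomes)
mean_income = total / len(incomes)
median_income = statistics.median(incomes)   # 8th of the 15 sorted values
mode_income = statistics.mode(incomes)       # the only repeated value

print(f"Sum:    {total}")
print(f"Mean:   {mean_income:.2f}")
print(f"Median: {median_income}")
print(f"Mode:   {mode_income}")
```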

Recall that the Greek letter Σ, pronounced “sigma,” is used to represent the sum of a series of numbers. ∑x means x₁ + x₂ + x₃ + … + xₙ.


d) The mean is significantly affected by the outliers, the mode is the smallest income of all the employees, and the median is the middle income of all the employees. The median would be the best measure to represent the average income. Looking at the data, the median is fairly close to many data points and describes the data without being influenced by the outliers or repeated values.
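One way to see how strongly the outliers pull on the mean is to recompute the measures without them. The sketch below is in Python and assumes that the two largest incomes, $250 921 and $478 320, are the outliers; the text does not name them explicitly, so this choice is an assumption.

```python
import statistics

incomes = [34983, 18980, 12500, 48980, 478320, 17305, 36540, 12500,
           250921, 32654, 45678, 33855, 25676, 33450, 20432]

# Assumption: the two largest incomes are treated as the outliers.
outliers = {250921, 478320}
trimmed = [x for x in incomes if x not in outliers]

mean_all = statistics.mean(incomes)
mean_trimmed = statistics.mean(trimmed)
median_all = statistics.median(incomes)
median_trimmed = statistics.median(trimmed)

print(f"Mean:   {mean_all:.2f} with outliers, {mean_trimmed:.2f} without")
print(f"Median: {median_all} with outliers, {median_trimmed} without")
```

Under this assumption the mean drops by tens of thousands of dollars while the median moves by less than a thousand, which supports choosing the median here.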

Your Turn

Before heading on vacation to Mexico, you observe the actual high temperatures for seven days. The table shows the temperatures.

Day    Temperature (°C)
1      27
2      29
3      32
4      29
5      45
6      29
7      31

a) Determine the mean, median, and mode of the temperatures.
b) The weather report predicts that based on the previous seven-day forecast, the temperature on the day of your arrival should be 36 °C. Use the measures of central tendency in part a) to determine whether the weather report is accurate.
c) Is there an outlier in the data? How does it affect the measures of central tendency?
d) Which measure of central tendency would best represent the temperatures in this Mexican location? Explain.

When the quantity of data is large, you can group the data into intervals to make them easier to analyse. When data are grouped into intervals, you can only approximate the centre of the data. To do this, assume that the data are evenly spaced in each interval, and use the midpoint to represent the values in each interval. Multiply the data values by their respective frequencies. Then, add these products and divide by the total frequency. You can use the following formula to approximate the mean for grouped data.

Mean for Grouped Data

\[ \bar{x} = \frac{\sum m_i f_i}{\sum f_i} \]

where m_i is the midpoint of each interval and f_i is the frequency of each interval.


You can use a frequency distribution table to help organize your data.

Example 2

Using Grouped Data

The time taken to complete a chess game was recorded, to the nearest minute. The frequency table shows the data.

Time (min)    10–15   15–20   20–25   25–30   30–35
Frequency     2       20      18      10      5

a) Calculate the estimated mean, median, and mode times, in minutes, to complete a chess game.
b) Describe potential issues with finding the measures of central tendency of grouped data.
c) Graph the data using a histogram. Mark the measures of central tendency on the graph.
d) Discuss any skewing of the data with respect to the measures of central tendency.

Solution

a) Method 1: Use Paper and Pencil

Use the following table.

Number of Minutes   Midpoint, m_i   Number of Games, f_i   m_i f_i   Cumulative Frequency
10–15               12.5            2                      25        2
15–20               17.5            20                     350       22
20–25               22.5            18                     405       40
25–30               27.5            10                     275       50
30–35               32.5            5                      162.5     55

\[ \sum f_i = 2 + 20 + 18 + 10 + 5 = 55 \]

\[ \sum m_i f_i = 25 + 350 + 405 + 275 + 162.5 = 1217.5 \]

Calculate the grouped mean.

\[ \bar{x} = \frac{\sum m_i f_i}{\sum f_i} = \frac{1217.5}{55} \approx 22.1 \]

Therefore, the mean number of minutes is approximately 22 min per game.
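The table columns and the grouped mean can be rebuilt step by step in code. This is a minimal Python sketch using the chess-game intervals and frequencies from Example 2; the variable names are illustrative.

```python
from itertools import accumulate

# Intervals (minutes) and frequencies from the chess-game table.
intervals = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
frequencies = [2, 20, 18, 10, 5]

# Midpoint of each interval stands in for every value inside it.
midpoints = [(low + high) / 2 for low, high in intervals]
products = [m * f for m, f in zip(midpoints, frequencies)]
cumulative = list(accumulate(frequencies))

total_frequency = sum(frequencies)
grouped_mean = sum(products) / total_frequency

for (low, high), m, f, mf, cf in zip(intervals, midpoints,
                                     frequencies, products, cumulative):
    print(f"{low}-{high}: midpoint={m}, f={f}, m*f={mf}, cumulative={cf}")
print(f"Grouped mean: {grouped_mean:.1f} min")
```

Because every value in an interval is replaced by the midpoint, the result is an estimate of the mean, not the exact mean of the raw times.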


c) Enter the data into your spreadsheet as shown. Highlight the frequency column, select the Insert tab, and then choose Column, Clustered Column from the drop-down menu.

Right click on the x-axis and choose Select Data…. Under the options for Horizontal (Category) Axis Labels, select Edit and highlight the data for your time intervals from A2:A6. Change the title of your graph to Frequency of Chess Game Length. With your graph selected, click on the Layout tab and choose Axis Titles, then choose Primary Horizontal Axis Title – Title Below Axis. Rename the axis title Time (min).
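If a spreadsheet is not available, a rough text version of the same histogram can be printed from the frequency table. This is a minimal Python sketch and not a replacement for the spreadsheet chart; the function name `text_histogram` and the `#` bar symbol are illustrative choices.

```python
labels = ["10-15", "15-20", "20-25", "25-30", "30-35"]   # time intervals (min)
frequencies = [2, 20, 18, 10, 5]


def text_histogram(labels, frequencies, symbol="#"):
    """Return one line per interval, with a bar as long as the frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * freq} {freq}"
            for label, freq in zip(labels, frequencies)]


lines = text_histogram(labels, frequencies)
print("Frequency of Chess Game Length")
print("Time (min)")
for line in lines:
    print(line)
```

The bars are drawn sideways, so the longer tail toward the 30–35 interval appears at the bottom of the printout.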

d) Since the data are positively skewed, the modal interval is the least appropriate measure of central tendency. The median and mean are very close together, so either one would be an appropriate measure.

Technology tip for the histogram in part c): To eliminate the gap between bars, right click on any of the bars and choose Format Data Series. Reduce the Gap Width to 0% and click Close.

Your Turn

A group of children were asked how many hours a day they spend playing video games. The table shows the data.

Number of Hours   Number of Children
0–2               3
2–4               11
4–6               7
6–8               2
8–10              1

a) Determine the estimated mean, median number of hours, and modal interval for the distribution.
b) Discuss any skewing of the data with respect to the measures of central tendency.


Certain values in a data set are sometimes of greater relative importance than others. In these cases, it is useful to calculate a weighted mean. To do this, multiply the weighting by the corresponding data value, find the sum of these products, and then divide by the total weighting.

Weighted Mean

\[ \mu = \frac{\sum x_i w_i}{\sum w_i} \]

where x_i represents each data value in the data set and w_i represents its weight or frequency.

Example 3

Using a Weighted Mean

A teacher is calculating the marks for the students in her Data Management class. She assigns the following values to each category:

Knowledge: 25%
Application: 20%
Communication: 15%
Thinking: 10%
Culminating Project: 15%
Final Exam: 15%

Kyle has not yet written his final exam, but his marks in the first five categories are 90, 79, 82, 70, and 85.

a) Determine the weighted mean for Kyle before the final exam.
b) How does this weighted mean differ from the unweighted mean?
c) What mark must Kyle receive on the final exam to finish the course with 84%?

Solution

a) Calculate the weighted mean:

\[ \bar{x}_w = \frac{90(0.25) + 79(0.20) + 82(0.15) + 70(0.10) + 85(0.15)}{0.25 + 0.20 + 0.15 + 0.10 + 0.15} = \frac{70.35}{0.85} \approx 82.8 \]

b) Calculate the unweighted mean:

\[ \bar{x} = \frac{90 + 79 + 82 + 70 + 85}{5} = \frac{406}{5} = 81.2 \]

Without the weighted mean, Kyle would receive a lower average mark because the categories he excels in are worth more when calculated using the weighted mean.
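The weighted and unweighted calculations can be compared side by side in code. This is a minimal Python sketch that pairs each mark with the weight used in the solution to part a). The last few lines rearrange the weighted sum to estimate the exam mark asked for in part c); that step is an illustration added here under the assumption that the six weights sum to 100%, not a worked solution from the text.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Kyle's five marks and the matching category weights from part a).
marks = [90, 79, 82, 70, 85]
weights = [0.25, 0.20, 0.15, 0.10, 0.15]
exam_weight = 0.15

before_exam = weighted_mean(marks, weights)
unweighted = sum(marks) / len(marks)

# Part c), sketched: target = (weighted sum so far) + exam_weight * exam_mark.
target = 84
weighted_sum = sum(x * w for x, w in zip(marks, weights))
exam_mark_needed = (target - weighted_sum) / exam_weight

print(f"Weighted mean before exam: {before_exam:.1f}")
print(f"Unweighted mean:           {unweighted:.1f}")
print(f"Exam mark needed for {target}%: {exam_mark_needed:.1f}")
```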


Reflect

R1. Which measure of central tendency is most affected by extreme values? Explain using specific examples to justify your answer.

R2. Describe a situation in which it would be necessary for you to use
a) the mean
b) the weighted mean
c) grouped data

R3. Which measure of central tendency is being used in each situation? Explain.
a) The average person has two hands, two eyes, two ears, and two legs.
b) The average time it takes to get to school is 38 min.
c) Johnny is an above average student.

Practise

Choose the best answer for #3 and #4.

1. Determine the mean, median, and mode for each set of data.
a) 4 6 9 12 15 7 13 4 7 10 3 8 15
b) 9 8 20 23 12 12 9 9 12 9 20 21 9
c) 110 152 112 124 110 134 138 127 118 110 114 162

2. Nina runs the 400-m race for Mustang High School. Her times in the last six track meets were 1.45 min, 1.50 min, 1.42 min, 1.41 min, 1.42 min, and 1.48 min.
a) What are the mean and median for her running times?
b) Which measure of central tendency best describes Nina’s average time? Explain.

3. The observation that occurs most frequently in a data sample is the
A mean
B weighted mean
C mode
D median

4. What is the median of the sample 5, 5, 11, 9, 8, 5, 8?
A 9
B 6
C 5
D 8

Apply

5. The mean of Daniel’s marks on five tests was 77.4. His marks on the first four tests were 88, 77, 70, and 72. Calculate Daniel’s mark on the fifth test.

6. The average daily snowfall for the first week of December was 2.5 cm. In each of the first two days, 2.5 cm fell. In each of the next four days, 2 cm fell. What was the snowfall for the last day of the week?

7. Communication Determine whether the argument is valid for each situation. Explain your thinking.
a) An advertising company has a mean monthly sales record of $16 235. Therefore, half the team members sold more than $16 235.
b) A survey shows that 78% of all salaries are below the mean. Therefore, there must be a mistake.
c) The mean mark of one class is 71, while the mean mark of another class is 76. Therefore, the mean of the two classes is 73.5.
d) My median monthly expenses total $850. Therefore, my total expenses for the year must be $10 200.


8. Communication Which measure of central tendency would be best suited for each situation? Explain why you chose the measure that you did.
a) a summary of a class’s report card marks
b) an award for the most popular movie of the year
c) an employer budgeting for the average salary of its employees
d) a potential employee looking for the typical salary among current employees

9. Thinking Create a data set of at least seven values that satisfies each of the conditions. Use the context of marks, salaries, sports statistics, or choose a context of your own.
a) The mean, median, and mode are all 15.
b) The median is 7.5 and the mean is greater than 15.
c) The mean is 7.5 and the median is greater than 15.
d) Explain why the mean is more affected by outliers than the median.

10. Michael surveyed the grade 12 students at his school to research the number of hours of sleep they got. He asked them how many hours of sleep they got last night. The table shows his results.

Time (h)      4–5   5–6   6–7   7–8   8–
Frequency     32    50    125   67    108

a) Make a histogram of these data.
b) Estimate the mean, median, and modal interval for the hours of sleep by grade 12 students.
c) Mark the measures of central tendency on your histogram.
d) Discuss any skewing of the data and how it relates to the measures of central tendency.

11. Application Your teacher will provide you with a file called Nobel Winners.csv, listing the Canadian or Canadian-born Nobel Prize winners up to 2013. Use appropriate technology to answer the questions.
a) Make a histogram of the winners’ ages.
b) Calculate the mean, median, and mode ages.
c) Describe the “average” age of a Canadian Nobel Prize winner. Explain why this age would not be younger.

12. Application Your teacher will provide you with a file called Olympics 2014.csv, listing the medal counts and populations of winning countries in the 2014 Sochi Winter Olympics. The file ranks countries by number of gold medals, followed by silver and bronze. Use appropriate technology to answer the questions.
a) Re-rank the countries by total medals relative to the population.
b) Re-rank the countries using a weighted mean, with each type of medal having a different weighting.
c) Which system do you prefer? Write a paragraph supporting your choice.

13. Karen’s term mark is 82%. The term counts for 70% of the final mark. What mark must Karen achieve on the exam to earn a final mark of
a) 80%?
b) 85%?
c) at least 75%?
d) Can Karen achieve 88%? Explain.

14. Thinking Using the data provided in the frequency table, describe a context that the mean, median, and mode could represent.

Age in Years   20–30   30–40   40–50   50–60   60–
Frequency      14      15      28      19      5


6.2 Measures of Spread

Learning Goals

I am learning to

  • describe the variability in a sample or population using measures of spread
  • calculate the range
  • understand how to use quartiles and percentiles to analyse data

Minds On…

The fuel economy of various cars in a given class will vary significantly. Cars can be rated anywhere from 4 L/100 km to 9 L/100 km. Understanding where most of the data occur using the measures of central tendency is sometimes not enough to make a broad generalization of the data. Why is it also important to know the distance between each data value and the centre of the data? The measure of how widely the data vary around their central location is referred to as the measure of spread or dispersion.

Action!

Investigate Percentiles

An automotive publication ranked the following 2013 subcompact cars based on their fuel economy in the city. The chart shows the cars’ fuel economy in the city and on the highway. Your teacher will provide you with the data in a file called Fuel Economy.csv.

Make/Model           City (L/100 km)   Hwy (L/100 km)
TOYOTA PRIUS C       3.5               4.
SCION iQ             5.5               4.
SMART FORTWO         5.8               4.
CHEVROLET SPARK      6.3               5.
FIAT 500             6.4               4.
TOYOTA YARIS         6.7               5.
MINI COOPER          6.8               5.
KIA RIO              6.8               5.
FORD FIESTA          6.9               5.
HONDA FIT            7.1               5.
MAZDA 2              7.1               5.
HYUNDAI ACCENT       7.2               5.
SCION xD             7.4               5.
NISSAN VERSA         7.4               5.
CHEVROLET SONIC      7.7               5.
VOLKSWAGEN BEETLE    9.5               7.
SUBARU BRZ           9.6               6.

Source: “Fuel Consumption Ratings,” Natural Resources Canada, February 5, 2014.


1. a) Identify the car with the median fuel economy in city driving.
b) The median is also called the 50th percentile. Why would that be?

2. The 80th percentile would be the car whose fuel economy in city driving is greater than 80% of all others. Which car would that be?

3. What percent of the cars have a better fuel economy than a Fiat?

4. a) Identify the car with the best fuel economy in the city and the car with the best fuel economy on the highway. What percent of the cars have a worse fuel economy?
b) Identify the car with the worst fuel economy in the city and the car with the worst fuel economy on the highway. What percent of the cars have a worse fuel economy?

5. Reflect Describe how you can use percentiles to compare data.

6. Extend Your Understanding If you were comparing your marks to those of your classmates, would you rather be in the 95th percentile or the 5th percentile? Explain.

To help analyse the spread of data, you may need to identify the percentile rank or calculate percentiles.

Percentile Rank

\[ R = \frac{p}{100}(n + 1) \]

where p is the percentile, n is the size of the population, and R is the whole number rank of the data point. If R is not a whole number, round R down.

Percentile

\[ p = \frac{100(L + 0.5E)}{n} \]

where p is the percentile, L is the number of data less than the data point, E is the number of data equal to the data point, and n is the size of the population.

percentile

  • the percent of all the data that are less than or equal to a specific data value
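Both formulas can be tried on the city fuel-economy column from the Investigate table. This is a minimal Python sketch; the function names `percentile_rank` and `percentile_of` are illustrative, and the values 6.4 and 6.8 are the city ratings listed for the Fiat 500 and for the Mini Cooper and Kia Rio.

```python
import math

# City fuel economy (L/100 km) for the 17 subcompact cars in the table.
city = [3.5, 5.5, 5.8, 6.3, 6.4, 6.7, 6.8, 6.8, 6.9,
        7.1, 7.1, 7.2, 7.4, 7.4, 7.7, 9.5, 9.6]


def percentile_rank(p, n):
    """R = (p / 100)(n + 1), rounded down to a whole number."""
    return math.floor(p * (n + 1) / 100)


def percentile_of(value, data):
    """p = 100(L + 0.5E) / n for a given data value."""
    below = sum(1 for x in data if x < value)    # L: values less than the point
    equal = sum(1 for x in data if x == value)   # E: values equal to the point
    return 100 * (below + 0.5 * equal) / len(data)


rank_50 = percentile_rank(50, len(city))
value_at_rank = sorted(city)[rank_50 - 1]        # ranks count from 1
p_fiat = percentile_of(6.4, city)
p_tied = percentile_of(6.8, city)                # two cars share this value

print(f"Rank of the 50th percentile: {rank_50} (value {value_at_rank})")
print(f"Percentile of 6.4 L/100 km: {p_fiat:.1f}")
print(f"Percentile of 6.8 L/100 km: {p_tied:.1f}")
```

The 0.5E term counts half of the tied values as below the data point, which is why the two cars rated 6.8 share one percentile.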
