


































Instructions on creating histograms using spreadsheets such as OpenOffice.org Calc and Microsoft Excel. It covers selecting columns, setting labels, and adjusting chart settings to create effective histograms.



































Source URL: http://www.comfsm.fm/~dleeling/statistics/text.html#page 021 Saylor URL: http://saylor.org/courses/bus Attributed to: [Dana Lee Ling] Saylor.org
2.1 Graphs and Charts

The table below includes FSM census 2000 data and student seat numbers for the national site of COM-FSM circa 2004.

State | Population (2000) | Fractional share of national population (relative frequency) | Number of student seats held by state at the national campus | Fractional share of the national campus student seats
Chuuk | 53595 | 0.50 | 679 | 0.20
Kosrae | 7686 | 0.07 | 316 | 0.09
Pohnpei | 34486 | 0.32 | 2122 | 0.62
Yap | 11241 | 0.11 | 287 | 0.08
Total | 107008 | 1 | 3404 | 1

Circle or pie charts
In a circle chart the whole circle is 100%. Circle charts are used when the data add to a whole; for example, the state populations add to yield the national population.

[Figure: a pie chart of the state populations]
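The relative frequency column in the table above can be reproduced with a few lines of Python. This is a minimal sketch. The variable names are illustrative, and the angle column assumes a pie chart in which each slice is the state's share of 360 degrees.

```python
# Fractional share (relative frequency) of each state's 2000 population,
# using the census figures from the table above.
populations_2000 = {
    "Chuuk": 53595,
    "Kosrae": 7686,
    "Pohnpei": 34486,
    "Yap": 11241,
}

total = sum(populations_2000.values())  # national population

# Divide each state's count by the national total; the shares add to 1.
relative_frequency = {state: count / total for state, count in populations_2000.items()}

for state, share in relative_frequency.items():
    # Slice angle for a pie chart: the share of the full 360-degree circle.
    print(f"{state:8} {populations_2000[state]:6d} {share:.2f} {share * 360:6.1f}")
print(f"{'Total':8} {total:6d} {sum(relative_frequency.values()):.2f}")
```

The same division by the total applies to the student seat column, using 3404 as the total.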
The following table includes data from the 2010 FSM census as an update to the above data.

State | Population (2010) | Relative frequency
Chuuk | 48651 |
Kosrae | 6616 |
Pohnpei | 35981 |
Yap | 11376 |
Sum | 102624 |

Column charts
Column charts are also called bar graphs.

[Figure: a column chart of the student seats held by each state at the national site]
Line graph
A line graph plots data as a line. The horizontal axis is usually set up with equal intervals. Line graphs are not used in this course and should not be confused with xy scatter graphs.

XY scatter graph
When you have two sets of continuous data (value versus value, no categories), use an xy graph. These graphs are covered in more detail in the chapter on linear regressions.

2.2 Histograms and Frequency Distributions
A distribution counts the number of data elements in each category or within each range of values. Plotting the count of the elements in each category or range as a column chart produces a chart called a histogram.
The histogram shows the distribution of the data. The height of each column shows the frequency of an event. The distribution often reveals patterns that are not apparent from the raw data values alone. For example, the distribution of male body fat among statistics students has two peaks. The two peaks suggest that there are two subgroups among the men in the statistics course: one at a healthy level of body fat and a second at a higher level of body fat.

[Figure: histogram of male body fat among statistics students]

The ranges into which values are gathered are called bins, classes, or intervals. This text tends to use "classes" or "bins" for the ranges into which the data values are grouped.

Nominal level of measurement
At the nominal level of measurement one can determine the frequency of elements in a category, such as students by state in a statistics course.

State | Frequency | Rel Freq
Chuuk | 6 | 0.
Kosrae | 6 | 0.
Pohnpei | 31 | 0.
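Counting nominal data amounts to tallying how many elements fall into each category. The sketch below uses Python's `collections.Counter` for the tally. The list of states is a hypothetical sample for illustration only, not the course data in the table above.

```python
from collections import Counter

# Hypothetical sample: the home state of each student in a small class.
# (Illustrative values only, not the course data from the table above.)
states = ["Pohnpei", "Chuuk", "Pohnpei", "Yap", "Kosrae", "Pohnpei", "Chuuk", "Pohnpei"]

n = len(states)              # sample size
frequency = Counter(states)  # number of elements in each category

# Relative frequency: each category's count divided by the sample size.
rel_freq = {state: count / n for state, count in frequency.items()}

for state in sorted(frequency):
    print(f"{state:8} {frequency[state]:3d} {rel_freq[state]:.2f}")
print(f"{'Sums':8} {n:3d} {sum(rel_freq.values()):.2f}")
```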
Age | Frequency | Rel Freq
25 | 1 | 0.02
26 | 1 | 0.02
27 | 1 | 0.02
Sums | 51 | 1

Data gathered into a number of classes fewer than the number of unique data values
The ranks can be collected together (classed) to reduce the number of rank-order categories. In the example below the age data are gathered into two-year cohorts.

Age | Frequency | Rel Freq
19 | 20 | 0.39
21 | 21 | 0.41
23 | 4 | 0.08
25 | 4 | 0.08
27 | 2 | 0.04
Sums | 51 | 1
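Gathering ages into two-year cohorts maps each age to the lower age of its cohort (19-20 becomes 19, 21-22 becomes 21, and so on) and then counts each cohort. The sketch below assumes cohorts start at 19, as in the table above. The age list and the helper name `cohort_label` are hypothetical and used for illustration only.

```python
from collections import Counter

# Hypothetical ages (illustrative only, not the 51 students in the table).
ages = [19, 20, 20, 21, 22, 21, 23, 24, 25, 26, 27, 19]

FIRST_AGE = 19  # lower limit of the first cohort, as in the table above
COHORT = 2      # two-year cohorts: 19-20, 21-22, 23-24, ...

def cohort_label(age):
    """Return the lower age of the two-year cohort containing `age`."""
    return FIRST_AGE + COHORT * ((age - FIRST_AGE) // COHORT)

frequency = Counter(cohort_label(a) for a in ages)
n = len(ages)

for label in sorted(frequency):
    print(f"{label}-{label + COHORT - 1}: {frequency[label]} ({frequency[label] / n:.2f})")
```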
2.22 Ratio level of measurement
At the ratio level, data are always gathered into ranges and classed histograms are used. Unlike ordinal data, ratio-level data are not necessarily limited to a finite number of ranks. The ranges into which data are gathered are defined by a class lower limit and a class upper limit. The class width is the class upper limit minus the class lower limit. The frequency function in spreadsheets uses class upper limits, and in this text histograms are also generated using the class upper limits.

To calculate the class lower and upper limits, you must first determine the minimum and maximum values in the data set. Spreadsheets include functions that calculate the minimum value (MIN) and the maximum value (MAX) of a data set:

=MIN(data)
=MAX(data)

In LibreOffice the MIN and MAX functions can take a list of comma-separated numbers or a range of cells in a spreadsheet. In statistics a range of cells is the most common input for these functions. Because that range of cells is usually your data, this text writes "data" inside the parentheses. Ranges of cells use two cell addresses
Min = 58
Max = 66
Range = 66 - 58 = 8
Width = range / number of classes = 8/5 = 1.6

Calculation | Height (CUL) | Frequency
58 + 1.6 | 59.6 | 4
59.6 + 1.6 | 61.2 | 8
61.2 + 1.6 | 62.8 | 13
62.8 + 1.6 | 64.4 | 8
64.4 + 1.6 | 66 | 4
Sum: | | 37

Note that 61.2 is INCLUDED in the class that ends at 61.2. Each class includes all values up to and including its class upper limit. Note too that the frequencies add to the sample size.

After making the column chart, double-click on the columns to open the data series dialog box. Find the Options tab and set the spacing (or gap width) to zero.
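The class upper limit calculation above can be sketched in Python. This is an illustration under two assumptions: the number of classes is chosen in advance (5 here), and the helper name `class_upper_limits` is hypothetical. Like the worked example, it adds the width repeatedly starting from the minimum. The final limit is set directly to the maximum so that floating-point rounding cannot leave the maximum value out of the last class.

```python
def class_upper_limits(data, num_classes):
    """Return (width, list of equal-width class upper limits).

    Each limit is the previous one plus the width, starting from the
    minimum. The last limit is set to max(data) rather than computed,
    so rounding in the repeated additions cannot exclude the maximum.
    """
    low, high = min(data), max(data)
    width = (high - low) / num_classes
    limits = []
    current = low
    for _ in range(num_classes - 1):
        current += width
        limits.append(current)
    limits.append(high)  # final CUL equals MAX(data)
    return width, limits

# Only the minimum and maximum matter here, so [58, 66] stands in for the 37 heights.
width, culs = class_upper_limits([58, 66], 5)
print(f"width = {width:.1f}")         # width = 1.6
print([round(c, 1) for c in culs])   # [59.6, 61.2, 62.8, 64.4, 66]
```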
Note that the spacing or gap width on the columns has been set to zero.

Relative Frequency
Relative frequency is one way to determine a probability. Divide each frequency by the sum (the sample size) to get the relative frequency.

Height CUL | Frequency | Relative Frequency f/n or P(x)
59.6 | 4 | 0.11
61.2 | 8 | 0.22
62.8 | 13 | 0.35
64.4 | 8 | 0.22
66 | 4 | 0.11
Sum: | 37 | 1.01

The relative frequencies always add to one. Rounding causes the values above to add to 1.01; if all the decimal places were used, the relative frequencies would add to exactly one.
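The rounding effect described above can be checked directly. The sketch below computes f/n for the height frequencies in the table. Adding the values rounded to two decimals gives 1.01, while adding the full-precision values gives one.

```python
# Class upper limits and frequencies from the height table above.
culs = [59.6, 61.2, 62.8, 64.4, 66]
freqs = [4, 8, 13, 8, 4]

n = sum(freqs)                      # sample size
rel_freqs = [f / n for f in freqs]  # f/n for each class

rounded = [round(p, 2) for p in rel_freqs]
print(rounded)                      # [0.11, 0.22, 0.35, 0.22, 0.11]
print(round(sum(rounded), 2))       # 1.01: the rounded values over-count slightly
print(round(sum(rel_freqs), 10))    # 1.0 when full precision is kept
```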
The above method produces equal-width classes. It also matches the spreadsheet convention of including the class upper limit in each class.

Checking frequency tables
The final class upper limit must be equal to the maximum value in the data set.
The frequencies must sum to the sample size n.
The relative frequencies must add to 1.00.

CUL | Frequency | Relative Frequency f/n
min + width | |
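The three checks above can be written as a small function that lists any problems it finds. This is a sketch. The function name `check_frequency_table` and the sample data are hypothetical, and the relative frequency check uses `math.isclose` to allow for floating-point rounding.

```python
import math

def check_frequency_table(data, culs, freqs):
    """Apply the three frequency-table checks; return a list of problems found."""
    problems = []
    n = len(data)
    if culs[-1] != max(data):
        problems.append("final class upper limit is not the maximum value")
    if sum(freqs) != n:
        problems.append(f"frequencies sum to {sum(freqs)}, not n = {n}")
    if not math.isclose(sum(f / n for f in freqs), 1.0):
        problems.append("relative frequencies do not add to 1")
    return problems

# Hypothetical data: classes ending at 2, 4 and 5 hold 3, 2 and 1 values.
data = [1, 2, 2, 3, 4, 5]
print(check_frequency_table(data, [2, 4, 5], [3, 2, 1]))  # []
print(check_frequency_table(data, [2, 4], [3, 2]))        # all three checks fail
```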
Frequency function
For more advanced spreadsheet users, frequency data can be obtained with the frequency function FREQUENCY. This function is especially useful when working with large data sets. The frequency function is:

=FREQUENCY(DATA,CLASSES)

DATA refers to the range of cells containing the data. CLASSES refers to the range of cells containing the class upper limits. The example data set consists of the height measurements for 49 female students in statistics courses during two consecutive terms.

The frequency function built into spreadsheets works differently from most other functions. It is called an "array" function because it places values into an array of cells. For the function to do this, you must first select the cells into which it will place the frequency values.
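The counting rule behind FREQUENCY can be shown with a short Python emulation. This is a sketch, not the spreadsheet's own code. It assumes the class upper limits are sorted in ascending order, and the heights are hypothetical values for illustration. Each value goes into the first class whose upper limit is greater than or equal to it, so every class includes its upper limit. Spreadsheet FREQUENCY functions (Excel, for example) return one more count than there are classes, and the extra count holds values above the last upper limit. The sketch does the same.

```python
from bisect import bisect_left

def frequency(data, classes):
    """Emulate FREQUENCY(DATA, CLASSES) with ascending class upper limits.

    bisect_left returns the index of the first upper limit >= value, so a
    value equal to an upper limit is counted in that class. Values above
    the last limit land in the extra final element.
    """
    counts = [0] * (len(classes) + 1)
    for value in data:
        counts[bisect_left(classes, value)] += 1
    return counts

# Hypothetical heights (illustrative only, not the 49 measurements in the text).
heights = [58, 59.6, 60, 61.2, 62, 62.8, 63, 64.4, 66]
print(frequency(heights, [59.6, 61.2, 62.8, 64.4, 66]))  # [2, 2, 2, 2, 1, 0]
```

The final 0 shows that no value exceeds the last class upper limit. This is what the checks above expect when the final CUL equals the maximum.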
With the cells still highlighted, start typing the frequency function. After typing the opening parenthesis, drag to select the data to be classed. If the data set is too large to select by dragging, type the data range in by hand.
Then press and hold down BOTH the Control (Ctrl) key and the Shift key. With both keys held down, press the Enter (or Return) key.

As noted above, the frequencies should add to the sample size. In spreadsheets, internal rounding errors can cause the maximum value in a data set to be left out of the final class. For the last class, use the value returned by the MAX function as the class upper limit, not a "previous class + width" formula.

2.3 Shapes of Distributions
Distributions are known by the names of their shapes.

[Figure: the named shapes of distributions]

The shape of the distribution is one aspect of a sample that is often similar in the population. If a good random sample of sufficient size has a symmetric distribution, then the population is likely to have a symmetric distribution. Projecting results from a sample onto a population is called generalizing. Thus the shape of a sample distribution generalizes to the population.
At the first dialog box, be sure to click the "First column as label" check box, as indicated by the arrow in the diagram below.
For the next two screens, simply click "Next". On the following screen, fill in the appropriate titles. The legend can be unchecked, as seen below.