






An overview of data classification, focusing on nominal, ordinal, interval, and ratio data. It also discusses qualitative and quantitative data, cross-sectional and time series data, and the presentation of data through frequency distribution tables, absolute and relative frequency histograms, stem-and-leaf diagrams, and scatter diagrams.
Typology: Study notes







BIT 2405 Week 2
Suppose we try using 6 classes. The width of each interval would then be roughly:

Width = (12 - 2.4)/6 = 1.6 => round up to a class width of 2.

One form for our frequency table would then be:

Tuition Rates (in $000)      Number of Schools
2.0 but less than 4.0        13
4.0 but less than 6.0        24
6.0 but less than 8.0         9
8.0 but less than 10.0        8
10.0 but less than 12.0       5
12.0 but less than 14.0       1

A variation of the above absolute frequency table is to display the relative frequency of observations that fall in the specified intervals rather than the absolute frequencies. A relative frequency table has 3 or 4 columns. Its components are:

Category
Frequency
Relative Frequency
Percent (optional)

For our data set, one possible form for a relative frequency table is as follows (each proportion is the class count divided by the 60 schools):

Tuition Rates (in $000)      Proportion of Schools
2.0 but less than 4.0        0.2167
4.0 but less than 6.0        0.4000
6.0 but less than 8.0        0.1500
8.0 but less than 10.0       0.1333
10.0 but less than 12.0      0.0833
12.0 but less than 14.0      0.0167

Yet another variation is to display the cumulative frequency distribution, i.e., the number of observations that are less than the upper boundary of each class interval. For example, the Data Analysis routine in Excel provides us with the following output:

Upper Limit    Frequency    Cumulative %
 2             13            21.67%
 4             24            61.67%
 6              9            76.67%
 8              8            90.00%
10              5            98.33%
12              1           100.00%

Histograms

A histogram is simply a graphical display of a frequency distribution (table). There are a number of different forms of histograms. We will consider three types of histograms:

1. Absolute frequency histograms
Similar to constructing a frequency table, we have three major considerations:

1. Number of intervals
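The class-width rule and the three table variants from the tuition example can be sketched in Python. This is only an illustration: the raw tuition figures are not reproduced in these notes, so the sketch starts from the class counts in the frequency table, and the variable names are assumptions made for the example.

```python
import math

# Smallest and largest tuition rates (in $000) and the chosen number of classes,
# as used in the width calculation in the notes.
lowest, highest, num_classes = 2.4, 12.0, 6

raw_width = (highest - lowest) / num_classes   # (12 - 2.4) / 6, about 1.6
class_width = math.ceil(raw_width)             # round up to a convenient width of 2

# Absolute frequencies copied from the frequency table (not recomputed from raw data).
lower_bounds = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
counts = [13, 24, 9, 8, 5, 1]
total = sum(counts)

# Relative frequency: each class count divided by the total number of schools.
relative = [c / total for c in counts]

# Cumulative relative frequency: running total of counts divided by the total.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running / total)

for lo, c, r, cum in zip(lower_bounds, counts, relative, cumulative):
    upper = lo + class_width
    print(f"{lo:4.1f} but less than {upper:4.1f}  {c:3d}  {r:.4f}  {cum:7.2%}")
```

The cumulative column printed by this sketch can be compared against the Cumulative % column of the Excel output.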
Mathematically, we could denote the sum as

A more convenient way of doing this would be to use the shorthand

If we had n observations, we can generalize this to: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

For our example,

An additional example:

Note that:

Measures of Location or Central Tendency

What we seek is a number that we feel is typical or representative of the data set. We will consider four such measures:
The mean is the most commonly used measure of central tendency. We denote the mean for a population and a sample differently but compute them in the same manner.

For a population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

For a sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Example: A large department store collects data on sales made by each of its salespeople. The data, the number of sales made on a given day by each of 20 salespeople, are as follows:

9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17

The sample mean is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$
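As a quick check of the sample mean for the 20 sales figures, here is a minimal Python sketch. The variable names are chosen for illustration only.

```python
# Number of sales made by each of the 20 salespeople, as listed in the example.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)            # number of observations
total = sum(sales)        # the sum of all x_i
sample_mean = total / n   # sum divided by the number of observations

print(n, total, sample_mean)
```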
Properties of the mean:
0 2 3 5 10
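Definition: The median is the middle value of the observations once they are sorted in ascending order.

If the number of observations is odd, the median is the observation in position (n + 1)/2 of the ordered array.

If the number of observations is even, the median is the average of the observations in positions n/2 and n/2 + 1 of the ordered array.

Previous example revisited: Sorting the 20 observations in ascending order we have:

6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

Since the number of observations is even, the median is the average of the 10th and 11th ordered observations: (16 + 16)/2 = 16.

NOTE: The median is resistant to extreme values.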
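The odd/even rule for the median, and its resistance to extreme values, can be illustrated with a short Python sketch. The helper name `median` and the made-up extreme value of 240 are assumptions introduced only for this example.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle ones

sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Hypothetical variant: the largest value (24) replaced by an extreme value (240).
sales_with_extreme = [240 if x == 24 else x for x in sales]

print(median(sales), sum(sales) / len(sales))
print(median(sales_with_extreme), sum(sales_with_extreme) / len(sales_with_extreme))
```

In this sketch the median is the same for both lists, while the mean of the second list is pulled upward by the single extreme value.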
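Definition: The mode is the value that occurs most frequently in the data set.

Working with the same sample data:

6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

The value 16 occurs three times, more often than any other value, so the mode is 16.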
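Assuming the measure defined here is the mode (the most frequently occurring value), a minimal Python sketch for the same sample data could look like this. It returns a list because a data set may have more than one mode.

```python
from collections import Counter

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

counts = Counter(sales)                # value -> number of occurrences
highest = max(counts.values())         # the largest number of occurrences
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, highest)
```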
Example: Car mileage; n = 29; leaf unit = 0.1

29.8, 30.1, 30.4, 30.4, 30.5, 30.6, 30.8, 30.8, 31.2, 31.3, 31.3, 31.4, 31.4, 31.5, 31.5, 31.7, 31.7, 31.7, 31.8, 31.9, 32.0, 32.2, 32.2, 32.4, 32.4, 32.5, 32.5, 32.8, 33.3

Freq   Stem   Leaf
  1    29     8
  3    30*    144
  4    30     5688
  5    31*    23344
 (7)   31     5577789
  5    32*    02244
  3    32     558
  1    33*    3

Median = 31.5
Mode = 31.7

Note that the stem labels used provide a more detailed display. For example, row 30* contains the mileages from 30.0 to 30.4, while row 30 contains the mileages from 30.5 to 30.9.
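The split-stem display can be rebuilt from the raw mileages with a short Python sketch. Two assumptions are made here: the leaf unit is taken to be 0.1, and the last observation, which is cut off in the listing, is taken to be 33.3 because the display shows a leaf of 3 on stem 33*.

```python
from collections import defaultdict

mileage = [29.8, 30.1, 30.4, 30.4, 30.5, 30.6, 30.8, 30.8, 31.2, 31.3,
           31.3, 31.4, 31.4, 31.5, 31.5, 31.7, 31.7, 31.7, 31.8, 31.9,
           32.0, 32.2, 32.2, 32.4, 32.4, 32.5, 32.5, 32.8,
           33.3]  # assumed value, read off the 33* row of the display

rows = defaultdict(list)
for x in sorted(mileage):
    tenths = round(x * 10)              # work in whole tenths (leaf unit 0.1)
    stem, leaf = divmod(tenths, 10)     # e.g. 30.4 -> stem 30, leaf 4
    upper_half = leaf >= 5              # leaves 5-9 go on the unstarred row
    rows[(stem, upper_half)].append(str(leaf))

display = []
for stem, upper_half in sorted(rows):   # starred row (leaves 0-4) sorts first
    label = f"{stem}" if upper_half else f"{stem}*"
    leaves = "".join(rows[(stem, upper_half)])
    display.append((len(leaves), label, leaves))

for freq, label, leaves in display:
    print(f"{freq:3d}  {label:<4} {leaves}")
```

Each printed row gives the row frequency, the stem label, and the leaves, in the same order as the display in the notes.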
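(a) If skewed right then the mean is typically greater than the median
(b) If skewed left then the mean is typically less than the median
(c) If symmetrical then the mean and the median are equal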
Definition of an Outlier:
Definition: The range is the difference between the largest and smallest observations.

Example: Consider the sales data (number of sales for a particular salesperson on 20 different days) displayed as an ordered array.

6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

Range = 24 - 6 = 18

Note: The range has the disadvantage of considering only 2 of the N population or n sample observations. A logical alternative is to base our measure of variability on the distance of all observations from a typical value such as the arithmetic mean.

Average (or Mean) Absolute Deviation (AVEDEV)

Definition: The average of the absolute deviations of the observations from their mean. Excel uses the term AVEDEV.

Mathematical definition: $\text{AVEDEV} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$

Example: Sales data set

6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24

Recall that the Mean = 15.85.

AVEDEV = 67.6/20 = 3.38
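The range and the average absolute deviation for the sales data can be checked with a few lines of Python. This is a sketch of the arithmetic only, not of Excel's AVEDEV implementation, and the variable names are illustrative.

```python
sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

n = len(sales)
data_range = max(sales) - min(sales)    # largest minus smallest observation

mean = sum(sales) / n
# Average of the absolute distances of every observation from the mean.
avedev = sum(abs(x - mean) for x in sales) / n

print(data_range, round(mean, 2), round(avedev, 2))
```

Unlike the range, the second measure uses all 20 observations rather than only the two extremes.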
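Definition (important!): The variance is the average of the squared deviations of the observations from their mean. For a sample, the sum of squared deviations is divided by n - 1 rather than n.

Formulas: The mathematical definition and computational formulas for the population and sample variance are

Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2 / N}{N}$

Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$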
Note: The unit of measurement corresponding to the variance for both the sample and population is the square of the original unit of measurement. To get back to the original unit, we take the square root of the variance. The resulting number is termed the standard deviation.
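Definition (important!): The standard deviation is the positive square root of the variance.

Formulas: Mathematical definitions of the standard deviation:

Population Standard Deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$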
Example: Consider the following data (number of sales for a particular salesperson on 5 different days) displayed as an ordered array. 6, 9, 10, 12, 13 The calculation of the sample variance and standard deviation is illustrated below using the definition and computational formulas for the two statistics. Calculations:
Mean = (6 + 9 + 10 + 12 + 13)/5 = 50/5 = 10
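The remaining calculations for the five-day example can be sketched in Python, using both the definition formula (squared deviations from the mean) and the computational formula (sum of squares minus the squared sum over n). The variable names are assumptions made for illustration, and the n - 1 divisor reflects treating the five days as a sample.

```python
import math

sales = [6, 9, 10, 12, 13]
n = len(sales)
mean = sum(sales) / n                                   # 50 / 5

# Definition formula: sum of squared deviations from the mean.
ss_definition = sum((x - mean) ** 2 for x in sales)

# Computational formula: sum of x^2 minus (sum of x)^2 / n.
ss_computational = sum(x * x for x in sales) - sum(sales) ** 2 / n

sample_variance = ss_definition / (n - 1)               # divide by n - 1 for a sample
sample_std = math.sqrt(sample_variance)                 # back to the original units

print(mean, ss_definition, ss_computational, sample_variance, round(sample_std, 4))
```

Both formulas give the same sum of squares, so either can be used; the computational form avoids subtracting the mean from every observation.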