Data Classification and Presentation, Study notes of Introduction to Business Management

An overview of data classification, focusing on nominal, ordinal, interval, and ratio data. It also discusses qualitative and quantitative data, as well as cross-sectional and time series data, and the presentation of data through frequency distribution tables, absolute and relative frequency histograms, stem-and-leaf diagrams, and scatter diagrams.

Typology: Study notes

Pre 2010

Uploaded on 12/10/2008

vanilla-bear

BIT 2405 Week 2
1. Classification of Data (Variables)
a. Nominal/Ordinal/Interval/Ratio
Nominal data classify observations into groups or categories that have no natural order (gender, true/false)
Ordinal data can be classified and also ranked or ordered (Nominal properties plus order)
Interval data allow meaningful statements about the differences between values, but have no true zero (SAT scores/temperature) (plus Nominal and Ordinal)
Ratio data have a meaningful zero, so ratios of amounts can be compared (weights/heights/profits/time) (plus Nominal, Ordinal, and Interval)
b. Qualitative/Quantitative
Qualitative data are categorical (nominal and ordinal) and cannot be measured on a numerical scale
Quantitative data can be recorded on a numerical scale (interval and ratio)
c. Cross-sectional data are collected at the same point in time
d. Time series data are collected over several time periods
Quantitative data can be further classified as continuous or discrete.
Continuous data can take on any value within an interval (e.g., weight, time)
Discrete data can take on only distinct, countable values, typically integers (e.g., number of sales)
Summary
Exercises:
1. A supervisor must give a summary evaluation rating from among the following choices: 1) Poor 2) Fair 3) Good 4) Very Good 5) Excellent
Are these data qualitative or quantitative? Qualitative Quantitative
Are these data discrete or continuous? Discrete Continuous Neither
What is the highest level of measurement the data possess?
Nominal Ordinal Interval Ratio
2. A company is evaluating customer satisfaction with one of their products. A survey of 400
persons is conducted. Each person is asked: “What is your level of satisfaction with the
company’s products?” 1) Poor 2) Average 3) Good 4) Excellent
Are these data qualitative or quantitative? Qualitative Quantitative
Are these data discrete or continuous? Discrete Continuous Neither
What is the highest level of measurement the data possess?
Nominal Ordinal Interval Ratio
3. The weight of 50 newborn babies at a local hospital.
Are these data qualitative or quantitative? Qualitative Quantitative
Are these data discrete or continuous? Discrete Continuous Neither
What is the highest level of measurement the data possess?
Nominal Ordinal Interval Ratio
4. You want to order a pizza. There are four kinds of pizza: 1) Pepperoni 2) Mushroom 3) Black Olive 4) Sausage
Are these data qualitative or quantitative? Qualitative Quantitative
Are these data discrete or continuous? Discrete Continuous Neither
What is the highest level of measurement the data possess?
Nominal Ordinal Interval Ratio
5. You toss a coin and record “head” as 0 and “tail” as 1.
Are these data qualitative or quantitative? Qualitative Quantitative
Are these data discrete or continuous? Discrete Continuous Neither
What is the highest level of measurement the data possess?
Nominal Ordinal Interval Ratio
Note: discrete quantitative data can take on only countable values, such as integers.

2. Presentation of Data
a. Graphical Presentation of Quantitative Information
i. Frequency Distribution Tables
ii. Histograms
1. Absolute Frequency Histogram
2. Relative Frequency Histogram
3. Cumulative Frequency Histogram
iii. Stem-and-Leaf Diagrams
iv. Crosstabulations
v. Scatter Diagrams
b. Graphical Presentation of Qualitative Information
i. Frequency Distribution Tables
ii. Bar Charts
iii. Pie Charts
3. Brief Review of Summation Notation
4. Numerical Measures of Location
a. Arithmetic Mean
b. Median
c. Mode
d. Weighted Average
5. Numerical Measures of Dispersion or Variability
a. Range
b. Mean (Average) Absolute Deviation
c. Variance
d. Standard Deviation

Classification of Data
What we can do with a data set (e.g., summarize, present, make inferences) depends on the type of …

Suppose we try using 6 classes. Then the width of each interval would be approximately:
$$\text{Width} = \frac{12 - 2.4}{6} = 1.6 \;\Rightarrow\; \text{round up to a class width of } 2$$
One form for our frequency table would then be:

Tuition Rates (in $000)      Number of Schools
2.0 but less than 4.0        13
4.0 but less than 6.0        24
6.0 but less than 8.0         9
8.0 but less than 10.0        8
10.0 but less than 12.0       5
12.0 but less than 14.0       1

A variation of the above absolute frequency table is to display the relative frequency of observations that fall in the specified intervals rather than absolute frequencies. A relative frequency table has 3 or 4 columns, described below:
Category   Frequency   Relative Frequency   Percent (optional)

For our data set, one possible form for a relative frequency table is as follows:

Tuition Rates (in $000)      Proportion of Schools
2.0 but less than 4.0        0.2167
4.0 but less than 6.0        0.4000
6.0 but less than 8.0        0.1500
8.0 but less than 10.0       0.1333
10.0 but less than 12.0      0.0833
12.0 but less than 14.0      0.0167

Yet another variation is to display the cumulative frequency distribution, i.e., the number of observations that are less than the upper boundary of each class interval. For example, the Data Analysis routine in Excel provides us with the following output:

Upper Limit   Frequency   Cumulative %
2             13           21.67%
4             24           61.67%
6              9           76.67%
8              8           90.00%
10             5           98.33%
12             1          100.00%

Histograms
A histogram is simply a graphical display of a frequency distribution (table). There are a number of different forms of histograms. We will consider three types of histograms:
1. Absolute frequency histograms
2. Relative frequency histograms
3. Cumulative frequency histograms
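The frequency, relative frequency, and cumulative tables above can be reproduced with a short Python sketch. It assumes the 60 tuition values can be rebuilt from the stem-and-leaf diagram in these notes, and that the first class starts at the whole number just below the smallest value; the helper names `expand_stem_leaf` and `frequency_table` are hypothetical, not from the course.

```python
import math

# Tuition rates (in $000) for the 60 Texas schools, rebuilt from the
# stem-and-leaf diagram in these notes: stem = whole thousands, leaf = tenths.
TUITION_STEM_LEAF = {
    2: "4", 3: "455666899999", 4: "1456778888999999999",
    5: "04889", 6: "0446", 7: "02479", 8: "00033568",
    9: "", 10: "347", 11: "06", 12: "0",
}


def expand_stem_leaf(rows, leaf_unit=0.1):
    """Turn {stem: 'leaves'} into a sorted list of values."""
    return sorted(round(stem + int(leaf) * leaf_unit, 1)
                  for stem, leaves in rows.items() for leaf in leaves)


def frequency_table(data, num_classes):
    """Equal-width classes 'lo but less than lo + width'.

    The width (max - min) / k is rounded UP, as in the notes; starting the
    first class at the integer just below the minimum is an assumption.
    """
    n = len(data)
    width = math.ceil((max(data) - min(data)) / num_classes)
    start = math.floor(min(data))
    rows, cumulative = [], 0
    for k in range(num_classes):
        lo, hi = start + k * width, start + (k + 1) * width
        freq = sum(1 for x in data if lo <= x < hi)  # mutually exclusive classes
        cumulative += freq
        rows.append((lo, hi, freq, freq / n, 100 * cumulative / n))
    return width, rows


tuition = expand_stem_leaf(TUITION_STEM_LEAF)
width, table = frequency_table(tuition, num_classes=6)
print(f"n = {len(tuition)}, class width = {width}")
for lo, hi, freq, rel, cum_pct in table:
    print(f"{lo:.1f} but less than {hi:.1f}  {freq:3d}  {rel:.4f}  {cum_pct:6.2f}%")
```

Each class uses `lo <= x < hi`, which keeps the classes mutually exclusive; checking that the frequencies add up to n confirms they are exhaustive.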

Similar to constructing a frequency table, we have three major considerations:
1. Number of intervals
2. Interval width
3. Check that the intervals are exhaustive and mutually exclusive

Absolute frequency histogram: a graphical display of the information found in an absolute frequency table.

Note: When we examine a frequency distribution (either in tabular or graphical form), we are very much interested in two things:
2. Spread / dispersion

Stem-and-Leaf Diagrams
A stem-and-leaf diagram is a tool similar to a histogram. Like a histogram, it reflects frequencies, concentrations of data, and shapes.
Advantages:
Given a set of numbers, generally the first digit or two will be the 'stem,' and the rest will be the 'leaf.'
Example: Consider the tuition data for the 60 Texas schools presented in the ordered array:
2.4, 3.4, 3.5, 3.5, 3.6, 3.6, 3.6, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9, 4.1, 4.4, 4.5, 4.6, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 5.0, 5.4, 5.8, 5.8, 5.9, 6.0, 6.4, 6.4, 6.6, 7.0, 7.2, 7.4, 7.7, 7.9, 8.0, 8.0, 8.0, 8.3, 8.3, 8.5, 8.6, 8.8, 10.3, 10.4, 10.7, 11.0, 11.6, 12.0
The resulting stem-and-leaf diagram is presented below:

Stem-and-leaf of Tuition   N = 60   Leaf Unit = 0.1
Freq   Stem   Leaf
 1       2    4
12       3    455666899999
19       4    1456778888999999999
 5       5    04889
 4       6    0446
 5       7    02479
 8       8    00033568
 0       9
 3      10    347
 2      11    06
 1      12    0

BE CAREFUL! Too few or too many stems give little descriptive information about the distribution of numbers.

Crosstabulations
Restaurant   Quality      Price
1            Good         18
2            Very Good    22
3            Good         28
4            Excellent    38
5            Very Good    33
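Returning to the stem-and-leaf diagram of the tuition data: the split into stems and leaves can be sketched in Python as below, assuming a leaf unit of 0.1 (so 3.4 has stem 3 and leaf 4). The function name `stem_and_leaf` is illustrative only, not a library routine.

```python
from collections import defaultdict

# Ordered array of tuition rates (in $000) for the 60 Texas schools,
# as read off the stem-and-leaf diagram in these notes.
tuition = [
    2.4, 3.4, 3.5, 3.5, 3.6, 3.6, 3.6, 3.8, 3.9, 3.9, 3.9, 3.9, 3.9,
    4.1, 4.4, 4.5, 4.6, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8,
    4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9,
    5.0, 5.4, 5.8, 5.8, 5.9, 6.0, 6.4, 6.4, 6.6,
    7.0, 7.2, 7.4, 7.7, 7.9, 8.0, 8.0, 8.0, 8.3, 8.3, 8.5, 8.6, 8.8,
    10.3, 10.4, 10.7, 11.0, 11.6, 12.0,
]


def stem_and_leaf(data, leaf_unit=0.1):
    """Return {stem: 'leaves'}: stem = leading digit(s), leaf = last digit."""
    scaled = [round(x / leaf_unit) for x in sorted(data)]  # 3.4 -> 34
    rows = defaultdict(str)
    for value in scaled:
        rows[value // 10] += str(value % 10)
    lo, hi = min(scaled) // 10, max(scaled) // 10
    # Keep empty stems (such as 9) so gaps in the data stay visible.
    return {stem: rows[stem] for stem in range(lo, hi + 1)}


display = stem_and_leaf(tuition)
print(f"Stem-and-leaf of Tuition  N = {len(tuition)}  Leaf Unit = 0.1")
for stem, leaves in display.items():
    print(f"{len(leaves):3d}  {stem:3d} | {leaves}")
```

Changing `leaf_unit` (for example to 1.0) regroups the same data into fewer stems, which is one way to see the "too few or too many stems" warning in practice.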

$$X_1 = 5, \quad X_2 = 8, \quad X_3 = 14$$

Mathematically, we could denote the sum as
$$X_1 + X_2 + X_3$$
A more convenient way of doing this would be to use the shorthand
$$\sum_{i=1}^{3} X_i$$
If we had n observations, we can generalize this to:
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
For our example,
$$\sum_{i=1}^{3} X_i = 5 + 8 + 14 = 27$$
An additional example:
Note that:

Measures of Location or Central Tendency
What we seek is a number that we feel is typical or representative of the data set. We will consider four such measures:

  1. Arithmetic Mean
  2. Median
  3. Mode
  4. Weighted Average

Arithmetic Mean

The arithmetic mean is the most commonly used measure of central tendency. We denote the mean for a population and for a sample differently, but compute them in the same manner.
For a population:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
For a sample:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example: A large department store collects data on sales made by each of its salespeople. The data, the number of sales made on a given day by each of 20 salespeople, are as follows:
9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
The sample mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{317}{20} = 15.85$$
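The summation and the sample mean for the 20-salespeople data can be checked in a few lines of Python; `statistics.mean` from the standard library is used only as a cross-check.

```python
import statistics

# Number of sales made on a given day by each of 20 salespeople (from the notes).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
total = sum(sales)      # the summation x_1 + x_2 + ... + x_n
x_bar = total / n       # sample mean: (sum of x_i) / n

print(n, total, x_bar)  # 20 317 15.85
print(statistics.mean(sales))
```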

Properties of the mean:

  1. The mean is sensitive to ALL of the data. In other words, if one score in the distribution is changed, the mean will change too. Example:

Obs.          Mean
1, 2, 3       2
1, 2, 30      11
1, 2, 300     101

As shown above, the mean is affected by (pulled toward) extreme values. This can be a drawback, so there is a need to consider other measures of central tendency.
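  2. The sum of the deviations about the mean equals zero:
$$\sum_{i=1}^{N} (X_i - \mu) = 0 \quad \text{or} \quad \sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
For example, for the observations 2, 3, 5, 10 the mean is 5, and
$$(2 - 5) + (3 - 5) + (5 - 5) + (10 - 5) = -3 - 2 + 0 + 5 = 0$$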
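Both properties of the mean can be verified numerically. This sketch uses exact fractions (Python's `fractions.Fraction`) so that the sum of deviations comes out as exactly 0 rather than a tiny floating-point residue; the `mean` helper is hypothetical.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean as an exact fraction (no floating-point rounding)."""
    return Fraction(sum(values), len(values))


# Property 1: the mean is sensitive to every value, including extreme ones.
for obs in ([1, 2, 3], [1, 2, 30], [1, 2, 300]):
    print(obs, "mean =", mean(obs))   # means 2, 11 and 101

# Property 2: deviations about the mean sum to zero (sales data from the notes).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
m = mean(sales)                       # 317/20
print(sum(x - m for x in sales))      # 0
```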

Median

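Definition: The median is the middle value of the data when the observations are arranged in order.
If the number of observations is odd, the median is the observation in position (n + 1)/2.
If the number of observations is even, the median is the average of the observations in positions n/2 and n/2 + 1.
Previous example revisited: Sorting the 20 observations in ascending order we have:
6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
Since the number of observations is even, the median is the average of the 10th and 11th observations in the ordered array: (16 + 16)/2 = 16.
NOTE: The median is resistant to extreme values.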
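A minimal median function following the odd/even rule, applied to the ordered sales data. Replacing the largest value 24 with a hypothetical extreme value of 240 shows the resistance to extreme values: the median does not move, while the mean does.

```python
import statistics


def median(values):
    """Middle value of the ordered data; for even n, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # position (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1


sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]
with_outlier = sales[:-1] + [240]  # hypothetical extreme value in place of 24

print(median(sales), median(with_outlier))      # 16.0 16.0
print(sum(sales) / 20, sum(with_outlier) / 20)  # the mean, by contrast, jumps
print(statistics.median(sales))                 # standard-library cross-check
```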

Mode

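Definition: The mode is the value that occurs most frequently in the data set.
Working with the same sample data:
6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
the mode is 16, which occurs three times (14, 17, and 18 each occur only twice).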
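Counting occurrences with `collections.Counter` gives the mode; collecting every value that ties for the highest count also covers data sets with more than one mode.

```python
import statistics
from collections import Counter

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

counts = Counter(sales)                                     # value -> occurrences
top = max(counts.values())                                  # highest frequency
modes = [value for value, c in counts.items() if c == top]  # all values tied for it

print(modes, top)                   # [16] 3
print(statistics.multimode(sales))  # standard-library equivalent (Python 3.8+)
```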

Median and Mode in Stem-and-Leaf Diagrams

Example: Car mileage; n = 29; leaf unit = 0.1
29.8, 30.1, 30.4, 30.4, 30.5, 30.6, 30.8, 30.8, 31.2, 31.3, 31.3, 31.4, 31.4, 31.5, 31.5, 31.7, 31.7, 31.7, 31.8, 31.9, 32.0, 32.2, 32.2, 32.4, 32.4, 32.5, 32.5, 32.8, 33.3

Freq   Stem   Leaf
 1     29     8
 3     30*    144
 4     30     5688
 5     31*    23344
(7)    31     5577789
 5     32*    02244
 3     32     558
 1     33*    3

Median = 31.5 (the 15th of the 29 ordered observations, in the row marked (7))
Mode = 31.7 (occurs three times)

Note that the stem labels used provide a more detailed display. For example, row 30* contains the mileages from 30.0 to 30.4, while row 30 contains the mileages from 30.5 to 30.9.
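The same median and mode can be read back from the stem-and-leaf rows. This sketch assumes, as the note on stem labels says, that a starred stem (e.g. `30*`) holds leaves 0 to 4 and an unstarred one holds leaves 5 to 9, so the star can simply be dropped when rebuilding the values.

```python
from collections import Counter

# Car mileage stem-and-leaf rows from the notes (leaf unit = 0.1), listed in order.
rows = [("29", "8"), ("30*", "144"), ("30", "5688"), ("31*", "23344"),
        ("31", "5577789"), ("32*", "02244"), ("32", "558"), ("33*", "3")]

# Rebuild the ordered observations: value = stem + leaf / 10 (star dropped).
mileage = [round(int(stem.rstrip("*")) + int(leaf) / 10, 1)
           for stem, leaves in rows for leaf in leaves]

n = len(mileage)                            # 29, an odd count
median_value = mileage[(n + 1) // 2 - 1]    # the 15th observation (1-based)
mode_value, mode_count = Counter(mileage).most_common(1)[0]

print(n, median_value, mode_value, mode_count)  # 29 31.5 31.7 3
```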

Relationship among the mean, mode, and median

(a) If skewed right, then mean > median > mode
(b) If skewed left, then mean < median < mode
(c) If symmetrical (and unimodal), then mean = median = mode
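The tuition data for the 60 Texas schools give a concrete right-skewed case: the long tail up to 12.0 pulls the mean above the median. In this particular data set the mode and median coincide at 4.9, so the strict ordering is a typical pattern rather than a guarantee. The sketch rebuilds the data from the stem-and-leaf diagram in these notes and uses Python's `statistics` module.

```python
import statistics

# Tuition rates (in $000) for the 60 Texas schools, rebuilt from the notes'
# stem-and-leaf diagram (stem = whole thousands, leaf = tenths).
STEM_LEAF = {2: "4", 3: "455666899999", 4: "1456778888999999999",
             5: "04889", 6: "0446", 7: "02479", 8: "00033568",
             9: "", 10: "347", 11: "06", 12: "0"}
tuition = sorted(round(stem + int(leaf) / 10, 1)
                 for stem, leaves in STEM_LEAF.items() for leaf in leaves)

mean = statistics.mean(tuition)
median = statistics.median(tuition)
mode = statistics.mode(tuition)

# A long right tail (values up to 12.0) pulls the mean above the median.
print(round(mean, 2), median, mode)  # 6.0 4.9 4.9
```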

Effect of Outliers

Definition of an Outlier:

Section B

Range

Definition: The range is the difference between the largest and the smallest observations.
Example: Consider the sales data (number of sales for a particular salesperson on 20 different days) displayed as an ordered array:
6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
Range = 24 - 6 = 18
Note: The range has the disadvantage of considering only 2 of the N population or n sample observations. A logical alternative is to base our measure of variability on the distances of all observations from a typical value such as the arithmetic mean.

Average (or Mean) Absolute Deviation (AVEDEV)
Definition: The average of the absolute distances of the observations from their mean. Excel uses the term AVEDEV.
Mathematical definition:
$$\text{AVEDEV} = \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{n}$$
Example: Sales data set
6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
Recall that the mean = 15.85.
$$\text{AVEDEV} = \frac{\sum_{i=1}^{20} \lvert x_i - 15.85 \rvert}{20} = \frac{67.6}{20} = 3.38$$
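The range and the mean absolute deviation for the sales data can be computed directly; the sketch below mirrors the AVEDEV definition (average distance of each observation from the mean) rather than calling any spreadsheet function.

```python
# Sales data (ordered array) from the notes.
sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]
n = len(sales)

data_range = max(sales) - min(sales)   # largest minus smallest
x_bar = sum(sales) / n                 # 15.85

# Mean absolute deviation: average distance of each observation from the mean.
# This is the quantity Excel's AVEDEV function returns.
avedev = sum(abs(x - x_bar) for x in sales) / n

print(data_range, x_bar, round(avedev, 2))  # 18 15.85 3.38
```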

Variance (Mean Square)

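Definition (important!): The variance measures variability as the average squared deviation of the observations from their mean (for a sample, the sum of squared deviations is divided by n - 1 instead of n).
Formulas: The mathematical definitions and computational formulas for the population and sample variance are:
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N}$$
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$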

Note: The unit of measurement corresponding to the variance for both the sample and population is the square of the original unit of measurement. To get back to the original unit, we take the square root of the variance. The resulting number is termed the standard deviation.
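The only difference between the population and sample variance is the divisor. This sketch computes both for the 20-value sales data (whether those values are treated as a sample or a population is an assumption the analyst makes) and cross-checks them against `statistics.variance`, `statistics.pvariance`, and `statistics.stdev` from the Python standard library; `variance` here is a hypothetical helper.

```python
import math
import statistics

# Sales data (ordered array) from the notes.
sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n


s2 = variance(sales)                    # treating the 20 values as a sample
sigma2 = variance(sales, sample=False)  # treating them as the whole population
s = math.sqrt(s2)                       # back in the original unit (number of sales)

print(round(s2, 4), round(sigma2, 4), round(s, 4))
print(statistics.variance(sales), statistics.pvariance(sales), statistics.stdev(sales))
```

The squared unit ("sales squared") is why the square root is taken: `s` is expressed in number of sales again.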

Standard Deviation

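Definition (important!): The standard deviation is the positive square root of the variance; it measures variability in the original unit of measurement.
Formulas: Mathematical definitions of the standard deviation:

Population:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$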

Example: Consider the following data (number of sales for a particular salesperson on 5 different days) displayed as an ordered array: 6, 9, 10, 12, 13. The calculation of the sample variance and standard deviation is illustrated below using the definition and computational formulas for the two statistics. Calculations:

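Mean:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{6 + 9 + 10 + 12 + 13}{5} = \frac{50}{5} = 10$$

Definition:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(-4)^2 + (-1)^2 + 0^2 + 2^2 + 3^2}{5 - 1} = \frac{30}{4} = 7.5$$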
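The rest of the calculation for the five-day example can be checked with exact fractions: the definition formula and the computational formula should give the same sample variance, and its square root is the sample standard deviation.

```python
import math
from fractions import Fraction

data = [6, 9, 10, 12, 13]        # sales for one salesperson on 5 days (from the notes)
n = len(data)
x_bar = Fraction(sum(data), n)   # 50 / 5 = 10

# Definition formula: sum of squared deviations from the mean, over n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Computational formula: (sum of x^2 - (sum of x)^2 / n) over n - 1.
s2_computational = (sum(x * x for x in data) - Fraction(sum(data) ** 2, n)) / (n - 1)

s = math.sqrt(s2_definition)     # sample standard deviation, in number of sales

print(x_bar, float(s2_definition), float(s2_computational), round(s, 4))  # 10 7.5 7.5 2.7386
```

The computational formula avoids computing each deviation, which is why it was popular for hand calculation; with exact fractions both routes agree exactly.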