Download Descriptive Statistics - Basic Statistics for Sociology - Lecture Slides and more Slides Statistics for Psychologists in PDF only on Docsity!
Descriptive Statistics
Healey Chapters 3 and 4
(2nd^ Cdn Ch. 3)
Measures of Central Tendency And Dispersion
Measures of Central Tendency
- 1. Mode = can be used for any kind of data but only measure of central tendency for nominal or qualitative data.
- Formula: value that occurs most often or the category or interval with highest frequency.
- Note: Omit Formula 3.1 Variation Ratio in Healey and Prus 2 nd^ Cdn.
Central Tendency (cont.)
- 2. Median = exact centre or middle of ordered data. The 50th percentile.
- Formula:
- Array data.
- When sample even #, median falls halfway between two middle numbers.
- To calculate: find(n/2)and (n/2)+1, and divide the total by 2 to find the exact median.
- When sample is odd #, median is exact middle (n+1) /2)
Example for Raw Data:
- Suppose you have the following set of test scores:
- 66, 89, 41, 98, 76, 77, 69, 60, 60, 66, 69, 66, 98, 52, 74, 66, 89, 95, 66, 69
- Array data:
- 98 98 95 89 89 77 76 74 69 69 69 66 66 66 66 66 60 60 52 41 N = 20 (N is even)
Median for Aggregate (grouped) Data
- This formula is shown in Healey 1 st^ Cdn Edition and in Healey 8e but NOT in 2 nd^ Cdn
- We will NOT COVER this one!
Properties of median :
- for numerical data at interval or ordinal level
- -"balance point“
- -not affected by outliers
- -median is appropriate when distribution is highly skewed.
Example for Mean
= 1446 / 20 = 72.
The mean for these test scores is 72.
Mean for Aggregate (Grouped) Data
(Note: 1st^ Cdn. Edition: use this formula!
Omitted in 2 nd^ Cdn. Ed. but covered in class)
- To calculate the mean for grouped data, you need a frequency table that includes a column for the midpoints, for the product of the frequencies times the midpoints (fm).
Formula: = Σ (fm) N
Calculating Mean for Grouped
Data:
Formula: = Σ (fm)
N = 1420 / 20 = 71
The mean for the grouped data is 71.
Properties of the Mean:
**- only for numerical data at interval level
- "balance point“
- can be affected by outliers = skewed distribution
- tail becomes elongated and the mean is pulled in direction of outlier.**
Example… no outlier: $30000, 30000, 35000, 25000, 30000 then mean = $ but if outlier is present, then: $130000, 30000, 35000, 25000, 30000 then mean = $ (the mean is pulled up or down in the direction of the outlier)
Measures of Dispersion
- Describe how variable the data are.
- i.e. how spread out around the mean
- Also called measures of variation or variability
Variability for Non-numerical Data
(Nominal or Ordinal Level Data)
- Measures of variability for non-numerical nominal or ordinal) data are rarely used
- We will not be covering these in class
- Omit Formula 4.1 IQV in Healey and Prus 1st Canadian Edition and in Healey 8e
- Omit Formula 3.1 Variation Ratio in Healey and Prus 2nd^ Canadian Edition
Interquartile Range (Q):
- This is the difference between the 75th and the 25th percentiles (the middle 50%)
- Gives better idea than range of what the middle of the distribution looks like.
Formula: Q = Q 3 - Q 1 (where Q 3 = N x .75, and Q 1 = N x .25) Using above data: Q = Q 3 - Q 1 = (6th^ – 2nd^ case) = $30000-25000 =$ The interquartile range (Q) is $5000.
3. Variance and Standard Deviation:
- For raw data at the interval/ratio level.
- Most common measure of variation.
- The numerator in the formula is known as the sum of squares , and the denominator is either the population size N or the sample size n-
- The variance is denoted by S 2 and the standard deviation, which is the square root of the variance, by S