SUMMARIZING DATA: NUMERICAL MEASURES
WEEK 3
- MEASURES OF CENTRAL TENDENCY-
SUMMARIZING DATA
Organizing data
✓ a) Tables
✓ b) Diagrams
c) Numerical Measures
THE Σ (SIGMA) SIGN
Dr. Doğukan ÖZEN
The sign Σ (sigma) is a summation sign. We can write $(x_1 + x_2 + x_3 + \dots + x_n)$ as $\sum_{i=1}^{n} x_i$.
If a and b are integers and a < b, then $\sum_{i=a}^{b} x_i$ means $x_a + x_{a+1} + x_{a+2} + \dots + x_b$.
Question: If $x_1 = 3$, $x_2 = 6$ and $x_3 = -5$, find the following:
a) b) c)
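As a small illustration of the summation notation, the sketch below evaluates a full sum and a partial sum for the three values given in the question. The helper name `sigma` and the choice of index ranges are assumptions made for this example; they are not the sub-questions a), b) and c) from the slide.

```python
# Values from the question: x_1 = 3, x_2 = 6, x_3 = -5.
x = [3, 6, -5]

def sigma(values, a, b):
    """Return x_a + x_(a+1) + ... + x_b, using 1-based indices as in the notes."""
    # Python lists are 0-based, so x_a is values[a - 1].
    return sum(values[a - 1:b])

total = sigma(x, 1, len(x))   # x_1 + x_2 + x_3
partial = sigma(x, 2, 3)      # x_2 + x_3

print("Sum from i=1 to 3:", total)
print("Sum from i=2 to 3:", partial)
```

The only subtlety is the index shift: the notes count observations from 1, while Python counts list positions from 0.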
ARITHMETIC MEAN
- The most widely used measure of central tendency.
- The arithmetic mean is the sum of all observations divided by the number of observations.
- In statistical terms, it can be written as: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
MEDIAN
- An alternative measure to mean. (More precisely: sample median!)
- Suppose there are n observations in a sample. If these observations are ordered from smallest to largest, then the median is defined as:
  the $\left(\frac{n+1}{2}\right)$th largest observation if n is odd;
  the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest observations if n is even.
EXAMPLE:
The following table consists of somatic cell count measurements (x10000) of milk samples taken
from 10 Holstein cows on a dairy farm. Compute the median value of the somatic cell count.

 i    xi      i    xi
 1    11      6     8
 2    21      7     9
 3    18      8   110
 4    14      9    12
 5    13     10    20
Solution:
Step 1. Order the sample from smallest to largest: 8, 9, 11, 12, 13, 14, 18, 20, 21, 110.
Step 2. Because n is even (n = 10), the sample median is the average of the 5th and 6th observations.
Step 3. Compute the sample median = (13 + 14)/2 = 13.5
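The three steps above can be sketched in Python as follows. The function name `sample_median` is an assumption made for this example; the standard library's `statistics.median` gives the same result. The arithmetic mean is computed alongside it only to show how strongly the single large value (110) pulls the mean away from the median.

```python
import statistics

# Somatic cell counts (x10000) from the table in the example.
somatic_cell_counts = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]

def sample_median(values):
    """Sample median following the odd/even definition in the notes."""
    ordered = sorted(values)          # Step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n+1)/2)th observation (0-based index n // 2)
        return ordered[mid]
    # n even: average of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

median = sample_median(somatic_cell_counts)
mean = statistics.mean(somatic_cell_counts)

print("Median:", median)
print("Mean:", mean)
```

Here the median stays close to the bulk of the observations, while the mean is inflated by the one extreme measurement.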
THE GEOMETRIC MEAN
- Some laboratory data can be expressed either as multiples of 2 or as a constant multiplied by a power of 2.
- So the outcomes can be of the form $2^k c$, where k = 0, 1, 2, 3, … (with a constant c).
- A possible solution is to log-transform the observations, take the arithmetic mean of the log-transformed values, and then transform back (antilog):
  Geometric mean = $\operatorname{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right) = \sqrt[n]{x_1 x_2 \cdots x_n}$
EXAMPLE
- Compute the geometric mean of 3, 5, 6, 6, 7, 12 and 20.
  Arithmetic mean = 59/7 ≈ 8.43
  Median = 6
  Mode = 6
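A minimal sketch of this example, assuming natural logarithms for the transformation (any base gives the same geometric mean as long as the back-transformation uses the same base). It follows the log-transform route described above and also reports the other three measures for comparison.

```python
import math
import statistics

data = [3, 5, 6, 6, 7, 12, 20]

# Arithmetic mean of the log-transformed observations ...
log_mean = sum(math.log(value) for value in data) / len(data)
# ... transformed back with the antilog (exp, because natural logs were used).
geometric_mean = math.exp(log_mean)

arithmetic_mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print(f"Geometric mean:  {geometric_mean:.2f}")
print(f"Arithmetic mean: {arithmetic_mean:.2f}")
print(f"Median: {median}, Mode: {mode}")
```

For positive data that are not all equal, the geometric mean is smaller than the arithmetic mean, which is what this example shows.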
A SUMMARY OF MEAN, MEDIAN AND MODE
Median
- widely used for markedly skewed distributions because it is sensitive only to the number rather than to the values of scores above and below it
- the most stable measure that can be used with open-ended distributions
- more subject to sampling fluctuation than the mean
Mode
- more appropriate than the mean or the median for quantitative variables that are inherently discrete
- the only measure appropriate for unordered qualitative variables
- much more subject to sampling fluctuation than the mean and the median
LOCATION OF MEAN, MEDIAN AND MODE IN A DISTRIBUTION
- To skew means to stretch in one direction.
- A distribution is skewed to the left if the left tail is longer than the right tail.
- A distribution is skewed to the right if the right tail is longer than the left tail.
- A left-skewed distribution stretches to the left (negative direction), a right-skewed distribution to the right (positive direction).
- The Normal Curve represents the symmetrical distribution.
[Figure: frequency curves for a negatively skewed, a symmetric (no skew) and a positively skewed distribution, with the positions of the mode, median and mean marked on each.]
SUMMARIZING DATA: NUMERICAL MEASURES
- Let’s say two different groups of investigators collected the following data:
- Group 1: 30, 120, 130, 80, 90
- Group 2: 88, 92, 90, 86, 94
Mean = 90 in both groups
To describe the data, we also need to give information about the dispersion!
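A short check of this point: the sketch below confirms that both groups have the same mean and then lists each observation's deviation from that mean. The deviations are shown only as an informal way to see the difference in spread; they are not a dispersion measure defined in these notes.

```python
import statistics

group_1 = [30, 120, 130, 80, 90]
group_2 = [88, 92, 90, 86, 94]

mean_1 = statistics.mean(group_1)
mean_2 = statistics.mean(group_2)

# Distance of each observation from its group mean.
deviations_1 = [value - mean_1 for value in group_1]
deviations_2 = [value - mean_2 for value in group_2]

print("Group 1 mean:", mean_1, "deviations:", deviations_1)
print("Group 2 mean:", mean_2, "deviations:", deviations_2)
```

The means are identical, but Group 1's observations lie much farther from 90 than Group 2's, so the mean alone does not describe the data.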
MEASURES OF DISPERSION
RANGE (R)
- Defined as the distance between the largest and the smallest observations: R = Max - Min
The range:
- Can overestimate the dispersion due to extreme values.
- Tends to increase in value as the number of observations in the sample increases.
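To illustrate the sensitivity to extreme values, the sketch below reuses the somatic cell counts from the median example and computes the range with and without the single largest observation. Dropping the value is done here purely for illustration, and the helper name `value_range` is an assumption for this example.

```python
# Somatic cell counts (x10000) from the median example.
somatic_cell_counts = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]

def value_range(values):
    """Range R = Max - Min."""
    return max(values) - min(values)

range_all = value_range(somatic_cell_counts)

# Same sample with the largest observation (110) left out, for comparison only.
without_extreme = sorted(somatic_cell_counts)[:-1]
range_without_extreme = value_range(without_extreme)

print("Range with all observations:", range_all)
print("Range without the largest observation:", range_without_extreme)
```

A single extreme measurement accounts for most of the range in this sample.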
INTERQUARTILE RANGE
- It is the difference between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile): IQR = Q3 - Q1.
The interquartile range:
- Is not influenced by extreme values or sample size.
- Can be misleading because it ignores most of the observations (it is calculated from only two points).
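A hedged sketch of the interquartile range, again using the somatic cell counts from the median example. The notes do not say which quartile convention is intended; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, and other conventions can give slightly different Q1 and Q3 values.

```python
import statistics

# Somatic cell counts (x10000) from the median example.
somatic_cell_counts = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
# Assumption: default "exclusive" method; the notes do not fix a convention.
q1, q2, q3 = statistics.quantiles(somatic_cell_counts, n=4)

iqr = q3 - q1
value_range = max(somatic_cell_counts) - min(somatic_cell_counts)

print("Q1:", q1, "Q3:", q3)
print("Interquartile range:", iqr)
print("Range:", value_range)
```

The extreme value of 110 dominates the range but has little effect on the interquartile range, which depends only on the two quartiles.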