SUMMARIZING DATA: NUMERICAL MEASURES, Lecture notes of Mathematical Statistics

For right-skewed data the geometric mean is usually close to the median and smaller than the arithmetic mean. • So it is preferable to use the geometric mean rather than the arithmetic mean for right-skewed ...

Typology: Lecture notes

2022/2023

Uploaded on 02/28/2023

astarloa

SUMMARIZING DATA: NUMERICAL MEASURES

WEEK 3

MEASURES OF CENTRAL TENDENCY

SUMMARIZING DATA

Organizing data:
  a) Tables ✓
  b) Diagrams ✓
  c) Numerical Measures

THE Σ (SIGMA) SIGN

Dr. Doğukan ÖZEN

The sign Σ (sigma) is a summation sign. We can write $x_1 + x_2 + x_3 + \dots + x_n$ as

$$\sum_{i=1}^{n} x_i$$

If a and b are integers and a < b, then

$$\sum_{i=a}^{b} x_i = x_a + x_{a+1} + x_{a+2} + \dots + x_b$$

Question: If $x_1 = 3$, $x_2 = 6$ and $x_3 = -5$, find the following: a) b) c)
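A minimal Python sketch of the Σ notation, assuming 1-based indices as on the slide. The helper name `sigma` and the four expressions evaluated are illustrative choices. They are not necessarily the expressions asked in parts a), b) and c).

```python
def sigma(x, a, b, f=lambda v: v):
    """Return f(x_a) + f(x_(a+1)) + ... + f(x_b), using 1-based indices (a <= b)."""
    # x[i - 1] is x_i because Python lists are 0-based
    return sum(f(x[i - 1]) for i in range(a, b + 1))


# Data from the question: x1 = 3, x2 = 6, x3 = -5
x = [3, 6, -5]

total = sigma(x, 1, 3)                             # x1 + x2 + x3
sum_of_squares = sigma(x, 1, 3, lambda v: v ** 2)  # x1^2 + x2^2 + x3^2
square_of_sum = sigma(x, 1, 3) ** 2                # (x1 + x2 + x3)^2
partial = sigma(x, 2, 3)                           # x2 + x3 (a = 2, b = 3)

print(total, sum_of_squares, square_of_sum, partial)  # 4 70 16 1
```

Note that the sum of squares (70) and the square of the sum (16) are different quantities.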

ARITHMETIC MEAN

  • Most widely used measure of central tendency!
  • The arithmetic mean is the sum of all observations divided by the number of observations.
  • In statistical terms, it can be written as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
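The formula translates directly into code. In the sketch below, `arithmetic_mean` is a hypothetical helper name, and the data are the seven values from the geometric-mean example in these notes. The standard-library `statistics.mean` is used only as a cross-check.

```python
import statistics


def arithmetic_mean(x):
    """x-bar = (1/n) * (x_1 + x_2 + ... + x_n)."""
    if not x:
        raise ValueError("need at least one observation")
    return sum(x) / len(x)


data = [3, 5, 6, 6, 7, 12, 20]
mean = arithmetic_mean(data)
print(round(mean, 2))                      # 8.43  (= 59 / 7)
print(round(statistics.mean(data), 2))     # 8.43  (standard-library cross-check)
```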

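MEDIAN

  • An alternative measure to the mean. (More precisely: the sample median!)
  • Suppose there are n observations in a sample. If these observations are ordered from smallest to largest, then the median can be defined as:

$$\text{median} = \begin{cases} \text{the } \left(\tfrac{n+1}{2}\right)\text{th largest observation} & \text{if } n \text{ is odd} \\ \text{the average of the } \left(\tfrac{n}{2}\right)\text{th and } \left(\tfrac{n}{2}+1\right)\text{th largest observations} & \text{if } n \text{ is even} \end{cases}$$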

EXAMPLE:

The following table consists of somatic cell count measurements (×10,000) of milk samples taken from 10 Holstein cows on a dairy farm. Compute the median somatic cell count.

  i   xi  |   i   xi
  1   11  |   6    8
  2   21  |   7    9
  3   18  |   8  110
  4   14  |   9   12
  5   13  |  10   20

Solution:

Step 1. Order the sample from smallest to largest: 8, 9, 11, 12, 13, 14, 18, 20, 21, 110.

Step 2. Because n is even (n = 10), the sample median is the average of the 5th and 6th observations.

Step 3. Compute the sample median = (13 + 14)/2 = 13.5.
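The three steps can be sketched in Python as follows. `sample_median` is a hypothetical helper that follows the odd/even definition above, and `statistics.median` from the standard library serves as a cross-check.

```python
import statistics


def sample_median(x):
    """Median as defined above: middle value if n is odd, average of the two middle values if n is even."""
    s = sorted(x)  # Step 1: order from smallest to largest
    n = len(s)
    if n == 0:
        raise ValueError("need at least one observation")
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]           # ((n+1)/2)th observation (0-based index)
    return (s[n // 2 - 1] + s[n // 2]) / 2   # average of the (n/2)th and (n/2 + 1)th


# Somatic cell counts (x10,000) for the 10 cows in the table
scc = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]
print(sorted(scc))               # [8, 9, 11, 12, 13, 14, 18, 20, 21, 110]
print(sample_median(scc))        # 13.5
print(statistics.median(scc))    # 13.5
```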

THE GEOMETRIC MEAN

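  • Some laboratory data can be expressed either as multiples of 2 or as a constant multiplied by a power of 2.
  • So the outcomes can be of the form $2^k c$, where k = 0, 1, 2, 3, … (with a constant c). A possible way to summarize such data is to log-transform the observations, take the arithmetic mean of the transformed values, and then transform back (antilog):

$$\overline{\log x} = \frac{1}{n}\sum_{i=1}^{n} \log x_i, \qquad \text{Geometric mean} = \operatorname{antilog}\left(\overline{\log x}\right) = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$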
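A minimal sketch of the log, mean and antilog recipe follows. The titer-like values, which have the form $2^k c$ with c = 10, are hypothetical and are not taken from the notes. The exponents k average to 2, so the geometric mean should be 10 · 2² = 40. The result is the same whichever log base is used.

```python
import math


def geometric_mean(x, base=10):
    """Antilog of the arithmetic mean of log-transformed observations (all x_i > 0)."""
    if not x or any(v <= 0 for v in x):
        raise ValueError("geometric mean needs positive observations")
    mean_log = sum(math.log(v, base) for v in x) / len(x)  # mean of log x_i
    return base ** mean_log                                # back-transform (antilog)


# Hypothetical data of the form 2**k * c with c = 10 and k = 0, 1, 1, 2, 3, 5
titers = [10 * 2 ** k for k in (0, 1, 1, 2, 3, 5)]
print(geometric_mean(titers, base=10))  # approximately 40
print(geometric_mean(titers, base=2))   # approximately 40 as well: the base does not matter
```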

EXAMPLE

  • Compute the geometric mean of 3, 5, 6, 6, 7, 12 and 20. Arithmetic mean ≈ 8.43 (= 59/7). Median = 6. Mode = 6.
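The example can be checked in Python. The sketch uses natural logs, but any base gives the same geometric mean. It also calls `statistics.geometric_mean`, which requires Python 3.8 or later. The geometric mean comes out at about 7.10. That lies between the median (6) and the arithmetic mean (about 8.43), which fits the fact that the geometric mean of positive data never exceeds the arithmetic mean.

```python
import math
import statistics

data = [3, 5, 6, 6, 7, 12, 20]

# Antilog of the mean of the natural logs
gm = math.exp(sum(math.log(v) for v in data) / len(data))

print(round(gm, 2))                              # 7.1
print(round(statistics.geometric_mean(data), 2)) # 7.1 (standard-library cross-check)
print(round(statistics.mean(data), 2))           # 8.43
print(statistics.median(data))                   # 6
print(statistics.mode(data))                     # 6
```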

A SUMMARY OF MEAN, MEDIAN AND MODE

Median

  • widely used for markedly skewed distributions because it is sensitive only to the number rather than to the values of scores above and below it
  • the most stable measure that can be used with open-ended distributions
  • more subject to sampling fluctuation than the mean

Mode

  • more appropriate than the mean or the median for quantitative variables that are inherently discrete
  • the only measure appropriate for unordered qualitative variables
  • much more subject to sampling fluctuation than the mean and the median
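The median depends only on how many scores lie above and below it, not on their values. The sketch below illustrates this with the somatic cell data from the median example. The "more extreme" version, where 110 is replaced by 1100, and the breed labels used for the mode are hypothetical.

```python
import statistics

scc = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]          # somatic cell counts from the median example
scc_extreme = [1100 if v == 110 else v for v in scc]   # hypothetical: largest value made more extreme

print(statistics.mean(scc), statistics.median(scc))                  # 23.6 13.5
print(statistics.mean(scc_extreme), statistics.median(scc_extreme))  # 122.6 13.5

# The mode also applies to unordered qualitative data (hypothetical breed labels)
breeds = ["Holstein", "Jersey", "Holstein", "Simmental", "Holstein"]
print(statistics.mode(breeds))                                       # Holstein
```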

LOCATION OF MEAN, MEDIAN AND MODE IN A DISTRIBUTION

To skew means to stretch in one direction. A distribution is skewed to the left if the left tail is longer than the right tail, and skewed to the right if the right tail is longer than the left tail. A left-skewed (negatively skewed) distribution stretches to the left; a right-skewed (positively skewed) distribution stretches to the right.

[Figure: frequency curves along an axis running from the negative to the positive direction. Negatively skewed: mean < median < mode. No skew (the normal curve, which represents the symmetrical distribution): mean = median = mode. Positively skewed: mode < median < mean.]
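A small simulation makes the positively skewed case concrete. As an assumption for illustration, the sketch draws a log-normal sample with μ = 0 and σ = 1. For that distribution the theoretical values are mode = e⁻¹ ≈ 0.37, median = 1 and mean = e^0.5 ≈ 1.65, so the median should fall below the mean. The sample geometric mean should also land close to the median. The seed is fixed only for reproducibility, and the printed numbers depend on it.

```python
import math
import random
import statistics

# Hypothetical right-skewed data: log-normal sample with mu = 0, sigma = 1
random.seed(1)
sample = [random.lognormvariate(0, 1) for _ in range(10_000)]

mean = statistics.mean(sample)
median = statistics.median(sample)
gm = math.exp(statistics.mean([math.log(v) for v in sample]))  # geometric mean via logs

# Expect median < mean (positive skew) and geometric mean close to the median
print(f"mean = {mean:.3f}, median = {median:.3f}, geometric mean = {gm:.3f}")
```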

MEASURES OF DISPERSION


  • Let's say two different groups of investigators collected the following data:
    • Group 1: 30, 120, 130, 80, 90
    • Group 2: 88, 92, 90, 86, 94

Both groups have the same mean = 90.

To describe the data, we also need to give information about the dispersion!
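The sketch below lists each observation's deviation from its group mean. Both groups have a mean of 90, yet Group 1 lies as far as 60 units from it, while no Group 2 value is more than 4 units away. The dictionary layout is only an illustrative choice.

```python
groups = {
    "Group 1": [30, 120, 130, 80, 90],
    "Group 2": [88, 92, 90, 86, 94],
}

means = {name: sum(x) / len(x) for name, x in groups.items()}
# Deviation of each observation from its own group mean
deviations = {name: [v - means[name] for v in x] for name, x in groups.items()}

for name in groups:
    print(name, "mean =", means[name], "deviations =", deviations[name])
# Group 1 mean = 90.0 deviations = [-60.0, 30.0, 40.0, -10.0, 0.0]
# Group 2 mean = 90.0 deviations = [-2.0, 2.0, 0.0, -4.0, 4.0]
```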

RANGE (R)

  • Defined as the distance between the largest and the smallest observations: R = Max - Min.

The range:

  • can overestimate the dispersion because of extreme values;
  • tends to increase in value as the number of observations in the sample increases.
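The sketch below computes R = Max - Min for the two groups. It then recomputes the range of the somatic cell data as observations are added one at a time. The helper name `data_range` is hypothetical. Adding an observation can never shrink the range, and the single extreme value 110 makes it jump from 13 to 102.

```python
def data_range(x):
    """R = Max - Min."""
    return max(x) - min(x)


group1 = [30, 120, 130, 80, 90]
group2 = [88, 92, 90, 86, 94]
print(data_range(group1), data_range(group2))   # 100 8

# Range of the first k somatic cell counts, k = 1, ..., 10
scc = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]
running = [data_range(scc[:k]) for k in range(1, len(scc) + 1)]
print(running)   # [0, 10, 10, 10, 10, 13, 13, 102, 102, 102]
```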

INTERQUARTILE RANGE

  • It is the difference between the third quartile (Q3, the 75th percentile) and the first quartile (Q1, the 25th percentile): IQR = Q3 - Q1.

The interquartile range:

  • is not influenced by extreme values and, unlike the range, does not tend to increase with sample size;
  • can be misleading because it ignores most of the observations (it is calculated from only two points).
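A minimal IQR sketch using `statistics.quantiles` from the Python standard library (3.8 or later). Quartile conventions differ between textbooks and software packages. On the somatic cell data, the `"inclusive"` method gives Q1 = 11.25 and Q3 = 19.5, while the default `"exclusive"` method gives a slightly wider IQR. The `iqr` helper and the "more extreme" data set, where 110 is replaced by 1100, are hypothetical. Only Q1 and Q3 enter the result, so the extreme maximum leaves the IQR unchanged.

```python
import statistics


def iqr(x, method="inclusive"):
    """Q3 - Q1; the quartile convention is selected by `method`."""
    q1, _, q3 = statistics.quantiles(x, n=4, method=method)
    return q3 - q1


scc = [11, 21, 18, 14, 13, 8, 9, 110, 12, 20]
scc_extreme = [1100 if v == 110 else v for v in scc]  # hypothetical: more extreme maximum

print(iqr(scc), iqr(scc_extreme))     # 8.25 8.25
print(iqr(scc, method="exclusive"))   # 9.75 (a different quartile convention)
```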