Measuring Center & Dispersion: Mean, Median, Trimmed Mean, M Estimates, Std Deviation, Int, Exams of Mathematical Statistics

A lecture note from Math 408 - Mathematical Statistics at the University of Southern California (USC), covering the topics of measures of location, including the arithmetic mean, median, trimmed mean, and M estimates, as well as measures of dispersion, such as the sample standard deviation, interquartile range, and median absolute deviation. The lecture also discusses the importance of these measures in summarizing data and their robustness to outliers.

Typology: Exams

2021/2022

Uploaded on 09/27/2022

Math 408 - Mathematical Statistics
Lecture 36. Summarizing Data - III
April 29, 2013
Konstantin Zuev (USC)

Agenda

- Measures of Location
  - Arithmetic Mean
  - Median
  - Trimmed Mean
  - M Estimates
- Measures of Dispersion
  - Sample Standard Deviation
  - Interquartile Range (IQR)
  - Median Absolute Deviation (MAD)
- Boxplots
- Summary

The Arithmetic Mean

The most commonly used measure of location is the arithmetic mean,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

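A common statistical model for the variability of a measurement process is the following:

$$x_i = \mu + \varepsilon_i$$

- $x_i$ is the value of the $i$th measurement
- $\mu$ is the true value of the quantity
- $\varepsilon_i$ is the random error, $\varepsilon_i \sim N(0, \sigma^2)$

The arithmetic mean is then

$$\bar{x} = \mu + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i, \qquad \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \sim N\left(0, \frac{\sigma^2}{n}\right),$$

where the distribution of the averaged error assumes that the errors $\varepsilon_i$ are independent.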
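The claim that the averaged error has variance $\sigma^2/n$ can be checked numerically. The following is a minimal simulation sketch, assuming independent normal errors; the values of `mu`, `sigma`, `n`, the number of trials, and the helper name `simulate_sample_means` are illustrative choices and do not come from the lecture.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` batches of n measurements x_i = mu + eps_i and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # eps_i ~ N(0, sigma^2), drawn independently (an assumption of this sketch)
        batch = [mu + rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(batch))
    return means


mu, sigma, n = 10.0, 2.0, 25  # illustrative values, not from the lecture
means = simulate_sample_means(mu, sigma, n, trials=5000)

print("average of the sample means: ", statistics.fmean(means))     # should be close to mu
print("variance of the sample means:", statistics.variance(means))  # should be close to sigma^2 / n
print("theoretical sigma^2 / n:     ", sigma ** 2 / n)
```

With these settings the empirical variance of the sample means should land near $\sigma^2/n = 0.16$, up to simulation noise.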

The Median

The main drawback of the arithmetic mean is that it is sensitive to outliers. In fact, by changing a single number, the arithmetic mean of a batch of numbers can be made arbitrarily large or small. For this reason, measures of location that are robust, or insensitive to outliers, are important.

Definition

If the batch size is an odd number, $x_1, \ldots, x_{2n-1}$, then the median $\tilde{x}$ is defined to be the middle value of the ordered batch values:

$$x_1, \ldots, x_{2n-1} \;\longrightarrow\; x_{(1)} < \ldots < x_{(2n-1)}, \qquad \tilde{x} = x_{(n)}$$

Important Remark: Moving the extreme observations does not affect the sample median at all, so the median is quite robust.
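A small sketch of this contrast, using the ordered batch from the lecture's example and replacing its largest value with a hypothetical gross outlier (the value `1_000_000` is an arbitrary illustration):

```python
import statistics

batch = [1, 2, 5, 6, 9, 11, 19]       # ordered batch from the lecture's example
corrupted = batch[:-1] + [1_000_000]  # largest value replaced by a hypothetical outlier

print("mean:  ", statistics.mean(batch), "->", statistics.mean(corrupted))
print("median:", statistics.median(batch), "->", statistics.median(corrupted))
```

The mean moves by several orders of magnitude, while the median stays at 6 because the middle value of the ordered batch is unchanged.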

M Estimates

Let $x_1, \ldots, x_n$ be a batch of numbers. It is easy to show that

- The mean is

$$\bar{x} = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} (x_i - y)^2$$

  Outliers have a great effect on the mean, since the deviation of $y$ from $x_i$ is measured by the square of their difference.

- The median is

$$\tilde{x} = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} |x_i - y|$$

  Here, large deviations are not weighted as heavily, which is exactly why the median is robust.

In general, consider the following function:

$$f(y) = \sum_{i=1}^{n} \Psi(x_i, y),$$

where $\Psi$ is called the weight function. The M estimate is the minimizer of $f$:

$$y^* = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} \Psi(x_i, y)$$
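The two special cases can be illustrated with a brute-force sketch. The helper `m_estimate` below is hypothetical: it approximates the minimizer by searching a fine grid, and it assumes the minimizer lies between the smallest and largest observation (true for the squared and absolute weight functions, but not guaranteed for an arbitrary $\Psi$). A real implementation would normally use a proper numerical optimizer.

```python
def m_estimate(xs, psi, step=0.001):
    """Approximate argmin_y sum_i psi(x_i, y) by grid search over [min(xs), max(xs)]."""
    lo, hi = min(xs), max(xs)
    steps = int(round((hi - lo) / step))
    grid = [lo + k * step for k in range(steps + 1)]
    return min(grid, key=lambda y: sum(psi(x, y) for x in xs))


def squared(x, y):
    """Weight function whose minimizer is the arithmetic mean."""
    return (x - y) ** 2


def absolute(x, y):
    """Weight function whose minimizer is the median."""
    return abs(x - y)


batch = [1, 2, 5, 6, 9, 11, 19]  # batch from the lecture's example
y_sq = m_estimate(batch, squared)
y_abs = m_estimate(batch, absolute)

print("squared loss minimizer: ", y_sq, "(mean is", sum(batch) / len(batch), ")")
print("absolute loss minimizer:", y_abs, "(median is 6)")
```

Up to the grid resolution, the first minimizer agrees with the mean of the batch and the second with its median.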

Measures of Dispersion

A measure of dispersion, or scale, gives a numerical characteristic of the “scatteredness” of a batch of numbers. The most commonly used measure is the sample standard deviation s, which is the square root of the sample variance,

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Q: Why $\frac{1}{n-1}$ instead of $\frac{1}{n}$?

A: $s^2$ is an unbiased estimate of the population variance $\sigma^2$. If $n$ is large, then it makes little difference whether $\frac{1}{n-1}$ or $\frac{1}{n}$ is used.

Like the mean, the standard deviation s is sensitive to outliers.
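As a quick sketch, Python's standard library provides both normalizations: `statistics.stdev` divides by $n-1$ and `statistics.pstdev` divides by $n$. The outlier value `1000` below is a hypothetical illustration.

```python
import math
import statistics

batch = [1, 2, 5, 6, 9, 11, 19]  # batch from the lecture's example
n = len(batch)

s = statistics.stdev(batch)     # sample standard deviation, divides by n - 1
s_n = statistics.pstdev(batch)  # divides by n instead

# The two differ only by the factor sqrt(n / (n - 1)), which tends to 1 as n grows.
print(s, s_n, s / s_n, math.sqrt(n / (n - 1)))

# Sensitivity to outliers: replace the largest value by a hypothetical outlier.
corrupted = batch[:-1] + [1000]
print(statistics.stdev(corrupted))
```

A single corrupted observation inflates $s$ dramatically, which is what motivates the more robust measures of dispersion.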

Example

Let the ordered batch be $\{x_i\} = \{1, 2, 5, 6, 9, 11, 19\}$. Then

- $Q_2 = \tilde{x} = 6$
- $Q_1 = 2$
- $Q_3 = 11$
- $\mathrm{IQR} = 9$
- $\{|x_i - \tilde{x}|\} = \{5, 4, 1, 0, 3, 5, 13\}$
- $\mathrm{MAD} = 4$
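The numbers in this example can be reproduced with a short sketch. Several quartile conventions exist; the hypothetical helper `quartiles` below assumes that $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the ordered batch, with the overall median excluded when the batch size is odd, since that convention matches the values above.

```python
import statistics


def quartiles(xs):
    """Return (Q1, Q2, Q3); Q1 and Q3 are medians of the lower and upper halves."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]        # values below the median position
    upper = s[(n + 1) // 2 :]  # values above the median position
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


def mad(xs):
    """Median absolute deviation from the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])


batch = [1, 2, 5, 6, 9, 11, 19]
q1, q2, q3 = quartiles(batch)
iqr = q3 - q1

print(q1, q2, q3)  # 2 6 11
print(iqr)         # 9
print(mad(batch))  # 4
```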

Boxplots

A boxplot is a graphical display of numerical data based on the five-number summary: the smallest observation, lower quartile ($Q_1$), median ($Q_2$), upper quartile ($Q_3$), and largest observation.

Example: $x_1, \ldots, x_n \sim U[0, 1]$, $n = 100$

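[Figure: boxplot of the sample. Vertical axis: Values (0 to 1); horizontal axis: Column Number. Labeled from bottom to top: smallest observation, Q1, Q2, Q3, largest observation.]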
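The five numbers behind such a plot can be computed directly. This is a minimal sketch that draws its own sample of 100 uniform values with a fixed seed, so it will not reproduce the lecture's figure; the helper `five_number_summary` is hypothetical and uses the median-of-halves quartile convention, which is one of several in use.

```python
import random
import statistics


def five_number_summary(xs):
    """Return (smallest, Q1, Q2, Q3, largest) using medians of the lower and upper halves."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


rng = random.Random(0)                       # fixed seed so the run is repeatable
sample = [rng.random() for _ in range(100)]  # x_1, ..., x_100 ~ U[0, 1]
summary = five_number_summary(sample)

for name, value in zip(["smallest", "Q1", "Q2", "Q3", "largest"], summary):
    print(f"{name:>8}: {value:.3f}")
```

For a uniform sample on $[0, 1]$ the quartiles should fall near 0.25, 0.5, and 0.75. If matplotlib is available, `matplotlib.pyplot.boxplot(sample)` draws the corresponding plot, although its default quartile and whisker conventions may differ from this sketch.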