






A lecture note from Math 408 - Mathematical Statistics at the University of Southern California (USC), covering the topics of measures of location, including the arithmetic mean, median, trimmed mean, and M estimates, as well as measures of dispersion, such as the sample standard deviation, interquartile range, and median absolute deviation. The lecture also discusses the importance of these measures in summarizing data and their robustness to outliers.







Math 408 - Mathematical Statistics
April 29, 2013
Measures of Location
    Arithmetic Mean
    Median
    Trimmed Mean
    M Estimates
Measures of Dispersion
    Sample Standard Deviation
    Interquartile Range (IQR)
    Median Absolute Deviation (MAD)
Boxplots
Summary
The most commonly used measure of location is the arithmetic mean,

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
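A common statistical model for the variability of a measurement process is the following:

$$x_i = \mu + \varepsilon_i$$

$x_i$ is the value of the $i$th measurement
$\mu$ is the true value of the quantity
$\varepsilon_i$ is the random error, $\varepsilon_i \sim N(0, \sigma^2)$

The arithmetic mean is then:

$$\bar{x} = \mu + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i, \qquad \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \sim N\left(0, \frac{\sigma^2}{n}\right)$$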
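The variance $\sigma^2/n$ of the averaged error can be checked in one line. The sketch below assumes the errors $\varepsilon_i$ are independent, which the model does not state explicitly but which is the usual reading of it:

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(\varepsilon_i)
  = \frac{1}{n^2}\cdot n\,\sigma^2
  = \frac{\sigma^2}{n}
```

Since a sum of independent normal variables is again normal, the averaged error is normal with mean 0 and this variance, so its standard deviation shrinks like $\sigma/\sqrt{n}$.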
The main drawback of the arithmetic mean is that it is sensitive to outliers. In fact, by changing a single number, the arithmetic mean of a batch of numbers can be made arbitrarily large or small. For this reason, measures of location that are robust, or insensitive to outliers, are important.
If the batch size is an odd number, $x_1, \ldots, x_{2n-1}$, then the median $\tilde{x}$ is defined to be the middle value of the ordered batch values:

$$x_{(1)} < \cdots < x_{(2n-1)}, \qquad \tilde{x} = x_{(n)}$$
Important Remark: Moving the extreme observations does not affect the sample median at all, so the median is quite robust.
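A small numerical illustration of this contrast may help. The sketch below uses an illustrative batch of seven numbers and a hypothetical helper, `with_largest_replaced`, which pushes the largest observation further and further out while everything else stays fixed:

```python
from statistics import mean, median

batch = [1, 2, 5, 6, 9, 11, 19]  # illustrative batch of odd size


def with_largest_replaced(values, new_value):
    """Return a copy of the ordered values with the largest entry replaced."""
    ordered = sorted(values)
    return ordered[:-1] + [new_value]


for outlier in (19, 1_000, 1_000_000):
    corrupted = with_largest_replaced(batch, outlier)
    print(f"largest={outlier:>9}: mean={mean(corrupted):>12.2f}, "
          f"median={median(corrupted)}")
```

The mean grows without bound as the single extreme value moves, while the median stays at the middle order statistic.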
Let $x_1, \ldots, x_n$ be a batch of numbers. It is easy to show that the mean is

$$\bar{x} = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} (x_i - y)^2$$
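The "easy to show" step can be spelled out by differentiating the sum of squares with respect to $y$ and setting the derivative to zero. A short sketch of that standard argument:

```latex
\frac{d}{dy}\sum_{i=1}^{n}(x_i - y)^2
  = -2\sum_{i=1}^{n}(x_i - y) = 0
\quad\Longrightarrow\quad
y = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
```

The second derivative is $2n > 0$, so this stationary point is the unique minimizer.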
Outliers have a great effect on the mean, since the deviation of $y$ from $x_i$ is measured by the square of their difference. The median is

$$\tilde{x} = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} |x_i - y|$$
Here, large deviations are not weighted as heavily, which is exactly why the median is robust. In general, consider the following function:

$$f(y) = \sum_{i=1}^{n} \Psi(x_i, y),$$

where $\Psi$ is called the weight function. An M estimate is the minimizer of $f$:

$$y^* = \arg\min_{y \in \mathbb{R}} \sum_{i=1}^{n} \Psi(x_i, y)$$
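To make the definition concrete, the following is a minimal sketch that minimizes $f$ numerically for three choices of $\Psi$. The ternary-search minimizer, the function names, the Huber-type weight function with cutoff `k = 1.5`, and the batch are all illustrative assumptions, not part of the lecture. The search assumes $f$ is convex with its minimizer between the smallest and largest observation, which holds for these three choices.

```python
from statistics import mean, median


def m_estimate(xs, psi, iterations=200):
    """Minimize f(y) = sum(psi(x, y) for x in xs) over y by ternary search.

    Assumes f is convex in y with a minimizer in [min(xs), max(xs)].
    """
    def f(y):
        return sum(psi(x, y) for x in xs)

    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


def squared(x, y):
    return (x - y) ** 2


def absolute(x, y):
    return abs(x - y)


def huber(x, y, k=1.5):
    # Quadratic for small deviations, linear for large ones.
    # The cutoff k = 1.5 is an arbitrary illustrative choice.
    d = abs(x - y)
    return 0.5 * d * d if d <= k else k * (d - 0.5 * k)


batch = [1, 2, 5, 6, 9, 11, 190]  # the last value plays the role of an outlier

print("squared loss :", m_estimate(batch, squared), "mean   =", mean(batch))
print("absolute loss:", m_estimate(batch, absolute), "median =", median(batch))
print("Huber loss   :", m_estimate(batch, huber))
```

Up to numerical tolerance, the squared weight function recovers the mean and the absolute one recovers the median; the Huber-type choice lands between the two, much closer to the median.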
A measure of dispersion, or scale, gives a numerical characteristic of the “scatteredness” of a batch of numbers. The most commonly used measure is the sample standard deviation s, which is the square root of the sample variance,
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Q: Why $\frac{1}{n-1}$ instead of $\frac{1}{n}$?
A: $s^2$ is an unbiased estimate of the population variance $\sigma^2$. If $n$ is large, then it makes little difference whether $\frac{1}{n-1}$ or $\frac{1}{n}$ is used.
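Unbiasedness can be verified exactly on a toy case. The sketch below assumes a hypothetical four-value population in which each value is equally likely, enumerates every possible sample of size 3 drawn independently with replacement, and averages both versions of the variance estimate using exact fractions:

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean, as an exact Fraction."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


population = [1, 2, 5, 6]  # hypothetical population, each value equally likely
n = 3                      # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All 4**3 equally likely ordered samples (independent draws with replacement).
samples = list(product(population, repeat=n))
avg_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance       :", sigma2)
print("average of 1/(n-1) version:", avg_unbiased)
print("average of 1/n version    :", avg_biased)
```

The $\frac{1}{n-1}$ version averages to exactly $\sigma^2$, while the $\frac{1}{n}$ version averages to $\frac{n-1}{n}\sigma^2$, which is too small by a factor that tends to 1 as $n$ grows.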
Like the mean, the standard deviation s is sensitive to outliers.
Let the ordered batch be

$$\{x_i\} = \{1, 2, 5, 6, 9, 11, 19\}$$

$Q_2 = \tilde{x} = 6$
$Q_1 = 2$
$Q_3 = 11$
$\mathrm{IQR} = Q_3 - Q_1 = 9$
$\{|x_i - \tilde{x}|\} = \{5, 4, 1, 0, 3, 5, 13\}$
$\mathrm{MAD} = 4$
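The same numbers can be reproduced with a few lines of code. This is a sketch under one assumption: quartiles are taken as the medians of the lower and upper halves of the ordered batch, with the middle value left out when the batch size is odd. Several quartile conventions exist; this one is chosen because it matches $Q_1 = 2$ and $Q_3 = 11$ here. The function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3), with Q1 and Q3 the medians of the lower and upper halves.

    For an odd batch size the middle value is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2:]
    return median(lower), median(ordered), median(upper)


def mad(values):
    """Median of the absolute deviations from the median."""
    center = median(values)
    return median([abs(x - center) for x in values])


batch = [1, 2, 5, 6, 9, 11, 19]
q1, q2, q3 = quartiles(batch)
print("Q1, Q2, Q3 =", q1, q2, q3)  # 2 6 11
print("IQR =", q3 - q1)            # 9
print("MAD =", mad(batch))         # 4
```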
A boxplot is a graphical display of numerical data that is based on five-number summaries: the smallest observation, lower quartile ($Q_1$), median ($Q_2$), upper quartile ($Q_3$), and largest observation.

Example: $x_1, \ldots, x_n \sim U[0, 1]$, $n = 100$
[Figure: boxplot of the sample, with the smallest observation, Q1, Q2, Q3, and the largest observation marked. Vertical axis: Values; horizontal axis: Column Number.]
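The five numbers behind such a boxplot can be computed directly. The sketch below draws 100 values from a uniform distribution on the unit interval with an arbitrary seed and uses the default quartile method of Python's `statistics.quantiles`; other quartile conventions would give slightly different $Q_1$ and $Q_3$. The helper name `five_number_summary` is illustrative.

```python
import random
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using statistics.quantiles' default method."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


rng = random.Random(408)  # arbitrary seed, only for reproducibility
sample = [rng.random() for _ in range(100)]  # 100 draws from U[0, 1)

summary = five_number_summary(sample)
for name, value in zip(("min", "Q1", "Q2", "Q3", "max"), summary):
    print(f"{name:>3}: {value:.3f}")
```

For a uniform sample of this size, the quartiles should fall roughly near 0.25, 0.5, and 0.75, with the extremes close to 0 and 1.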