Assessing Normality - Lecture Notes | MATH 6010, Study notes of Mathematics

Material Type: Notes; Class: Linear Models; Subject: Mathematics; University: University of Utah; Term: Fall 2004;

ASSESSING NORMALITY
DAVAR KHOSHNEVISAN
1. Histograms
Consider the linear model $Y = X\beta + \varepsilon$. The pressing
question is, “is it true that
$\varepsilon \sim N_n(\mathbf{0}, \sigma^2 I_n)$”?
To answer this, consider the “residuals,”
$$\hat{\varepsilon} = Y - X\hat{\beta}.$$
If $\varepsilon \sim N_n(\mathbf{0}, \sigma^2 I_n)$ then one would like
to think that the histogram of the $\hat{\varepsilon}_i$’s should look
like a normal pdf with mean 0 and variance $\sigma^2$ (why?). How
close is close? It helps to think more generally.
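To make the residual histogram concrete, here is a minimal R sketch on simulated data. The sample size, design matrix, coefficients and error standard deviation are all illustrative assumptions, not values from these notes. The sketch only shows how $\hat{\varepsilon} = Y - X\hat{\beta}$ can be computed and compared by eye with the $N(0, \sigma^2)$ pdf.

```r
# Minimal sketch: simulate a linear model and inspect its residuals.
# Every numerical choice below is an illustrative assumption.
set.seed(1)
n <- 200                                  # sample size (assumed)
X <- cbind(1, rnorm(n), runif(n))         # design matrix with an intercept column
beta <- c(2, -1, 0.5)                     # "true" coefficients (assumed)
sigma <- 1.5                              # error standard deviation (assumed)
Y <- drop(X %*% beta + rnorm(n, sd = sigma))

# Least-squares estimate: solve (X'X) b = X'Y for b.
beta.hat <- solve(crossprod(X), crossprod(X, Y))
eps.hat <- drop(Y - X %*% beta.hat)       # residuals Y - X beta.hat

# Density-scaled histogram with the N(0, sigma^2) pdf drawn on top.
hist(eps.hat, nclass = 20, freq = FALSE, main = "Residuals")
curve(dnorm(x, mean = 0, sd = sigma), add = TRUE)
```

Because this design matrix contains an intercept column, the residuals average to zero up to rounding error, which is a quick sanity check on the computation.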
Consider a sample $U_1, \ldots, U_n$ (e.g., $U_i = \hat{\varepsilon}_i$).
We wish to know whether the $U_i$’s are coming from a normal
distribution. Again, the first thing to do is to plot the histogram.
In R you type,
hist(u,nclass=n)
where u denotes the vector of the samples $U_1, \ldots, U_n$ and n
denotes the number of bins in the histogram (not the sample size).
For instance, consider the following exam data:
16.8 9.2 0.0 17.6 15.2 0.0 0.0 10.4 10.4 14.0 11.2 13.6 12.4
14.8 13.2 17.6 9.2 7.6 9.2 14.4 14.8 15.6 14.4 4.4 14.0 14.4 0.0
0.0 10.8 16.8 0.0 15.2 12.8 14.4 14.0 17.2 0.0 14.4 17.2 0.0 0.0
0.0 14.0 5.6 0.0 0.0 13.2 17.6 16.0 16.0 0.0 12.0 0.0 13.6 16.0
8.4 11.6 0.0 10.4 0.0 14.4 0.0 18.4 17.2 14.8 16.0 16.0 0.0 10.0
13.6 12.0 15.2
The command hist(f1.dat,nclass=15) produces Figure 1(a).[1]
Try this for different values of nclass to see what types of histograms
you can obtain. You should always ask, “which one represents the truth
the best”? Is there a unique answer?
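The experiment with different values of nclass can be sketched as follows. The scores are the exam data listed above, typed in by hand. The vector name f1 is an assumption (suggested by the figure title f1.censored), and the particular bin counts tried are arbitrary.

```r
# The 72 exam scores listed above, entered by hand.
# The name f1 is an assumption, not taken from the notes.
f1 <- scan(text = "
16.8 9.2 0.0 17.6 15.2 0.0 0.0 10.4 10.4 14.0 11.2 13.6 12.4
14.8 13.2 17.6 9.2 7.6 9.2 14.4 14.8 15.6 14.4 4.4 14.0 14.4 0.0
0.0 10.8 16.8 0.0 15.2 12.8 14.4 14.0 17.2 0.0 14.4 17.2 0.0 0.0
0.0 14.0 5.6 0.0 0.0 13.2 17.6 16.0 16.0 0.0 12.0 0.0 13.6 16.0
8.4 11.6 0.0 10.4 0.0 14.4 0.0 18.4 17.2 14.8 16.0 16.0 0.0 10.0
13.6 12.0 15.2
", quiet = TRUE)

# Draw the histogram for several requested bin counts. R treats the
# request as a suggestion and picks "pretty" break points, so the
# number of bins actually drawn can differ from the number requested.
for (k in c(5, 10, 15, 30)) {
  h <- hist(f1, nclass = k, main = paste("nclass =", k))
  cat("requested", k, "bins; drew", length(h$counts), "\n")
}
```

Comparing the pictures side by side is one way to see that the impression a histogram gives depends on the binning.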
Now the data $U_1, \ldots, U_n$ is probably not coming from a normal
distribution if the histogram does not have the “right” shape. Ideally,
it would be symmetric, and the tails of the distribution would taper
off rapidly.
In Figure 1(a), there were many students who did not take the exam in
question. They received a ‘0’ but this grade should probably not contribute
to our knowledge of the distribution of all such grades. Figure 1(b) shows
Date: September 1, 2004.
[1] You can obtain this data freely from the website below:
http://www.math.utah.edu/~davar/math6010/2004/notes/f1.dat.

[Figure 1. Histograms of the exam grades (vertical axis: Frequency). (a) Grades. (b) Censored Grades (histogram of f1.censored).]

[Figure, panels (c) and (d): Normal Q-Q plots (horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles). (c) QQ-plot of grades. (d) QQ-plot of censored grades.]


This creates two vectors: V$x and V$y. The first contains the values of all $q_j$’s, and the second all of the $U_{(j)}$’s. So now you can compute the correlation coefficient of the qq-plot by typing:

V = qqnorm(u, plot.it = FALSE)
cor(V$x, V$y)

If you do this for the qq-plot of the grade data, then you will find a correlation of ≈ 0.910. After censoring out the no-show exams, we obtain a correlation of ≈ 0.971. This is a noticeable difference, and suggests that the censored grades are much closer to normal. In fact, one can analyse this procedure statistically, as we shall do later on.
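The two correlations can be recomputed with a short R sketch. It assumes that the scores typed in below match the file used in the notes, and that “censoring” means dropping the zero scores of the no-shows. The names f1 and qq.cor are illustrative, while f1.censored is borrowed from the figure title. The printed values can then be compared with the ones quoted above.

```r
# Exam scores from the notes, entered by hand (the name f1 is assumed).
f1 <- scan(text = "
16.8 9.2 0.0 17.6 15.2 0.0 0.0 10.4 10.4 14.0 11.2 13.6 12.4
14.8 13.2 17.6 9.2 7.6 9.2 14.4 14.8 15.6 14.4 4.4 14.0 14.4 0.0
0.0 10.8 16.8 0.0 15.2 12.8 14.4 14.0 17.2 0.0 14.4 17.2 0.0 0.0
0.0 14.0 5.6 0.0 0.0 13.2 17.6 16.0 16.0 0.0 12.0 0.0 13.6 16.0
8.4 11.6 0.0 10.4 0.0 14.4 0.0 18.4 17.2 14.8 16.0 16.0 0.0 10.0
13.6 12.0 15.2
", quiet = TRUE)

# Assumption: censoring means removing the zero scores (no-shows).
f1.censored <- f1[f1 > 0]

# Correlation between the theoretical and sample quantiles of a
# normal QQ-plot (hypothetical helper, not part of the notes).
qq.cor <- function(u) {
  V <- qqnorm(u, plot.it = FALSE)  # V$x: normal quantiles, V$y: the data
  cor(V$x, V$y)
}

r.all <- qq.cor(f1)                # all grades
r.censored <- qq.cor(f1.censored)  # no-shows removed
print(c(all = r.all, censored = r.censored))
```

A value near 1 means the QQ-plot is close to a straight line. How close to 1 is close enough is exactly the question that the statistical analysis mentioned above is meant to answer.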