



Material Type: Notes; Class: Linear Models; Subject: Mathematics; University: University of Utah; Term: Fall 2004;
Typology: Study notes




DAVAR KHOSHNEVISAN
If ε ∼ N_n(0, σ^2 I_n) then one would like to think that the histogram of the ε̂_i's should look like a normal pdf with mean 0 and variance σ^2 (why?). How close is close? It helps to think more generally. Consider a sample U_1, ..., U_n (e.g., U_i = ε̂_i). We wish to know whether the U_i's are coming from a normal distribution. Again, the first thing to do is to plot the histogram. In R you type,
hist(u,nclass=n)
where u denotes the vector of the samples U_1, ..., U_n and n denotes the number of bins in the histogram. For instance, consider the following exam data:
16.8 9.2 0.0 17.6 15.2 0.0 0.0 10.4 10.4 14.0 11.2 13.6 12. 14.8 13.2 17.6 9.2 7.6 9.2 14.4 14.8 15.6 14.4 4.4 14.0 14.4 0. 0.0 10.8 16.8 0.0 15.2 12.8 14.4 14.0 17.2 0.0 14.4 17.2 0.0 0. 0.0 14.0 5.6 0.0 0.0 13.2 17.6 16.0 16.0 0.0 12.0 0.0 13.6 16. 8.4 11.6 0.0 10.4 0.0 14.4 0.0 18.4 17.2 14.8 16.0 16.0 0.0 10. 13.6 12.0 15.
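To see how much the picture depends on the bin count, one can loop over a few values of nclass. The sketch below uses simulated stand-in data (an assumption, not the exam scores above), and it passes plot = FALSE so that hist returns the binning instead of drawing it. R treats nclass as a suggestion, so the number of bins actually used may differ from the number requested.

```r
# Hypothetical stand-in data (NOT the exam scores above): 60 simulated values.
set.seed(1)
u <- rnorm(60, mean = 12, sd = 3)

# Compare the binning R chooses for several requested bin counts.
# plot = FALSE returns the histogram object without drawing it;
# drop that argument to see the picture itself.
for (k in c(5, 15, 30)) {
  h <- hist(u, nclass = k, plot = FALSE)
  cat("requested", k, "bins; R used", length(h$counts), "\n")
}
```

Whatever the bin count, the counts always add up to the sample size; only the shape of the picture changes.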
The command hist(f1.dat, nclass=15) produces Figure 1(a).^1 Try this for different values of nclass to see what types of histograms you can obtain. You should always ask, “which one best represents the truth?” Is there a unique answer?
Now the data U_1, ..., U_n is probably not coming from a normal distribution if the histogram does not have the “right” shape. Ideally, it would be symmetric, and the tails of the distribution would taper off rapidly. In Figure 1(a), there were many students who did not take the exam in question. They received a ‘0’ but this grade should probably not contribute to our knowledge of the distribution of all such grades. Figure 1(b) shows
Date: September 1, 2004.
^1 You can obtain this data freely from the website below: http://www.math.utah.edu/~davar/math6010/2004/notes/f1.dat.
[Figure 1: histograms, with Frequency on the vertical axis. (a) Grades ("Histogram of f"). (b) Censored Grades ("Histogram of f1.censored").]
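The name f1.censored appears in panel (b), but the notes as preserved here do not show how it was built. A minimal sketch, assuming that a score of exactly 0 marks a no-show and that the scores sit in a vector called f1 (a hypothetical name, filled here with a short illustrative vector):

```r
# Hypothetical short score vector for illustration (not the full data set).
f1 <- c(16.8, 9.2, 0.0, 17.6, 15.2, 0.0, 10.4, 14.0)

# Assumption: a score of exactly 0 marks a student who did not take the exam.
f1.censored <- f1[f1 > 0]

length(f1)           # 8 scores in total
length(f1.censored)  # 6 scores remain after dropping the two zeros
```

A histogram of the censored vector can then be drawn as before, for example with hist(f1.censored, nclass=15).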
[Figure: normal Q-Q plots, with Theoretical Quantiles on the horizontal axis and Sample Quantiles on the vertical axis. (c) QQ-plot of grades. (d) QQ-plot of censored grades.]
ASSESSING NORMALITY
This creates two vectors: V$x and V$y. The first contains the values of all q_j's, and the second all of the U_(j)'s. So now you can compute the correlation coefficient of the qq-plot by typing:
V = qqnorm(u, plot.it = FALSE)
cor(V$x, V$y)
If you do this for the qq-plot of the grade data, then you will find a correlation of ≈ 0.910. After censoring out the no-show exams, we obtain a correlation of ≈ 0.971. This is a noticeable difference, and it suggests that the censored grades are much closer to normal. In fact, one can analyse this procedure statistically, as we shall do later on.
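The qq-plot correlation can be wrapped in a small helper and checked by hand. The sketch below is illustrative only: the function names qq_cor and qq_cor_manual are hypothetical, the data are simulated rather than the exam scores, and the hand computation relies on the plotting positions ppoints(n) that R's qqnorm uses, which may differ from the definition of q_j adopted in these notes.

```r
# Correlation coefficient of the normal qq-plot of a numeric vector u.
qq_cor <- function(u) {
  V <- qqnorm(u, plot.it = FALSE)  # V$x: normal quantiles, V$y: the data
  cor(V$x, V$y)
}

# The same number by hand: sorted data against qnorm(ppoints(n)),
# the plotting positions that qqnorm uses internally.
qq_cor_manual <- function(u) {
  cor(qnorm(ppoints(length(u))), sort(u))
}

# Hypothetical data: simulated scores plus a block of zeros ("no-shows").
set.seed(2004)
scores <- rnorm(50, mean = 13, sd = 3)
with_zeros <- c(scores, rep(0, 20))

r_all      <- qq_cor(with_zeros)                  # zeros included
r_censored <- qq_cor(with_zeros[with_zeros > 0])  # zeros dropped
c(all = r_all, censored = r_censored)
```

On data of this kind the correlation rises once the zeros are dropped, which mirrors the jump reported above for the grade data.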