





Solutions to homework assignment 3 for the course 36-402, advanced data analysis, in spring 2011. It focuses on analyzing the geyser data for heteroskedasticity and using kernel regression to estimate variance and waiting times. The assignment includes plots, regression models, and interpretations of results.






Figure 1: Waiting time as a function of geyser duration
lm.4 = lm(waiting ~ duration, data = geyser, weights = 1/reg.var3$mean)
            Estimate   Std. Error
(Intercept) 99.309856  1.
duration    -7.800326  0.

            Estimate   Std. Error
(Intercept) 98.896429  1.
duration    -7.808154  0.

The difference between the two coefficients for duration is only about 0.008, while the corresponding standard errors are at least 50 times as large, indicating that the difference in slopes is not statistically significant. Even if this difference were statistically significant, it is too small to matter much, at least from the perspective of a casual tourist who wants to know when the next eruption will occur. The difference between intercepts is quite a bit larger, at about 0.4, but again the standard errors are at least several times larger.
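The size of these differences can be checked directly from the reported estimates. The sketch below is only illustrative: the variable names and the two example durations (2 and 5) are assumptions chosen for the example, not values from the assignment.

```r
# Coefficient estimates copied from the two regression summaries above.
intercepts <- c(99.309856, 98.896429)
slopes     <- c(-7.800326, -7.808154)

# Absolute differences between the two fits.
slope.diff     <- abs(diff(slopes))      # about 0.008
intercept.diff <- abs(diff(intercepts))  # about 0.4

# Gap between the two fitted lines at two illustrative durations
# (hypothetical values, in the units of the data).
example.durations <- c(2, 5)
pred.a <- intercepts[1] + slopes[1] * example.durations
pred.b <- intercepts[2] + slopes[2] * example.durations
pred.gap <- pred.a - pred.b

print(slope.diff)
print(intercept.diff)
print(pred.gap)
```

At both example durations the two fitted lines differ by less than half a unit of waiting time, which is consistent with the point that the choice between the fits matters little for a casual prediction.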
Figure 3: Waiting time as a function of geyser duration, with a kernel regression curve
Figure 4: Estimated variance of waiting given duration. Solid black is for unweighted linear regression, and dashed blue is for kernel regression.
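The definition of reg.var3, whose fitted values supply the weights in lm.4, is not shown in this excerpt. As a rough sketch of the general idea only, the code below smooths the squared residuals of an ordinary least squares fit against duration and uses the reciprocals as regression weights. It assumes the geyser data frame from the MASS package, and it swaps in a hand-rolled Nadaraya-Watson smoother with a fixed Gaussian bandwidth in place of whatever routine produced reg.var3. The names fit.ols, nw.smooth, var.hat, fit.wls and the bandwidth h = 0.5 are all hypothetical.

```r
library(MASS)  # assumed source of the geyser data frame (waiting, duration)

# Step 1: ordinary least squares, ignoring heteroskedasticity.
fit.ols <- lm(waiting ~ duration, data = geyser)

# Step 2: squared residuals act as noisy observations of the
# conditional variance of waiting given duration.
sq.res <- residuals(fit.ols)^2

# Step 3: Nadaraya-Watson kernel smoother with a Gaussian kernel.
# h is a hand-picked, hypothetical bandwidth; it is not cross-validated.
nw.smooth <- function(x, y, h) {
  sapply(x, function(x0) {
    w <- dnorm((x - x0) / h)  # kernel weight of every point relative to x0
    sum(w * y) / sum(w)       # locally weighted average of y
  })
}
var.hat <- nw.smooth(geyser$duration, sq.res, h = 0.5)

# Step 4: weighted least squares, with weights inversely
# proportional to the estimated conditional variance.
fit.wls <- lm(waiting ~ duration, data = geyser, weights = 1 / var.hat)

print(coef(summary(fit.ols)))
print(coef(summary(fit.wls)))
```

Because the smoother averages non-negative squared residuals, the estimated variances stay positive and their reciprocals are usable as weights. A data-driven bandwidth would be the natural refinement of this sketch.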
Figure 5: Conditional density estimate for waiting given duration