STAT 425 HW #4
Due Wednesday, November 19, 2003

You can turn it in to my office, 116B IH, or to my mailbox in 101 IH.

1. Do just one of (a) or (b).

   (a) Suppose the data are x₁ = 1, x₂ = 2, x₃ = 6. Find the exact bootstrap distribution of X̄*, and its mean. Is E[X̄*] = x̄?

   (b) For data x₁, ..., xₙ, prove that the mean and variance of the bootstrap mean X̄* are x̄ and s²/n, respectively. What is s²?

2. Consider the little data set where g = (red, red, red, blue, blue, blue) and the predictor is x = (1, 2, 4, 3, 5, 6). In boosting stumps, the first stump classifies an observation as red if x < 3.5 and blue if x > 3.5. Find err₁, α₁, and the new weights w₁. What is the weighted misclassification error of the first stump using the new weights?

You are free to work on the next two problems in groups of up to four people. In fact, that is a good idea. If you do, turn these problems in separately, with the names of the people contributing. There's a link to the data for these problems on the course web page.

3. The Boston Housing Data (http://lib.stat.cmu.edu/datasets/boston) has data on 506 regions in the Boston area. The variables are

   CRIM     per capita crime rate by town
   ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
   INDUS    proportion of non-retail business acres per town
   CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
   NOX      nitric oxides concentration (parts per 10 million)
   RM       average number of rooms per dwelling
   AGE      proportion of owner-occupied units built prior to 1940
   DIS      weighted distances to five Boston employment centres
   RAD      index of accessibility to radial highways
   TAX      full-value property-tax rate per $10,000
   PTRATIO  pupil-teacher ratio by town
   B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
   LSTAT    % lower status of the population
   MEDV     median value of owner-occupied homes in $1000's

The goal is to be able to predict the median value of homes (MEDV) from the other variables. The data are in the data set boston. Use regular linear regression (including subset selection), projection pursuit regression, and generalized additive models to fit the data. Compare the models with leave-one-out cross-validation. Which is/are best?
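For problem 1(a), an answer can be checked by brute force: with n = 3 there are only 3³ = 27 ordered resamples drawn with replacement, each with probability 1/27. The following is a minimal Python sketch under that assumption; the function name `exact_bootstrap_mean_dist` is hypothetical, and exact fractions are used so the mean and variance of X̄* come out exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def exact_bootstrap_mean_dist(data):
    """Exact distribution of the bootstrap mean X-bar* as {value: probability}.

    Enumerates all n**n ordered resamples drawn with replacement (each
    equally likely) and tallies their means as exact fractions.
    """
    n = len(data)
    counts = Counter(
        Fraction(sum(resample), n) for resample in product(data, repeat=n)
    )
    total = n ** n
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}


data = [1, 2, 6]
dist = exact_bootstrap_mean_dist(data)

# Moments of the exact bootstrap distribution.
mean_star = sum(v * p for v, p in dist.items())
var_star = sum((v - mean_star) ** 2 * p for v, p in dist.items())

for value, prob in dist.items():
    print(f"X-bar* = {value}: probability {prob}")
print("E[X-bar*] =", mean_star)
print("Var[X-bar*] =", var_star)
```

Comparing `var_star` with Σ(xᵢ − x̄)² divided by n or by n − 1 (then divided by n) is one way to see numerically which s² part (b) is asking about.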
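For problem 2, the first boosting round can be traced numerically. This is a minimal sketch assuming the AdaBoost.M1 updates: starting weights wᵢ = 1/N, err₁ = Σ wᵢ I(gᵢ ≠ G₁(xᵢ)) / Σ wᵢ, α₁ = log((1 − err₁)/err₁), and each misclassified weight multiplied by exp(α₁) with no renormalization. (The version with α = ½ log((1 − err)/err) that also shrinks the correctly classified weights gives the same weights after normalization.) The helper names `stump`, `adaboost_round`, and `weighted_error` are hypothetical.

```python
import math
from fractions import Fraction

# Data from problem 2: class labels g and the single predictor x.
g = ["red", "red", "red", "blue", "blue", "blue"]
x = [1, 2, 4, 3, 5, 6]


def stump(xi, split=3.5):
    """First stump: classify as red if x < split, blue otherwise."""
    return "red" if xi < split else "blue"


def adaboost_round(labels, preds, weights):
    """One AdaBoost.M1 round; returns (err, alpha, new_weights)."""
    miss = [lab != pred for lab, pred in zip(labels, preds)]
    err = sum(w for w, m in zip(weights, miss) if m) / sum(weights)
    alpha = math.log((1 - err) / err)
    # exp(alpha) equals (1 - err) / err, so the update stays exact as a Fraction.
    factor = (1 - err) / err
    new_weights = [w * factor if m else w for w, m in zip(weights, miss)]
    return err, alpha, new_weights


def weighted_error(labels, preds, weights):
    """Weighted misclassification rate, normalized by the total weight."""
    bad = sum(w for w, lab, pred in zip(weights, labels, preds) if lab != pred)
    return bad / sum(weights)


n = len(g)
w0 = [Fraction(1, n)] * n
preds = [stump(xi) for xi in x]

err1, alpha1, w1 = adaboost_round(g, preds, w0)
print("err_1 =", err1)
print("alpha_1 =", alpha1)
print("w_1 =", [str(w) for w in w1])
print("weighted error of stump 1 under w_1 =", weighted_error(g, preds, w1))
```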
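For problem 3, leave-one-out cross-validation refits each model n times, each time holding out one region and predicting its MEDV. The sketch below is only an illustration under stated assumptions: it is plain Python, it uses simple linear regression on one predictor and a small made-up data set (not the Boston data), and all function names are hypothetical. The generic `loocv_mse` accepts any fit/predict pair, so the same loop could wrap a projection pursuit or GAM fit. For least squares it is checked against the closed-form shortcut CV = (1/n) Σ (eᵢ / (1 − hᵢᵢ))², where eᵢ is the full-data residual and hᵢᵢ the leverage.

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def predict_simple_ols(coefs, xi):
    """Prediction from the fitted line at a single x value."""
    b0, b1 = coefs
    return b0 + b1 * xi


def loocv_mse(xs, ys, fit, predict):
    """Brute-force leave-one-out CV: refit without case i, predict case i."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        total += (ys[i] - predict(model, xs[i])) ** 2
    return total / n


def loocv_mse_ols_shortcut(xs, ys):
    """Closed-form LOOCV for least squares: mean of (e_i / (1 - h_ii))^2."""
    n = len(xs)
    coefs = fit_simple_ols(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    total = 0.0
    for xi, yi in zip(xs, ys):
        resid = yi - predict_simple_ols(coefs, xi)
        leverage = 1 / n + (xi - xbar) ** 2 / sxx  # h_ii with one predictor
        total += (resid / (1 - leverage)) ** 2
    return total / n


# Small made-up data set, for illustration only (not the Boston data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8]

brute = loocv_mse(xs, ys, fit_simple_ols, predict_simple_ols)
shortcut = loocv_mse_ols_shortcut(xs, ys)
print(f"LOOCV MSE (refit n times): {brute:.6f}")
print(f"LOOCV MSE (leverage shortcut): {shortcut:.6f}")
```

With several predictors, hᵢᵢ is the i-th diagonal element of the hat matrix X(XᵀX)⁻¹Xᵀ; for projection pursuit regression and GAMs, the brute-force loop is the general route, and the model with the smallest LOOCV error would be preferred.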