Statistical Methods for Bioscience II - Problem Set 8 | HORT 572 | Assignments Data Analysis & Statistical Methods

Stat/For/Hort 572 Larget April 21, 2008

Assignment #8 — Due Friday, April 25, 2008, by 4:00 P.M.

Turn in homework in lecture, discussion, or your TA’s mailbox (just inside the main entrance to MSC). Please circle the discussion section you expect to attend to pick up this assignment.

311: Tues. 1:00–2:15 312: Wed. 2:30–3:45 313: Tue. 4:00–5:15

The first several questions revisit the data from the file larch.txt, which contains measurements from 26 twenty-four-year-old larch trees. The larch is a conifer related to pines, native to central Europe, that drops its needles every year. The observation number is in column 1 (id). The explanatory variables are in columns 2–5 and are the percent content of nitrogen (nitro), the percent content of phosphorus (phos), the percent content of potassium (potas), and the percent content of residual ash (ash). The percent content of the minerals is determined from dried needles from the tree. The response variable, tree height (height), is in column 6 and is measured in inches. The objective of the study was to relate tree height to the mineral composition of the needles.

1. Fit a model for height with an intercept and the predictors nitro, phos, potas, ash, and the interaction nitro:phos. Report the estimated coefficients and their standard errors. Report the estimated error standard deviation. (Note: The functions coef(), se.coef(), and sigma.hat() will extract the estimated coefficients, their SEs, and the estimated residual standard deviation from a fitted regression model. The latter two functions are in the arm library.)
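As a rough illustration of the extraction step in Problem 1, the sketch below fits the model to a made-up stand-in data frame and pulls out the three quantities with base R only. The data values are randomly generated placeholders, not the larch measurements; only the column names follow the description above. For an lm() fit, sqrt(diag(vcov(fit))) and summary(fit)$sigma should agree with what se.coef() and sigma.hat() from arm report.

```r
# Placeholder data: randomly generated, NOT the real larch measurements.
# Only the column names follow the assignment's description of larch.txt.
set.seed(572)
n <- 26
larch <- data.frame(
  id    = 1:n,
  nitro = runif(n, 1.0, 3.0),
  phos  = runif(n, 0.10, 0.45),
  potas = runif(n, 0.4, 1.2),
  ash   = runif(n, 0.5, 1.0)
)
larch$height <- 100 + 50 * larch$nitro + 200 * larch$phos + rnorm(n, sd = 30)

# With the real file, something like read.table("larch.txt", header = TRUE)
# would replace the block above (assuming the file has a header row).

fit <- lm(height ~ nitro + phos + potas + ash + nitro:phos, data = larch)

est       <- coef(fit)              # estimated coefficients
se        <- sqrt(diag(vcov(fit)))  # standard errors of the coefficients
sigma.est <- summary(fit)$sigma     # estimated residual standard deviation

print(cbind(estimate = est, se = se))
print(sigma.est)
```

The formula lists nitro:phos explicitly rather than writing nitro*phos so that the set of terms matches the problem statement one for one.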

2. The following R function generates a fake data set from a model fitted with lm(), treating the fitted model from Problem 1 as the truth and using the same predictors. The first argument, x, should be the original data frame, while the second argument, fit, is the fitted model.

make.fake = function(x, fit) {
    # make sure that the arm library is loaded
    library(arm)
    # n is the number of observations
    n = nrow(x)
    # sigma.hat() is not in base R, but is in the arm library;
    # sigma is the estimated standard deviation of the error model
    sigma = sigma.hat(fit)
    # mu is the fitted value for each observation
    mu = fitted(fit)
    # this replaces the height variable in the data frame x;
    # it only modifies the copy of the data in this function,
    # and the original data frame is left unchanged
    x$height = rnorm(n, mu, sigma)
    # return the fake data set
    return(x)
}

Enter the function into R. Use the function to generate five fake data sets. For each data set, fit a regression model using the same predictors as the model in Problem 1. Plot residuals versus fitted values for both the real data and the fake data sets. Is the pattern of residuals for the real data similar to the pattern for the fake data? Comment on what similarity or a lack of similarity implies about the goodness of fit of the model.
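One possible way to organize the workflow in Problem 2 is sketched below. It again uses randomly generated placeholder data rather than larch.txt, and the helper make.fake.base() is a hypothetical base-R stand-in for make.fake() that uses summary(fit)$sigma in place of arm's sigma.hat() so that the sketch runs without arm installed.

```r
# Placeholder data: randomly generated, NOT the real larch measurements.
set.seed(8)
n <- 26
larch <- data.frame(
  nitro = runif(n, 1.0, 3.0), phos = runif(n, 0.10, 0.45),
  potas = runif(n, 0.4, 1.2), ash  = runif(n, 0.5, 1.0)
)
larch$height <- 100 + 50 * larch$nitro + 200 * larch$phos + rnorm(n, sd = 30)

form <- height ~ nitro + phos + potas + ash + nitro:phos
fit  <- lm(form, data = larch)

# Hypothetical base-R stand-in for make.fake():
# summary(fit)$sigma replaces arm's sigma.hat(fit).
make.fake.base <- function(x, fit) {
  x$height <- rnorm(nrow(x), fitted(fit), summary(fit)$sigma)
  x
}

# Five fake data sets, each refit with the same predictors
fake.fits <- lapply(1:5, function(i) lm(form, data = make.fake.base(larch, fit)))

# Original fit first, then the five fake fits, in a 2 x 3 grid
all.fits <- c(list(fit), fake.fits)
titles   <- c("original", paste("fake", 1:5))
old.par  <- par(mfrow = c(2, 3))
for (i in seq_along(all.fits)) {
  plot(fitted(all.fits[[i]]), resid(all.fits[[i]]),
       xlab = "fitted", ylab = "residual", main = titles[i])
  abline(h = 0, lty = 2)  # reference line at zero
}
par(old.par)
```

Putting all six panels on one page makes it easier to judge whether the original-data panel would stand out if its title were hidden.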


3. Use sim() to generate 1000 sets of plausible regression model coefficients for the data and model in Problem 1. Use the simulation to create a 95% prediction interval for the height of a larch tree in this population if the inputs are nitro = 1.5, phos = 0.16, potas = 0.83, and ash = 0.85.
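The logic behind a simulation-based prediction interval can be sketched in base R as follows. This is a hand-rolled version of the kind of draw that sim() performs for an lm() fit (sigma from a scaled inverse chi-square, then coefficients from a multivariate normal given sigma); it is an assumption-laden illustration, not arm's code, and it runs on randomly generated placeholder data rather than larch.txt.

```r
# Placeholder data: randomly generated, NOT the real larch measurements.
set.seed(3)
n <- 26
larch <- data.frame(
  nitro = runif(n, 1.0, 3.0), phos = runif(n, 0.10, 0.45),
  potas = runif(n, 0.4, 1.2), ash  = runif(n, 0.5, 1.0)
)
larch$height <- 100 + 50 * larch$nitro + 200 * larch$phos + rnorm(n, sd = 30)
fit <- lm(height ~ nitro + phos + potas + ash + nitro:phos, data = larch)

n.sims   <- 1000
beta.hat <- coef(fit)
k        <- length(beta.hat)
V        <- summary(fit)$cov.unscaled  # (X'X)^{-1}
s        <- summary(fit)$sigma

# Step 1: draw residual standard deviations
sigma.sim <- s * sqrt((n - k) / rchisq(n.sims, df = n - k))

# Step 2: draw coefficients given each sigma, beta ~ N(beta.hat, sigma^2 V)
z        <- matrix(rnorm(n.sims * k), n.sims, k)
beta.sim <- sweep((z %*% chol(V)) * sigma.sim, 2, beta.hat, "+")

# Step 3: build the design row for the new tree, including nitro:phos
new   <- data.frame(nitro = 1.5, phos = 0.16, potas = 0.83, ash = 0.85)
x.new <- model.matrix(delete.response(terms(fit)), new)

# Step 4: add observation-level noise to get predictive draws
mu.new <- as.vector(beta.sim %*% t(x.new))
y.new  <- rnorm(n.sims, mu.new, sigma.sim)

pred.int <- quantile(y.new, c(0.025, 0.975))
print(pred.int)
```

Step 4 is what distinguishes a prediction interval from an interval for the mean height: without the rnorm() draw, the quantiles would reflect only coefficient uncertainty.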
4. An economically feasible way to increase biodiversity in the tropics is to plant a single species of tree in former plantations to form an overstory, and then to allow other species to regenerate naturally underneath the planted trees. In a study, there are three former plantations (sites). Six species of trees for the overstory are planted in randomly selected plots at each site, with four plots per species per site, for a total of 72 = 3 × 6 × 4 plots. In each of these plots, the percent cover by the canopy is measured. After nine years of growth, the response is the number of species of woody plants found within a randomly selected 4 × 4 meter region of each plot. The data set plantation.txt contains these data. Use Poisson regression to fit a model predicting the number of woody species in the understory for each plot using site, overstory treatment, and canopy as inputs. Report the estimated coefficients of the model.
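A minimal sketch of the Poisson fit follows. The data frame is a randomly generated placeholder with the 3 × 6 × 4 layout described above, not plantation.txt, and the column names site and overstory are assumptions (the names canopy and understory follow the wording of the problems).

```r
# Placeholder data: randomly generated, NOT the real plantation data.
# Column names 'site' and 'overstory' are assumed for illustration.
set.seed(4)
plantation <- expand.grid(
  plot      = 1:4,
  overstory = paste0("sp", 1:6),
  site      = paste0("site", 1:3)
)
plantation$canopy     <- runif(nrow(plantation), 40, 95)
plantation$understory <- rpois(nrow(plantation),
                               lambda = exp(0.5 + 0.01 * plantation$canopy))

# Site and overstory species enter as categorical inputs, canopy as numeric
fit.pois <- glm(understory ~ factor(site) + factor(overstory) + canopy,
                family = poisson, data = plantation)

print(coef(fit.pois))                   # coefficients on the log scale
print(summary(fit.pois)$coefficients)   # with standard errors
```

Wrapping site and overstory in factor() guards against the case where the real file codes them as integers, which glm() would otherwise treat as numeric predictors.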
5. How many zeros are in the real data set for understory? Use fake-data simulation to simulate 1000 data sets from the fitted Poisson model. For each data set, compute the number of zeros. Compare the observed number of zero counts with the distribution of fake-data zero counts. Interpret this comparison.
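The zero-count check might be organized along these lines. The sketch uses randomly generated placeholder data (deliberately overdispersed so the comparison has something to show), assumed column names site and overstory, and simulates from the fitted means only, ignoring uncertainty in the coefficients.

```r
# Placeholder data: randomly generated, NOT the real plantation data.
set.seed(5)
plantation <- expand.grid(
  plot = 1:4, overstory = paste0("sp", 1:6), site = paste0("site", 1:3)
)
plantation$canopy     <- runif(72, 40, 95)
plantation$understory <- rnbinom(72, mu = exp(0.5 + 0.01 * plantation$canopy),
                                 size = 1.5)

fit.pois <- glm(understory ~ factor(site) + factor(overstory) + canopy,
                family = poisson, data = plantation)

# Observed number of zero counts
zeros.obs <- sum(plantation$understory == 0)

# 1000 fake data sets from the fitted Poisson means; count zeros in each
n.sims     <- 1000
mu         <- fitted(fit.pois)
zeros.fake <- replicate(n.sims, sum(rpois(length(mu), mu) == 0))

# Where does the observed count fall in the simulated distribution?
print(zeros.obs)
print(summary(zeros.fake))
print(mean(zeros.fake >= zeros.obs))  # share of fake sets with at least as many zeros

hist(zeros.fake, main = "Zeros in fake Poisson data", xlab = "number of zeros")
abline(v = zeros.obs, lwd = 2)
```

If the observed count sits far in the upper tail of the histogram, the Poisson model is producing too few zeros to be consistent with the data.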
6. Fit a quasi-Poisson model to these data. What is the estimated overdispersion parameter?
7. Use fake-data simulation to simulate 1000 data sets according to this model (using rnbinom() as shown in lecture 20). Compare the observed number of zeros in the real data to the distribution in this simulation.
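The last two problems can be sketched together as below. The data are randomly generated placeholders and the column names site and overstory are assumptions. The rnbinom() parameterization shown, size = mu / (omega - 1), is one common way to obtain variance omega × mu; it may not be the exact form presented in lecture 20, so check it against the lecture notes.

```r
# Placeholder data: randomly generated, NOT the real plantation data.
set.seed(7)
plantation <- expand.grid(
  plot = 1:4, overstory = paste0("sp", 1:6), site = paste0("site", 1:3)
)
plantation$canopy     <- runif(72, 40, 95)
plantation$understory <- rnbinom(72, mu = exp(0.5 + 0.01 * plantation$canopy),
                                 size = 1.5)

fit.quasi <- glm(understory ~ factor(site) + factor(overstory) + canopy,
                 family = quasipoisson, data = plantation)

# Estimated overdispersion parameter
omega <- summary(fit.quasi)$dispersion
print(omega)

# The negative binomial construction below only makes sense for omega > 1
if (omega <= 1) stop("no overdispersion estimated; size would not be positive")

# With size = mu / (omega - 1):
#   var = mu + mu^2 / size = mu + mu * (omega - 1) = omega * mu
n.sims     <- 1000
mu         <- fitted(fit.quasi)
zeros.fake <- replicate(
  n.sims,
  sum(rnbinom(length(mu), mu = mu, size = mu / (omega - 1)) == 0)
)

zeros.obs <- sum(plantation$understory == 0)
print(zeros.obs)
print(quantile(zeros.fake, c(0.025, 0.5, 0.975)))
```

Because the extra variance puts more probability on zero, the simulated zero counts here should be shifted upward relative to a plain Poisson simulation with the same means.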

Work to do, but not turn in.

Read Chapters 12–13 of the textbook.
