

Material Type: Assignment; Professor: Ane; Class: Statistical Methods for Bioscience II; Subject: HORTICULTURE; University: University of Wisconsin - Madison; Term: Spring 2009;


Stat/F&W/Hort 572 Ane February 12, 2009
Assignment #4 — Due Friday, Feb. 20, 2009, by 4pm
Turn in at lecture, at discussion, or in your TA’s mailbox. Please circle the discussion section where you expect to pick up this assignment: 311 312 313 314
Purpose: Get experience with logistic regression.
A biologist interested in crustaceans sought evidence of a geographic trend in allelic distribution in the gene mannose-6-phosphate isomerase (Mpi) in populations of the amphipod crustacean Megalorchestia californiana located along the Pacific coast. In eight populations ranging from Santa Barbara, California in the south to Port Townsend, Washington in the north, the biologist genotyped individual crustaceans at the Mpi locus. Sample sizes ranged from 30 to over 2000. Two alleles, Mpi^90 and Mpi^100, were prevalent. If latitude is helpful in predicting allelic frequency of one of the alleles, say Mpi^90, this would be evidence consistent with a story of differential selection at this locus based on an environmental factor associated with latitude. Latitude is in degrees, north of the equator. The data is in the file crustacean.txt and is also shown below.

location             latitude   Mpi90   Mpi100
Port Townsend, WA        48.1      47      139
Neskowin, OR             45.2     177      241
Siuslaw R., OR           44.0    1087     1183
Umpqua R., OR            43.7     187      175
Coos Bay, OR             43.5     397      671
San Francisco, CA        37.8      40       14
Carmel, CA               36.6      39       17
Santa Barbara, CA        34.3      30        0
(a) Read the data into R, create a data frame from this data, then add a column with the total sample size, and another column with the proportion of Mpi^90. Plot these proportions versus latitude. Explain why logistic regression is an appropriate model to address this question. (Include the plot with your solution.)
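As a starting point for (a), here is a minimal sketch of the data handling. It types the counts in from the table rather than reading crustacean.txt, since the layout of that file is not shown here. The column name Mpi100 (for the counts of the second allele) and the new column names total and prop90 are assumptions made for illustration.

```r
# Counts typed in from the table in the assignment.
# Mpi100, total, and prop90 are names assumed for this sketch.
crustacean <- data.frame(
  location = c("Port Townsend, WA", "Neskowin, OR", "Siuslaw R., OR",
               "Umpqua R., OR", "Coos Bay, OR", "San Francisco, CA",
               "Carmel, CA", "Santa Barbara, CA"),
  latitude = c(48.1, 45.2, 44.0, 43.7, 43.5, 37.8, 36.6, 34.3),
  Mpi90    = c(47, 177, 1087, 187, 397, 40, 39, 30),
  Mpi100   = c(139, 241, 1183, 175, 671, 14, 17, 0)
)

# Total number of alleles scored at each location
crustacean$total <- crustacean$Mpi90 + crustacean$Mpi100

# Observed proportion of the Mpi^90 allele
crustacean$prop90 <- crustacean$Mpi90 / crustacean$total

# Proportion versus latitude
plot(prop90 ~ latitude, data = crustacean,
     xlab = "Latitude (degrees north)",
     ylab = "Proportion of Mpi^90")
```

The explanation of why logistic regression suits this question is still yours to write; the sketch only prepares the data and the plot.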
(b) Fit a logistic regression model to this data and report the coefficients. Is the latitude effect significant?
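One common way to fit such a model in R is glm with a two-column response of successes and failures. The sketch below assumes the counts from the table and the column name Mpi100 for the second allele; the object name fit is arbitrary.

```r
# Counts from the table; Mpi100 is an assumed column name.
crustacean <- data.frame(
  latitude = c(48.1, 45.2, 44.0, 43.7, 43.5, 37.8, 36.6, 34.3),
  Mpi90    = c(47, 177, 1087, 187, 397, 40, 39, 30),
  Mpi100   = c(139, 241, 1183, 175, 671, 14, 17, 0)
)

# Binomial response given as cbind(successes, failures):
# here a "success" is an Mpi^90 allele.
fit <- glm(cbind(Mpi90, Mpi100) ~ latitude,
           family = binomial, data = crustacean)

# Coefficient table: estimates, standard errors, z values, p-values
coef_table <- summary(fit)$coefficients
print(coef_table)
```

The row labelled latitude in the printed table gives the slope on the logit scale together with its Wald test.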
(c) On a single plot, include
The function lines(x,y) will create a curve: it will join the points with coordinates (x, y) by line segments.
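To illustrate how lines(x, y) behaves, the sketch below draws a logistic curve over a grid of latitudes. The intercept b0 and slope b1 are hypothetical values chosen only to produce a curve; they are not the fitted coefficients.

```r
# Hypothetical coefficients, for illustration only
b0 <- 5
b1 <- -0.1

# A fine grid of x values makes the joined segments look smooth
x <- seq(30, 50, by = 0.5)
y <- plogis(b0 + b1 * x)   # inverse logit: exp(eta) / (1 + exp(eta))

# lines() adds to an existing plot, so set up empty axes first
plot(x, y, type = "n", ylim = c(0, 1),
     xlab = "Latitude (degrees north)", ylab = "Proportion")
lines(x, y)
```

Because lines only joins the points it is given, a coarse grid produces a visibly jagged curve, while a fine grid like the one above looks smooth.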
(d) According to the model, at what latitude would we expect a 50/50 distribution of the two Mpi alleles?
(e) Use the model to predict the proportion of Mpi^90 alleles in a population at latitude 40.0. You are encouraged to check your result with the predict function in R, but you also need to make this prediction by hand, using the coefficients from the fitted model.
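For (d) and (e), the hand calculations rest on the model equation logit(p) = b0 + b1 * latitude. A 50/50 split means logit(p) = 0, and a predicted proportion is the inverse logit of the linear predictor. The sketch below shows one way to check hand calculations against predict; it assumes the counts from the table, the column name Mpi100, and illustrative object names.

```r
# Counts from the table; Mpi100 is an assumed column name.
crustacean <- data.frame(
  latitude = c(48.1, 45.2, 44.0, 43.7, 43.5, 37.8, 36.6, 34.3),
  Mpi90    = c(47, 177, 1087, 187, 397, 40, 39, 30),
  Mpi100   = c(139, 241, 1183, 175, 671, 14, 17, 0)
)
fit <- glm(cbind(Mpi90, Mpi100) ~ latitude,
           family = binomial, data = crustacean)

b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["latitude"]]

# (d) p = 0.5 means logit(p) = 0, so b0 + b1 * latitude = 0
lat50 <- -b0 / b1

# (e) by hand: linear predictor at latitude 40, then inverse logit
eta40    <- b0 + b1 * 40
p40_hand <- 1 / (1 + exp(-eta40))

# Same prediction from predict(), on the probability scale
p40_pred <- predict(fit, newdata = data.frame(latitude = 40),
                    type = "response")

print(c(lat50 = lat50, by_hand = p40_hand, predict = unname(p40_pred)))
```

Without type = "response", predict returns the value on the logit scale, which is the quantity eta40 above.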
(f) Based on residual plots, evaluate the presence of influential observations. Why are these observations influential? Do the latitude effect and its significance change when influential observations are removed? Hint: to remove observations from a data set, you may use the subset function. For instance, to consider only observations with latitude between 40 and 50, one could do this:
subset(crustacean, latitude < 50 & latitude > 40)
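To make the hint concrete, here is a sketch of inspecting residuals and influence and then refitting on a subset. The 40 to 50 range is only the example range from the hint; which observations to drop is for you to judge from your own plots. The column name Mpi100 and the object names are assumptions.

```r
# Counts from the table; Mpi100 is an assumed column name.
crustacean <- data.frame(
  latitude = c(48.1, 45.2, 44.0, 43.7, 43.5, 37.8, 36.6, 34.3),
  Mpi90    = c(47, 177, 1087, 187, 397, 40, 39, 30),
  Mpi100   = c(139, 241, 1183, 175, 671, 14, 17, 0)
)
fit <- glm(cbind(Mpi90, Mpi100) ~ latitude,
           family = binomial, data = crustacean)

# Deviance residuals and Cook's distances, one per location
dev_res <- residuals(fit, type = "deviance")
cooks   <- cooks.distance(fit)
plot(crustacean$latitude, dev_res,
     xlab = "Latitude (degrees north)", ylab = "Deviance residual")

# Keep only the rows in the example range, then refit
north     <- subset(crustacean, latitude < 50 & latitude > 40)
fit_north <- glm(cbind(Mpi90, Mpi100) ~ latitude,
                 family = binomial, data = north)

# Compare the latitude row of the two coefficient tables
print(summary(fit)$coefficients["latitude", ])
print(summary(fit_north)$coefficients["latitude", ])
```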
(g) Based on the deviance, does the model fit this data well? (You might think about what corrective action would be appropriate if there is lack of fit, although you do not need to).
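A common way to judge fit from the deviance with grouped binomial data is to compare the residual deviance to a chi-square distribution on the residual degrees of freedom. The sketch below assumes the counts from the table, the column name Mpi100, and illustrative object names; the Pearson-based dispersion estimate is included only as one quantity you might look at when thinking about lack of fit.

```r
# Counts from the table; Mpi100 is an assumed column name.
crustacean <- data.frame(
  latitude = c(48.1, 45.2, 44.0, 43.7, 43.5, 37.8, 36.6, 34.3),
  Mpi90    = c(47, 177, 1087, 187, 397, 40, 39, 30),
  Mpi100   = c(139, 241, 1183, 175, 671, 14, 17, 0)
)
fit <- glm(cbind(Mpi90, Mpi100) ~ latitude,
           family = binomial, data = crustacean)

dev <- deviance(fit)      # residual deviance
df  <- df.residual(fit)   # 8 locations minus 2 coefficients

# Upper-tail chi-square probability: a small value suggests lack of fit
p_gof <- pchisq(dev, df, lower.tail = FALSE)

# Pearson chi-square divided by df; values well above 1
# point toward overdispersion
dispersion <- sum(residuals(fit, type = "pearson")^2) / df

print(c(deviance = dev, df = df, p_value = p_gof, dispersion = dispersion))
```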
Reading: Chapter 5 for logistic regression. For model selection, suggested (but not required) reading is Chapter 8 in Linear Models with R by Julian Faraway.