

Data Analysis Simulation, Exercises - Engineering - Prof. Cosma Shalizi, Advanced Data Analysis, Diabetes
Typology: Exercises


A classic data set for classification problems, logistic regression and related methods comes from a study of the correlates of diabetes among the Pima Indians of Arizona. The data were collected as part of a long-term study to understand why the Pima, like many other Native American groups, suffer from a much higher rate of diabetes than other populations in the US. (For background on the study, and the issue, see http://diabetes.niddk.nih.gov/dm/pubs/pima/.) Our version of the data is the data set pima in the package faraway.^1 It contains information on 768 adult Pima women, some but not all of whom have diabetes. See help(pima) for a description of the variables. Note that the column named diabetes indicates how much of a history of diabetes there was in the woman's family; it is the last column, test, which indicates whether or not the woman herself is diabetic.
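As a quick orientation, the following is a minimal sketch of how one might load the data and check the two columns that are easy to confuse. It assumes the faraway package is already installed (for example via install.packages("faraway")); the guard around the code is only there so the snippet does nothing when the package is missing, and is not part of the assignment.

```r
# Sketch only: assumes the faraway package is installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  # Load the pima data frame into the workspace
  data(pima, package = "faraway")

  # Dimensions and column types; test should be the last column
  str(pima)

  # Family-history score: a continuous variable, NOT the outcome
  print(summary(pima$diabetes))

  # Outcome of the diabetes test for the woman herself
  print(table(pima$test))
}
```

The response variable for any classifier fit to these data is test; diabetes is a predictor.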