
Data Analysis Simulation, Exercises - Engineering, Exercises of Advanced Data Analysis

Carnegie Mellon University (CMU), Advanced Data Analysis

Data Analysis Simulation, Exercises - Engineering - Prof. Cosma Shalizi, Advanced Data Analysis, Diabetes

Typology: Exercises

2010/2011

Uploaded on 11/03/2011


Homework 7: Diabetes

36-402, Advanced Data Analysis

Due at the start of class, 22 March 2011

A classic data set for classification problems, logistic regression and related methods comes from a study of the correlates of diabetes among the Pima Indians of Arizona, collected as part of a long-term study to understand why the Pima, like many other Native American groups, suffer from a much higher rate of diabetes than other populations in the US. (For background on the study, and the issue, see http://diabetes.niddk.nih.gov/dm/pubs/pima/.) Our version of the data is the data set pima in the package faraway.^1 It contains information on 768 adult Pima women, some but not all of whom have diabetes. See help(pima) for a description of the variables. Note that the column named diabetes indicates how much of a history of diabetes there was in the woman’s family; it is the last column, test, that indicates whether or not the woman herself is diabetic.

1. (10 points) Make graphical and numerical summaries of the data. If there are any obvious irregularities in the data, describe them, say why you think they are irregularities, and remove them as appropriate.

2. (20 points) Fit a logistic regression model to predict diabetes (the test column), using all the other variables as inputs. What are the estimated coefficients?

3. (10 points) What is the probability of having diabetes for a woman who has been pregnant twice, has a glucose concentration of 99, a diastolic pressure of 64, a triceps skin-fold thickness of 22 mm, an insulin level of 76, a BMI of 26, a diabetes “pedigree function” of 0.25, and is 30 years old? Give a 95% confidence interval for this prediction, assuming the model is correctly specified.

4. (10 points) How do the odds of having diabetes change for a woman who moves from the third quartile of the BMI distribution to the first quartile, with all else held constant? Give a 95% confidence interval for the difference in odds, assuming the model is correctly specified.

5. (20 points) Do women with diabetes have higher diastolic blood pressure than women without diabetes? Is the blood pressure coefficient significant in your model? Explain why the answers to these two questions are actually compatible.

^1 This homework is in fact based on problem 3 in chapter 2 of Faraway’s textbook.


6. (10 points) Describe how you can check whether this model fits the data.

7. (20 points) Does the model fit the data?

8. (10 points, extra credit) Use bootstrapping to find confidence intervals for the coefficients from question (2), the predicted probability in question (3), and the difference in odds in question (4). Compare them to your earlier answers, and explain how this relates to your findings in question (7).
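For question 1, one irregularity worth checking for in data like these is a 0 standing in for a missing measurement: a diastolic pressure, BMI, or glucose concentration of 0 is physiologically impossible in a living adult. The minimal Python sketch below assumes the data have been exported as a list of dicts keyed by column name, and that the names are pregnant, glucose, diastolic, triceps, insulin, bmi, diabetes, age, and test (check these against help(pima)). Which columns to treat this way is a judgment call; the list used here is our assumption, and pregnant = 0 is deliberately left alone because it is perfectly plausible.

```python
from statistics import median

# Assumed: columns where 0 cannot be a real measurement and most plausibly
# codes "not recorded". pregnant is excluded because 0 is a valid count.
ZERO_MEANS_MISSING = ("glucose", "diastolic", "triceps", "insulin", "bmi")


def summarize(rows, column):
    """Min, median, max and number of exact zeros for one numeric column."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "zeros": sum(1 for v in values if v == 0),
    }


def recode_zeros(rows, columns=ZERO_MEANS_MISSING):
    """Return a copy of rows with the impossible zeros replaced by None."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are untouched
        for column in columns:
            if new_row.get(column) == 0:
                new_row[column] = None
        cleaned.append(new_row)
    return cleaned
```

Recoding to None, rather than deleting whole rows, keeps the choice between dropping cases and dropping variables open for the later questions.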
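For question 2, in R the fit is a one-liner such as glm(test ~ ., family = binomial, data = pima). To make visible what such a call computes, the sketch below maximizes the logistic-regression likelihood by Newton-Raphson (equivalently, iteratively reweighted least squares) in plain Python. It is a teaching sketch under our own assumptions (inputs as lists of numbers, an intercept column added internally, no missing values, data that are not perfectly separable), not a replacement for glm.

```python
import math


def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t)), arranged to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    xs: list of input vectors (an intercept 1.0 is prepended here);
    ys: matching 0/1 responses. Returns [b0, b1, ...], intercept first.
    """
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    beta = [0.0] * p
    for _ in range(iterations):
        probs = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in design]
        # Score (gradient of the log-likelihood): X^T (y - p)
        score = [sum(row[j] * (y - q) for row, y, q in zip(design, ys, probs))
                 for j in range(p)]
        # Fisher information: X^T W X with W = diag(p (1 - p))
        info = [[sum(row[j] * row[k] * q * (1 - q) for row, q in zip(design, probs))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

At the solution the score equations hold, so the fitted probabilities sum to the number of observed cases. The inverse of the final information matrix estimates the covariance of the coefficients, which is what vcov() reports for a binomial glm fit.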
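For question 3, one standard approach under a correctly specified model is to build the interval on the logit (linear-predictor) scale and then map both endpoints through the logistic function, which keeps the interval inside (0, 1). With fitted coefficients b and estimated covariance matrix V, the linear predictor is x'b and its standard error is sqrt(x'Vx). The sketch assumes b and V have already been extracted (in R, coef() and vcov() on the fitted glm object); the function name is ours, and the input vector follows an assumed coefficient order that should be checked against the actual fit.

```python
import math
from statistics import NormalDist


def expit(t):
    """Logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_with_ci(beta, cov, x, level=0.95):
    """Predicted probability and a Wald interval built on the logit scale.

    beta: fitted coefficients, intercept first.
    cov:  estimated covariance matrix of beta, as a list of lists.
    x:    inputs in the same order as beta, starting with 1.0 for the intercept.
    """
    k = len(x)
    eta = sum(beta[i] * x[i] for i in range(k))                       # x'b
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))  # x'Vx
    z = NormalDist().inv_cdf(0.5 + level / 2)                          # ~1.96 at 95%
    half_width = z * math.sqrt(var)
    return expit(eta), (expit(eta - half_width), expit(eta + half_width))


# Hypothetical input vector for the woman in question 3: intercept, then
# pregnant, glucose, diastolic, triceps, insulin, bmi, diabetes, age
# (assumed to match the coefficient order of a fit like test ~ . on pima).
WOMAN = [1.0, 2, 99, 64, 22, 76, 26, 0.25, 30]
```

Because the logistic function is nonlinear, the resulting interval is generally not symmetric around the predicted probability.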
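For question 4, the sketch below reads the change as multiplicative. In a logistic regression, moving BMI from its third quartile q3 to its first quartile q1 with everything else fixed multiplies the odds by exp(b_bmi (q1 - q3)), whatever the other inputs are. An additive difference in odds, by contrast, would depend on the other inputs. Since q1 - q3 is a fixed constant, a Wald interval for b_bmi converts directly into an interval for the factor; the endpoints must be reordered because q1 - q3 is negative. The coefficient, its standard error, and the quartiles are assumed to have been computed already (in R, for example, from coef(), vcov() and quantile()); the function name is ours.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(coef, se, q_from, q_to, level=0.95):
    """Multiplicative change in odds when one input moves from q_from to q_to.

    coef, se: that input's fitted logistic-regression coefficient and standard error.
    Returns (odds_ratio, (lower, upper)).
    """
    delta = q_to - q_from
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ends = (math.exp(delta * (coef - z * se)), math.exp(delta * (coef + z * se)))
    # Sort the endpoints: a negative delta flips their order.
    return math.exp(delta * coef), (min(ends), max(ends))
```

A call such as odds_ratio_ci(b_bmi, se_bmi, q_from=q3, q_to=q1) would then give the factor and its interval, with b_bmi, se_bmi, q1 and q3 standing for the values taken from the fit.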
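Question 5 turns on the difference between a marginal comparison (do the two groups differ in blood pressure?) and a conditional one (does blood pressure carry information about diabetes once the other inputs are known?). The toy simulation below illustrates that distinction only. It does not use the Pima data, and it stratifies on a single hypothetical confounder z instead of fitting a regression. Here pressure tracks z and the outcome depends only on z, so the groups differ in mean pressure even though pressure has no effect once z is held (roughly) fixed.

```python
import math
import random


def simulate(n=20000, seed=1):
    """Toy data: a confounder z drives both 'pressure' and the 0/1 outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        pressure = z + rng.gauss(0.0, 1.0)  # tracks z; no direct effect on y
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * z)) else 0
        rows.append((z, pressure, y))
    return rows


def mean_difference(rows):
    """Mean pressure among y == 1 minus mean pressure among y == 0."""
    ones = [p for _, p, y in rows if y == 1]
    zeros = [p for _, p, y in rows if y == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def stratified_difference(rows, width=0.25):
    """Size-weighted average of within-stratum differences; strata are bins of z."""
    strata = {}
    for row in rows:
        strata.setdefault(math.floor(row[0] / width), []).append(row)
    total, weight = 0.0, 0
    for group in strata.values():
        # Only strata containing both outcomes give a usable comparison.
        if any(row[2] == 1 for row in group) and any(row[2] == 0 for row in group):
            total += mean_difference(group) * len(group)
            weight += len(group)
    return total / weight
```

With data = simulate(), mean_difference(data) comes out clearly positive while stratified_difference(data) is close to zero: a real marginal difference alongside a negligible conditional one.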
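There are several reasonable ways to check whether the model fits the data. One of them, sketched below, is a calibration table. Sort the cases by fitted probability, split them into equal-count groups, and compare each group's average fitted probability with the observed fraction of diabetic women. A well-fitting model should show the two agreeing up to sampling noise. The function, and the choice of equal-count groups in the style of Hosmer-Lemeshow binning, are our own and not prescribed by the assignment.

```python
def calibration_table(probs, ys, groups=10):
    """Compare mean fitted probability with observed frequency in equal-count groups.

    probs: fitted probabilities; ys: matching 0/1 outcomes.
    Returns a list of (n, mean_fitted, observed_fraction), one tuple per group.
    """
    pairs = sorted(zip(probs, ys))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:  # possible when there are fewer cases than groups
            continue
        fitted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((len(chunk), fitted, observed))
    return table
```

The thing to look for is systematic gaps between the two columns rather than scattered ones. Judging whether a given gap is too large needs a reference distribution, which could, for instance, be obtained by simulating outcomes from the fitted model.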
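For the extra-credit question, one option is a case-resampling (nonparametric) bootstrap. It redraws whole rows of the data with replacement, recomputes the quantity of interest on each resample, and reads percentile intervals off the replicates. Resampling cases rather than residuals is our choice here. In practice the statistic argument would wrap the refit from question 2 and return a coefficient, the predicted probability from question 3, or the odds factor from question 4. A resample can occasionally be perfectly separable, in which case a logistic refit may not converge, so such replicates need watching.

```python
import random


def bootstrap_percentile_ci(data, statistic, replicates=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for statistic(data), resampling whole cases.

    data: list of cases (e.g. rows of the data set);
    statistic: function mapping a list of cases to a single number.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])  # one resample
        for _ in range(replicates)
    )
    alpha = (1 - level) / 2
    # Simple order-statistic quantiles of the bootstrap replicates.
    lower = estimates[int(alpha * (replicates - 1))]
    upper = estimates[int((1 - alpha) * (replicates - 1))]
    return lower, upper
```

For example, bootstrap_percentile_ci(rows, lambda rs: sum(r[0] for r in rs) / len(rs)) gives an interval for the mean of the first column. Comparing such intervals with the model-based ones from questions 2 to 4 is what the question asks for.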
