Stat 430: Analyzing Mercury in Fish & Creating Regression Model for Medieval Cathedrals, Assignments of Statistics

University of Maryland Statistics

Instructions for a statistics problem set in three parts. In the first part, students analyze a dataset on mercury contamination in fish using SAS, including creating scatterplots, applying log transformations, identifying outliers, and fitting regression models. In the second part, students create a large SAS dataset by pseudo-random Monte Carlo simulation and use diagnostic tools to build up the correct multiple regression model. In the third part, students model the heights of medieval English cathedrals in terms of style and length. The objective is to demonstrate the use of statistical tools to build accurate models.

Typology: Assignments

Pre 2010

Uploaded on 07/30/2009

koofers-user-rlt 🇺🇸


Stat 430, Problem Set 6, Due Friday April 24, 2009

For this assignment, provide the SAS program code used as well as the edited SAS output you produced to answer the questions. You may annotate your SAS output, in handwritten form if you like, but verbally explain how your output answers the questions asked, and please do not hand in data or printed output which is not specifically requested and does not figure in your answers to questions.

(I). The dataset bass contains data from a study of Mercury contamination in fish that live in Floridian lakes.

(a). Make a scatterplot of average mercury contamination (AvgMercury) as a function of alkalinity.

(b). Use the log transform on the response, and fit a regression line. Prepare a scatterplot with regression line, and a residual plot as well.

(c). Construct the Cook’s distance measure from the data. Remove the outliers that you identify from the residual plot. Do the outliers have the largest Cook’s distances?

(d). After removing cases 36 and 52, you will see that there is evidence of another outlier. Keep going until you have deleted everything outlier-like (i.e., cases 36, 52, 40, 3, and 38). Fit a regression model to the remaining points and identify the changes in the regression coefficient estimates and in adjusted R-squared between your model with all the outliers and your model with none of them.

(e). Construct the 95% prediction interval at the value of the predictor near the outlier, using SAS to generate the intervals before and after removing the outlier. Does the outlier appear to have a great effect on this interval? Based on your result, decide whether or not you would consider the outlier an influential case.

(f). Do a scatterplot of the response versus the log(predictor). Why would this produce prediction intervals that would be difficult to trust? [Note: you may be able to solve this one by observation, without running the regression program again.]

(g). Based on the results from your analysis, would you hypothesize that acid rain (which decreases alkalinity) is likely to improve or make worse the average levels of mercury contamination in fish? Would your conclusion from this analysis alone be sufficient to have NOAA sending trucks to dump calcium chloride into Florida lakes? Explain briefly.


(II). (a) Create (and save!) by pseudo-random Monte Carlo simulation a large (n=1000) SAS dataset with the columns Y, X, Z defined as follows: X ∼ Uniform[0, 5], Z ∼ Binom(1, 0.5) are independent random variables in each row, and if V denotes another independent random variable with t_3 distribution, then

$$Y = 1.5 + 2X - 0.4X^2 + 7Z - 0.1XZ + 2V$$

Hint: if you multiply c times a Uniform[0, 1] random variable, you get a Uniform[0, c] random variable; and the easiest way to generate a t_3 random variable is to generate four independent N(0, 1) random variables W_1, W_2, W_3, W_4 using the SAS function RANNOR and then define

$$V = \frac{\sqrt{3}\, W_1}{\sqrt{W_2^2 + W_3^2 + W_4^2}}$$

(b) Fit a simple linear regression model of Y on X to your dataset. Examine a residuals plot (or use SAS-generated prediction intervals and/or studentized residuals or other statistical tools in SAS) to show how you would be guided in this setting to augment the model by including a quadratic (X^2) term in the model and also a term involving Z.

(c) Now fit the multiple regression model with Y modelled in terms of X, X^2, Z. Note that in this problem, we know in advance what the correct model should be. The objective is to show which tools get us to build the correct model. What tools would you use to examine whether a fourth predictor variable X ∗ Z will actually improve the fit of the model?

(d) For the multiple regression model based on the correct model (Y regressed on X, X^2, Z, X ∗ Z), do the residuals look patternless (plotted both against X and against the predicted values Ŷ)? Plot a histogram of the residuals from the final fitted model and examine them for normality. (Use histograms with over-plotted normal densities with same mean and variance, or QQplot.) Do the residuals look normal?
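The data-generating recipe in part (a) can be illustrated as follows. This is a Python sketch for checking your understanding of the construction, not a substitute for the SAS DATA step the assignment requires; `random.gauss` stands in for RANNOR, and the function names and the seed value are assumptions made for the example. The t_3 construction works because a standard normal divided by the square root of an independent chi-square on 3 degrees of freedom over 3 has a t distribution with 3 degrees of freedom, and the sum of three squared independent N(0, 1) variables is chi-square with 3 degrees of freedom.

```python
import math
import random


def draw_t3(rng):
    """One t_3 draw built from four independent N(0, 1) variates."""
    w1, w2, w3, w4 = (rng.gauss(0.0, 1.0) for _ in range(4))
    return w1 * math.sqrt(3.0) / math.sqrt(w2 * w2 + w3 * w3 + w4 * w4)


def simulate(n=1000, seed=430):
    """Return n rows (y, x, z) following the model in part (II)(a)."""
    rng = random.Random(seed)  # fixed seed so the dataset can be recreated
    rows = []
    for _ in range(n):
        x = 5.0 * rng.random()              # c * Uniform[0, 1) with c = 5
        z = 1 if rng.random() < 0.5 else 0  # Binom(1, 0.5)
        v = draw_t3(rng)
        y = 1.5 + 2.0 * x - 0.4 * x * x + 7.0 * z - 0.1 * x * z + 2.0 * v
        rows.append((y, x, z))
    return rows


rows = simulate()
print(len(rows), "rows generated")
```

Fixing the seed plays the same role as saving the SAS dataset: parts (b) through (d) should all be run against one and the same simulated sample.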

(III). The dataset cathedrals contains a list of the heights and lengths of a selection of medieval English cathedrals. Of these, the Romanesque cathedrals are indexed by style=0 and the Gothic by style=1. Find the best model you can to describe height in terms of style and length. (Choose a reasonable criterion for this!) You may want to consider the following:

transformations do not seem to be useful,
