Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Regression Analysis - Laboratory Review 8 | STATS 13, Lab Reports of Statistics

University of California - Los Angeles (UCLA)Statistics

Material Type: Lab; Class: Introduction to Statistical Methods for Life and Health Sciences; Subject: Statistics; University: University of California - Los Angeles; Term: Unknown 1989;

Typology: Lab Reports

Pre 2010

Uploaded on 08/21/2009

koofers-user-ks5 🇺🇸

9 documents

1 / 6

This page cannot be seen from the preview

Don't miss anything!

Lab 8: Regression analysis

Due with your final writeup: 1 Question, 1 Bonus

In this lab, we will examine the data on mercury contamination considered in lecture.

You can think of it as a prelude to your final, which will largely concern regression

analysis. Perhaps it is more precise to say that the final, with its focus on regression,

will let us explore the big themes addressed during the quarter: Graphical and

numerical summaries, formulating and testing hypotheses, sampling variablity and

confidence intervals. To get us started, we will spend this Lab session learning about

how R implements the simple linear model.

Getting Started

As usual, we begin this lab by loading data into your R session. This time, however,

our data are not stored in R’s special format, but rather as comma-separated values

(CSV). Programs like Excel, for example, can read and write data in this format.

After weeks of the usual routine, we’ve decided to publish lab materials this way so

that you see how to bring your own data into R.

The data on fish contamination is linked to the course home page, or is available

directly at

http://www.stat.ucla.edu/~cocteau/stat13/data/fish.csv

In eiter case, use a browser to download the file, and save it in the “working direc-

tory” for your R session. The command

getwd()

should return the folder where R will look for files to load. Save fish.csv there.

(We have not talked about this much, but when you start using R outside of this

class, you will inevitably want to save your work in different folders, giving each new

project its own folder. In general, you can specify a new working directory with the

command setwd, which takes the location of your project’s folder as an argument.)

Discover Lab Reports of Statistics University of California - Los Angeles (UCLA)

Partial preview of the text

Download Regression Analysis - Laboratory Review 8 | STATS 13 and more Lab Reports Statistics in PDF only on Docsity!

Lab 8: Regression analysis

Due with your final writeup: 1 Question, 1 Bonus

In this lab, we will examine the data on mercury contamination considered in lecture. You can think of it as a prelude to your final, which will largely concern regression analysis. Perhaps it is more precise to say that the final, with its focus on regression, will let us explore the big themes addressed during the quarter: Graphical and numerical summaries, formulating and testing hypotheses, sampling variablity and confidence intervals. To get us started, we will spend this Lab session learning about how R implements the simple linear model.

Getting Started

As usual, we begin this lab by loading data into your R session. This time, however, our data are not stored in R’s special format, but rather as comma-separated values (CSV). Programs like Excel, for example, can read and write data in this format. After weeks of the usual routine, we’ve decided to publish lab materials this way so that you see how to bring your own data into R.

The data on fish contamination is linked to the course home page, or is available directly at

http://www.stat.ucla.edu/~cocteau/stat13/data/fish.csv

In eiter case, use a browser to download the file, and save it in the “working direc- tory” for your R session. The command

getwd()

should return the folder where R will look for files to load. Save fish.csv there.

(We have not talked about this much, but when you start using R outside of this class, you will inevitably want to save your work in different folders, giving each new project its own folder. In general, you can specify a new working directory with the command setwd, which takes the location of your project’s folder as an argument.)

Once you have stored fish.csv in your current working directory, you can read the data into R with the following command.

fish = read.csv("fish.csv")

R has a number of functions for handling data with different formats. The function read.csv is specially designed for files of comma separated values. After reading the contents of the file, the command above saves the data in fish.

You can also read in the file without downloading it. Instead you could create a url from the address of the file.

fish = read.csv( url("http://www.stat.ucla.edu/~cocteau/stat13/data/fish.csv"))

Whether you go the url route or you download the file to your computer, the following command

ls()

will give you a list of objects that should include the dataset fish. Recall from lecture that these data were collected to study the extent of mercury concentration in two rivers in North Carolina. Researchers considered only largemouth bass; and each row of this dataset refers to a fish taken from one of two rivers, the Lumber or the Waccamaw.

dim(fish)

You will see that the data consists of 171 rows (fish) and 5 columns. Data for the first five fish can be shown with the command

fish[1:5,]

which should produce the following output.

river station length weight mercury 1 0 0 47.0 1616 1. 2 0 0 48.7 1862 1. 3 0 0 55.7 2855 1. 4 0 0 45.2 1199 0. 5 0 0 44.7 1320 0.

We observed in class that the data have a positive trend in the sense that longer fish tend to have higher mercury content. The relationship is somewhat noisy, however. Still, we decided to posit a population model of the form

(mercury) = β 0 + β 1 (length) + (error)

that is, the mercury content of a fish is described by linear model. Given a fish’s length, its mercury level is determined by a point on the line plus some error; a fish’s length is multiplied by the slope (β 1 ), we add the intercept β 0 to the result, and finally an error term is introduced.

In class we examined the use of least squares as a way of estimating the slope and intercept of the population model from a sample of fish. In R, this is done with the command

fit = lm(mercury~length,data=waccamaw) summary(fit)

Here, the command lm refers to a “linear model.” The expression inside the paren- theses, mercury~length, is a formula in R. It specifies a statistical model, in this case the one given above: that mercury content depends linearly on a fish’s length. Notice that this expression is missing any reference to the population parameters or the error. To make this specification clearer, if we wanted a model of the form

(mercury) = β 0 + β 1 (length) + β 2 (weight) + (error)

our formula would be mercury~length+weight. And with this, you get a sense that the linear model is not just about one input and one response; but more on that later.

In the summary output, we find a table of the Coefficients and their standard errors. Recall that these come from an analytical expression for the sampling dis- tribution that is not dissimilar in spirit to what we discussed for the t-distribution (consult yesterday’s lecture for a list of connections).

The object fit represents your fitted linear model. It contains information about

coefficients(fit)

and

residuals(fit)

and a number of other derived quantities that we’ll cover in our last lecture. You can look at the residuals with

r = residuals(fit) hist(r) qqnorm(r)

You can add the regression line to your scatterplot of mercury content versus weight with the following.

plot(waccamaw$length,waccamaw$mercury) abline(coefficients(fit))

At the risk of beating a dead horse, we can bootstrap the regression coefficients in a fairly straightforward way. Again, the bootstrap gives us a quick assessment of the variability in our estimates, this time β̂ 0 and β̂ 1. Below we bootstrap the slope.

nsim = 5000 bootbeta1 = rep(0,5000)

for(i in 1:nsim) { brows = sample(1:98,replace=T) boots = waccamaw[brows,]

bfit = lm(mercury~length,data=boots) bcoeff = coefficients(bfit)

bootbeta1[i] = bcoeff[2] }

sd(bootbeta1)

Here, we are sampling row numbers (1 to 98) with replacement and then using those to create a new data set boots. We then fit the linear model using the bootstrap data set, pull out the coefficients and keep the second one, the bootstrap replicates

Regression Analysis - Laboratory Review 8 | STATS 13, Lab Reports of Statistics

Related documents

Partial preview of the text

Download Regression Analysis - Laboratory Review 8 | STATS 13 and more Lab Reports Statistics in PDF only on Docsity!

Lab 8: Regression analysis

Due with your final writeup: 1 Question, 1 Bonus

Getting Started