




Material Type: Notes; Professor: Qu; Class: Statistical Analysis; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;





STAT 200, S1 Lecture 4
Chapter 3
Explanatory variable: also called an independent variable; it explains or influences changes in a response variable.
Response variable: also called a dependent variable; it measures the outcome of the study and depends on the explanatory variables.
Example: Alcohol consumption and blood alcohol content. The legal limit for driving is 0.08%. Student volunteers each drink a different number of cans of beer, and 30 minutes later their blood alcohol content is measured.
What is the response variable?
What is the explanatory variable?
Note: Most studies use several explanatory variables to explain the response variable.
Example: Age and gender help predict a person's future height, but they do not cause a particular height; many other factors are involved as well.
Displaying relationships: scatterplots
Example 3.3 – Scatterplots
Read in the data, verify the variable names, and attach each column to its own variable object.
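> library(gdata)  # read.xls comes from the gdata package
> # Forward slashes: a single backslash starts an escape sequence in an R string.
> data3.3 = read.xls("D:/DataSets/Excel/ch03/ta03_01.xls")
> names(data3.3)
[1] "Year"       "Powerboats" "Deaths"
> attach(data3.3)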
The R command for generating a scatterplot is plot(x,y), where the object taking the place of x will be the variable along the x-axis and the object taking the place of y will be the variable along the y-axis. One help file for this command can be found with ?plot, but a more detailed list of useful options can be found by typing ?plot.default.
> plot(Powerboats, Deaths)
[Figure: scatterplot of Deaths (y-axis, ticks from 20 to 80) against Powerboats (x-axis, ticks from 500 to 1000).]
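As a minimal sketch of the plot(x, y) call with a few of the options listed in ?plot.default, the block below uses a small made-up data frame. The name `boats` and all of its values are hypothetical and are not the textbook data; the options shown (xlab, ylab, main, pch) are standard arguments of plot.default.

```r
# Hypothetical values for illustration only; NOT the ta03_01.xls data.
boats <- data.frame(
  Powerboats = c(500, 600, 700, 800, 900, 1000),
  Deaths     = c(20, 30, 35, 50, 65, 80)
)

# with() looks the column names up inside the data frame,
# so no attach() call is needed for this sketch.
with(boats,
     plot(Powerboats, Deaths,
          xlab = "Powerboats",   # x-axis label
          ylab = "Deaths",       # y-axis label
          main = "Deaths versus powerboats (hypothetical data)",
          pch  = 19))            # solid points instead of open circles
```

Using with() instead of attach() is only a design choice for the sketch: it avoids leaving the columns attached to the search path after the plot is drawn.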
Example 3.5 – Scatterplots
Read in the data, verify the variable names, and attach each column to its own variable object.
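> library(gdata)  # read.xls comes from the gdata package
> # Forward slashes: a single backslash starts an escape sequence in an R string.
> data3.5 = read.xls("D:/DataSets/Excel/ch03/ta03_02.xls")
> names(data3.5)
[1] "species" "mass"    "abund"
> attach(data3.5)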
Begin by looking at the two quantitative variables on a scatterplot.
> plot(mass, abund)
[Figure: scatterplot of abund (y-axis, ticks from 0 to 1500) against mass (x-axis, ticks from 0 to 300).]
The pattern of the data suggests that a log-transform will make a more visually appealing plot. Use the log="xy" option to change both the x and y axes to a logarithmic scale.
> plot(mass, abund, log="xy")
[Figure: scatterplot of abund against mass with both axes on a logarithmic scale; the axis labels remain in the original units.]
Another option is to apply the log command to each variable, thus taking the logarithm of each value for each variable.
> plot(log(mass), log(abund))
[Figure: scatterplot of log(abund) (y-axis, ticks from 0 to 6) against log(mass) (x-axis, ticks from -2 to 6).]
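To illustrate why the two approaches show the same pattern, here is a sketch with made-up data that follow an exact power law. The names `mass_demo` and `abund_demo` and the constants 1000 and -0.75 are assumptions chosen for illustration, not values from the textbook data set.

```r
# Hypothetical power-law data: abund = 1000 * mass^(-0.75).
mass_demo  <- c(0.2, 1, 5, 20, 100, 300)
abund_demo <- 1000 * mass_demo^(-0.75)

# Option 1: keep the raw values and put both axes on a log scale.
# The tick labels stay in the original units.
plot(mass_demo, abund_demo, log = "xy")

# Option 2: transform the values, then plot on ordinary axes.
# The tick labels are now in log units (log() is the natural log in R).
plot(log(mass_demo), log(abund_demo))

# On the log scale the points lie on a straight line:
# the slope between each pair of neighbouring points is the exponent.
slopes <- diff(log(abund_demo)) / diff(log(mass_demo))
```

Both plots place the points identically; only the axis labelling differs. For these made-up data every entry of `slopes` equals the exponent, up to floating-point rounding.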
Correlation: measuring linear association
Judging a relationship from a scatterplot alone can be misleading, because changing the plot scale changes how strong the pattern appears to the eye.
Correlation: measures the direction and strength of the linear relation between two quantitative variables.
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
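The formula for r is the sum of products of the standardized x and y values, divided by n - 1. As a small check of that definition, the sketch below computes r by hand for a made-up data set (the vectors `x` and `y` are hypothetical) and compares it with R's built-in cor function.

```r
# Hypothetical data for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
n <- length(x)

# Standardize each variable: subtract the mean, divide by the
# sample standard deviation (sd() uses the n - 1 divisor).
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Sum the products of the standardized values and divide by n - 1.
r_manual  <- sum(zx * zy) / (n - 1)

# Built-in Pearson correlation for comparison.
r_builtin <- cor(x, y)
```

For these values both `r_manual` and `r_builtin` come out to 0.8.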
Example 3.7 – Correlation
The R command for finding the correlation coefficient is cor. All that is required is the names of the two variables of interest. The default method is Pearson’s correlation coefficient, which is the calculation demonstrated in the textbook, but others are available as options, including Spearman’s nonparametric version.
> # Example 3.
> data3.3 = read.xls("D:/DataSets/Excel/ch03/ta03_01.xls")
> attach(data3.3)
> cor(Powerboats, Deaths)
[1] 0.
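The method option mentioned above can be sketched with made-up data in which one variable always increases with the other, but not along a straight line. The vectors `u` and `v` are hypothetical; method = "spearman" is a documented option of cor.

```r
# Hypothetical data: v increases with u, but along a curve.
u <- 1:10
v <- u^3

# Default method: Pearson's correlation measures linear association.
pearson  <- cor(u, v)

# Spearman's version works on the ranks of the values, so it
# measures how consistently v rises as u rises.
spearman <- cor(u, v, method = "spearman")
```

Here `spearman` equals 1 because the ranks agree perfectly, while `pearson` is positive but below 1 because the points do not fall on a straight line.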
As noted in the textbook, correlation makes no distinction between explanatory and response variables, and so neither does R. The order of the variables in the cor command does not matter.
> cor(Deaths, Powerboats)
[1] 0.
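A short sketch with made-up numbers shows the same symmetry, and also shows that changing the units of a variable leaves r unchanged, which is why r is a steadier guide than the appearance of a rescaled plot. The vectors `boats_thousands` and `deaths_demo` are hypothetical, not the textbook data.

```r
# Hypothetical data for illustration only.
boats_thousands <- c(5, 6, 7, 8, 9, 10)
deaths_demo     <- c(20, 30, 35, 50, 65, 80)

# Swapping the arguments gives the same correlation.
r_xy <- cor(boats_thousands, deaths_demo)
r_yx <- cor(deaths_demo, boats_thousands)

# Changing units (thousands of boats to single boats) rescales
# the x-axis of a plot but does not change r.
r_rescaled <- cor(boats_thousands * 1000, deaths_demo)
```

All three values agree up to floating-point rounding.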
Recall that Example 3.5 examined the relationship between the two variables on a logarithmic scale.
> # Example 3.
> data3.5 = read.xls("D:/DataSets/Excel/ch03/ta03_02.xls")
> attach(data3.5)
> cor(log(mass),log(abund))
[1] -0.