Elementary Statistics - Lecture Notes | STAT 200, Study notes of Statistics

Material Type: Notes; Professor: Qu; Class: Statistical Analysis; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;


Uploaded on 03/10/2009

STAT 200, S1 Lecture 4

Chapter 3

Explanatory variable: Also called an independent variable; it explains or influences changes in a response variable.

Response variable: Also called a dependent variable; it measures the outcome of a study and depends on the explanatory variables.

Example: Alcohol consumption and blood alcohol content. The legal limit for driving is 0.08%. Student volunteers drink different numbers of cans of beer, and 30 minutes later their blood alcohol content is measured.

What is the response variable?

What is the explanatory variable?

Note: Most studies involve several explanatory variables to explain the response variable.

Example: Age and gender help predict a person's future height, but they do not cause a particular height; many other factors are involved as well.

Displaying relationships: scatterplots

Example 3.3 – Scatterplots

Read in the data, verify the variable names, and attach each column to its own variable object.

> library(gdata)  # assumption: read.xls is provided by the gdata package
> data3.3 = read.xls("D:/DataSets/Excel/ch03/ta03_01.xls")
> names(data3.3)
[1] "Year" "Powerboats" "Deaths"
> attach(data3.3)

The R command for generating a scatterplot is plot(x,y), where the object taking the place of x will be the variable along the x-axis and the object taking the place of y will be the variable along the y-axis. One help file for this command can be found with ?plot, but a more detailed list of useful options can be found by typing ?plot.default.

> plot(Powerboats, Deaths)

[Figure: scatterplot of Deaths (y-axis) against Powerboats (x-axis).]
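As a self-contained sketch of the same call, the example below uses R's built-in cars data set. This is an assumption made here so that the code runs without the course's Excel file; it is not the data from the textbook example. The extra arguments (xlab, ylab, main, pch) are among the options listed under ?plot.default.

```r
# Assumption: the built-in 'cars' data set stands in for the course data,
# so this sketch runs without any Excel file.
speed <- cars$speed   # plays the role of the explanatory variable
dist  <- cars$dist    # plays the role of the response variable

# plot(x, y): the first argument goes on the x-axis, the second on the y-axis.
plot(speed, dist,
     xlab = "Speed",
     ylab = "Stopping distance",
     main = "Stopping distance against speed",
     pch  = 19)        # pch = 19 draws solid circles
```

Putting the explanatory variable first keeps it on the horizontal axis, which is the usual convention for scatterplots.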

Example 3.5 – Scatterplots

Read in the data, verify the variable names, and attach each column to its own variable object.

> data3.5 = read.xls("D:/DataSets/Excel/ch03/ta03_02.xls")
> names(data3.5)
[1] "species" "mass" "abund"
> attach(data3.5)

Begin by looking at the two quantitative variables on a scatterplot.

> plot(mass, abund)

[Figure: scatterplot of abund (y-axis) against mass (x-axis).]

The pattern of the data suggests that a log-transform will make a more visually appealing plot. Use the log="xy" option to change both the x and y axes to a logarithmic scale.

> plot(mass, abund, log="xy")

[Figure: scatterplot of abund (y-axis) against mass (x-axis), both axes on a logarithmic scale.]

Another option is to apply the log command to each variable, thus taking the logarithm of each value for each variable.

> plot(log(mass), log(abund))

[Figure: scatterplot of log(abund) (y-axis) against log(mass) (x-axis).]
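The two approaches above can be compared side by side in a minimal sketch. It assumes two columns of R's built-in state.x77 matrix as stand-ins for mass and abund, chosen only because both are positive and right-skewed; they are not the textbook data.

```r
# Assumption: columns of the built-in state.x77 matrix stand in for mass/abund.
area <- state.x77[, "Area"]
pop  <- state.x77[, "Population"]

# Option 1: keep the original values and draw both axes on a log scale.
plot(area, pop, log = "xy")

# Option 2: transform the values first, then plot on ordinary axes.
# log() computes the natural logarithm by default.
plot(log(area), log(pop))

# Both options require strictly positive values.
all(area > 0 & pop > 0)
```

The point pattern is the same in both plots. The difference is in the axis labels: log="xy" keeps tick marks in the original units, while log() labels the axes with the logarithms themselves.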

Correlation: measuring linear association

Judging a relationship from a scatterplot alone can be misleading, because changing the plot scale changes how strong the relationship appears.

Correlation: measures the direction and strength of the linear relation between two quantitative variables.
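A short sketch of why a numerical measure helps: the same data drawn with two different y-axis ranges look quite different, while the correlation (computed with R's cor function, covered in Example 3.7) does not change when the measurement units change. The built-in cars data set and the conversion factors are assumptions chosen for illustration.

```r
# Assumption: the built-in 'cars' data set is used for illustration only.
x <- cars$speed   # recorded in miles per hour
y <- cars$dist    # recorded in feet

# Same data, two different y-axis ranges: the second plot looks much flatter.
par(mfrow = c(1, 2))                 # two panels side by side
plot(x, y, main = "Default axes")
plot(x, y, ylim = c(-500, 500), main = "Stretched y-axis")
par(mfrow = c(1, 1))                 # restore the single-panel layout

# Changing units rescales the values but leaves the correlation unchanged.
r_original <- cor(x, y)
r_rescaled <- cor(x * 1.609, y * 0.3048)   # roughly mph -> km/h, ft -> m
all.equal(r_original, r_rescaled)
```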

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
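The formula for r (the sum of products of standardized values, divided by n - 1) can be checked by hand against R's built-in cor function. This is a minimal sketch that assumes the built-in cars data set rather than the textbook data; the variable names zx, zy and r_manual are introduced here for illustration.

```r
# Assumption: the built-in 'cars' data set is used for illustration only.
x <- cars$speed
y <- cars$dist
n <- length(x)

# Standardize each variable: subtract the mean, divide by the standard deviation.
# sd() uses the n - 1 divisor, matching the formula for r.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# r is the sum of the products of the standardized values, divided by n - 1.
r_manual <- sum(zx * zy) / (n - 1)

r_manual
cor(x, y)   # built-in function; agrees with r_manual up to rounding error
```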

Example 3.7 – Correlation

The R command for finding the correlation coefficient is cor. All that is required is the names of the two variables of interest. The default method is Pearson’s correlation coefficient, which is the calculation demonstrated in the textbook, but others are available as options, including Spearman’s nonparametric version.

> # Example 3.
> data3.3 = read.xls("D:/DataSets/Excel/ch03/ta03_01.xls")
> attach(data3.3)

> cor(Powerboats, Deaths)
[1] 0.

As noted in the textbook, correlation makes no distinction between explanatory and response variables, and so neither does R. The order of the variables in the cor command does not matter.

> cor(Deaths, Powerboats)
[1] 0.
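Both points, the symmetry of cor and the choice of method, can be tried without the course files. The sketch below assumes the built-in cars data set; method is a documented argument of cor, and the object names r_pearson and r_spearman are introduced here for illustration.

```r
# Assumption: the built-in 'cars' data set is used for illustration only.
x <- cars$speed
y <- cars$dist

# Swapping the arguments gives the same value.
all.equal(cor(x, y), cor(y, x))

# Pearson is the default; the method argument selects an alternative.
r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")

# Spearman's coefficient is Pearson's coefficient applied to the ranks.
all.equal(r_spearman, cor(rank(x), rank(y)))
```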

Recall that in Example 3.5 the relationship between the log-transformed variables was examined.

> # Example 3.
> data3.5 = read.xls("D:/DataSets/Excel/ch03/ta03_02.xls")
> attach(data3.5)

> cor(log(mass), log(abund))
[1] -0.
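Because Pearson's correlation measures linear association, taking logarithms generally changes its value, whereas a rank-based measure such as Spearman's is unaffected by an order-preserving transform. The sketch below illustrates this with two columns of the built-in state.x77 matrix, an assumption made so the code runs without the course data.

```r
# Assumption: columns of the built-in state.x77 matrix stand in for mass/abund.
area <- state.x77[, "Area"]
pop  <- state.x77[, "Population"]

# Pearson's r on the raw scale and on the log scale generally differ.
r_raw <- cor(area, pop)
r_log <- cor(log(area), log(pop))
c(raw = r_raw, log = r_log)

# log() preserves order, so the rank-based Spearman correlation is unchanged.
s_raw <- cor(area, pop, method = "spearman")
s_log <- cor(log(area), log(pop), method = "spearman")
all.equal(s_raw, s_log)
```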