Exploratory Data Analysis: Two Variables, Slides of Advanced Data Analysis

Side-by-side box plots. 2 quantitative variables. Scatter plots, correlations, regressions. Box plots. A box plot is a graph of ...

Typology: Slides

2022/2023

Uploaded on 02/28/2023

9/10/09
FPP 7-9

Exploratory Data Analysis: Two Variables

Exploratory data analysis: two variables

• 2 qualitative/categorical variables
   • Contingency tables (we will cover these later in the semester)
• 1 qualitative/categorical, 1 quantitative variable
   • Side-by-side box plots
• 2 quantitative variables
   • Scatter plots, correlations, regressions

Box plots

• A box plot is a graph of five numbers
   • Minimum
   • Maximum
   • Median
   • 1st quartile
   • 3rd quartile
• We know how to compute three of the numbers (min, max, median)
• To compute the 1st quartile, find the median of the 50% of observations that are smaller than the median
• To compute the 3rd quartile, find the median of the 50% of observations that are bigger than the median
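
The quartile rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming that for an odd number of observations the median itself is left out of both halves; the data values are made up, and statistical software often uses slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the halves-of-the-data rule."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = xs[: n // 2]         # observations below the median position
    upper = xs[(n + 1) // 2 :]   # observations above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Made-up data, for illustration only.
summary = five_number_summary([2, 4, 4, 5, 7, 8, 9, 10, 12])
print(summary)
```

With nine observations the median is the 5th sorted value, so each half holds four values and each quartile is the average of the two middle values of its half.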

Side-by-side box plots

• Box plots are very useful when comparing distributions of a quantitative variable for levels of some qualitative variable
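
As a rough sketch of what a side-by-side box plot summarizes, the code below computes one five-number summary per level of a qualitative variable. The level names "A" and "B" and all values are hypothetical, and the quartile rule is the halves-of-the-data rule from the box plot slide.

```python
from statistics import median

def five_number_summary(values):
    """(min, Q1, median, Q3, max); quartiles are medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    return (xs[0], median(xs[: n // 2]), median(xs),
            median(xs[(n + 1) // 2 :]), xs[-1])

# Hypothetical (level, value) observations; not data from the slides.
observations = [
    ("A", 12), ("A", 15), ("A", 11), ("A", 14), ("A", 18),
    ("B", 21), ("B", 19), ("B", 25), ("B", 22), ("B", 20),
]

# Collect the quantitative values separately for each level.
groups = {}
for level, value in observations:
    groups.setdefault(level, []).append(value)

# One summary per level: these are the numbers each box would display.
summaries = {level: five_number_summary(vals) for level, vals in groups.items()}
for level, s in sorted(summaries.items()):
    print(level, s)
```

Comparing the summaries level by level is the numerical counterpart of reading the boxes next to each other.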

Pets and stress

• Are there any differences in stress levels when doing tasks with your pet, a good friend, or alone?
• Allen et al. (1988) asked 45 people to count backwards by 13s and 17s.
• People were randomly assigned to one of the three groups: pet, friend, alone.
• The response is the subject's average heart rate during the task.
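
Random assignment of this kind can be sketched as a shuffle followed by a split. The subject IDs, the seed, and the equal group sizes of 15 are assumptions made for illustration; the slides do not say how large each group was or how the randomization was carried out.

```python
import random

rng = random.Random(0)            # fixed seed so the sketch is repeatable
subjects = list(range(1, 46))     # 45 hypothetical subject IDs
rng.shuffle(subjects)             # put the subjects in random order

# Assumption: three equal groups of 15 (not stated in the slides).
assignment = {
    "pet": sorted(subjects[0:15]),
    "friend": sorted(subjects[15:30]),
    "alone": sorted(subjects[30:45]),
}

for group, members in assignment.items():
    print(group, members)
```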

Pets and stress

• It looks like the task is most stressful around friends and least stressful around pets

Vietnam draft lottery

• In 1970, the US government drafted young men for military service in the Vietnam War. These men were drafted by means of a random lottery. Basically, paper slips containing all dates in January were placed in a wooden box and then mixed. Next, all dates in February (including 2/29) were added to the box and mixed. This procedure was repeated until all 366 dates were mixed in the box. Finally, dates were successively drawn without replacement. The first date drawn (Sept. 14) was assigned rank 1, the second date drawn (April 24) was assigned rank 2, and so on. Those eligible for the draft who were born on Sept. 14 were called first to service, then those born on April 24 were called, and so on.
• Soon after the lottery, people began to complain that the randomization system was not completely fair. They believed that birth dates later in the year had lower lottery numbers than those earlier in the year (Fienberg, 1971).
• What do the data say? Was the draft lottery fair? Let's do a statistical analysis of the data to find out.

Draft rank by month in the Vietnam draft lottery: Raw data

Draft rank by month in the Vietnam draft lottery: Box plots
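
For comparison, it can help to see what a fair lottery would look like. The sketch below simulates one (it does not use the actual 1970 draft data): ranks 1 to 366 are shuffled and assigned to dates in calendar order, then the median rank is computed for each month. The seed and function name are arbitrary choices for this illustration.

```python
import random
from statistics import median

# Days per month, with February counted as 29 days (366 dates in total).
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def simulate_fair_lottery(seed):
    """Assign a random permutation of ranks 1..366 to the dates, by month."""
    rng = random.Random(seed)
    ranks = list(range(1, 367))
    rng.shuffle(ranks)
    by_month, start = [], 0
    for days in DAYS_IN_MONTH:
        by_month.append(ranks[start:start + days])
        start += days
    return by_month

by_month = simulate_fair_lottery(seed=1970)
monthly_medians = [median(ranks) for ranks in by_month]
for month, m in enumerate(monthly_medians, start=1):
    print(month, m)
```

In a fair lottery the monthly medians should scatter around 183.5, the midpoint of 1 and 366, with no systematic drift from January to December. A steady decline across months in the real box plots would be the pattern the complaints describe.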

Exploratory data analysis: two quantitative variables

• Scatter plots
   • A scatter plot shows one variable vs. the other in a 2-dimensional graph
   • Always plot the explanatory variable, if there is one, on the horizontal axis
   • We usually call the explanatory variable x and the response variable y
   • If there is no explanatory-response distinction, either variable can go on the horizontal axis

Example: Gross Sales and Items

[Data table and scatter plot of gross sales and number of items]

Describing scatter plots

• Form
   • Linear, quadratic, exponential
• Direction
   • Positive association
      • An increase in one variable is accompanied by an increase in the other
   • Negative association
      • A decrease in one variable is accompanied by an increase in the other
• Strength
   • How closely the points follow a clear form

True or False

• Let X be GNP for the U.S. in dollars and Y be GNP for Mexico, in pesos. Changing Y to U.S. dollars changes the value of the correlation.

[Four scatter plots, each captioned "Correlation coefficient is ____"]
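
One way to check the true-or-false statement about changing units is to compute r before and after the conversion. The sketch below uses the standard Pearson formula with made-up GNP figures and a hypothetical fixed exchange rate; none of these numbers come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up values for illustration only.
x_dollars = [5.1, 5.4, 5.9, 6.3, 6.9]
y_pesos = [410.0, 455.0, 440.0, 520.0, 560.0]

PESOS_PER_DOLLAR = 10.0   # hypothetical exchange rate
y_dollars = [v / PESOS_PER_DOLLAR for v in y_pesos]

r_pesos = pearson_r(x_dollars, y_pesos)
r_dollars = pearson_r(x_dollars, y_dollars)
print(r_pesos, r_dollars)
```

Dividing every y value by the same positive constant rescales the numerator and the denominator of r by the same factor, so the two printed values agree up to floating-point rounding.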

Correlation coefficient

• Correlation is not an appropriate measure of association for non-linear relationships
• What would r be for this scatter plot?
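
As a hedged illustration of why r can mislead for non-linear relationships, the sketch below uses a made-up, perfectly quadratic data set (y = x² on x values symmetric about zero). This is not the scatter plot from the slide.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: y is completely determined by x, but not linearly.
x = list(range(-5, 6))
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)
```

Here y is an exact function of x, yet r is zero, because the positive and negative halves of the parabola cancel in the numerator. A value of r near zero therefore says only that there is no linear association.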

Correlation coefficient

• CORRELATION IS NOT CAUSATION
• A substantial correlation between two variables might indicate the influence of other variables on both
• Or, lack of substantial correlation might mask the effect of the other variables

Correlation coefficient

• CORRELATION IS NOT CAUSATION
• Plot of life expectancy of population and number of people per TV for 22 countries (1991 data)

Correlation coefficient

• CORRELATION IS NOT CAUSATION
• A study showed that there was a strong correlation between the number of firefighters at a fire and the property damage that the fire causes.
• We should send fewer firefighters to fight fires, right?
• This is an example of a lurking variable. What might it be?
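
One candidate lurking variable is the size of the fire. The simulation below is a sketch under that assumption, with entirely made-up numbers: fire size drives both the number of firefighters and the damage, and firefighters have no effect on damage at all. The helper `residuals` (a name chosen here) removes the part of each variable that a straight line in fire size explains.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals from the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(42)
# Simulated fires: size is the lurking variable behind both measurements.
fire_size = [rng.uniform(1, 10) for _ in range(2000)]
firefighters = [2 * s + rng.gauss(0, 1) for s in fire_size]
damage = [3 * s + rng.gauss(0, 2) for s in fire_size]

r_raw = pearson_r(firefighters, damage)
r_adjusted = pearson_r(residuals(firefighters, fire_size),
                       residuals(damage, fire_size))
print(r_raw, r_adjusted)
```

The raw correlation is strong even though, by construction, sending more firefighters changes nothing about the damage. Once fire size is accounted for, the remaining correlation is close to zero.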

Interpreting correlations

• A newspaper article contains a quote from a psychologist, who says, “The evidence indicates the correlation between the research productivity and teaching rating of faculty members is close to zero.” The paper reports this as “The professor said that good researchers tend to be poor teachers, and vice versa.” Did the newspaper get it right?

Correlation coefficient

• What’s wrong with each of these statements?
   • There exists a high correlation between the gender of American workers and their income.
   • The correlation between amount of sunlight and plant growth was r = 0.35 centimeters.
   • There is a correlation of r = 1.78 between speed of reading and years of practice.

Examining many correlations simultaneously

• The correlation matrix displays correlations for all pairs of variables
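
A correlation matrix can be sketched by applying the same correlation formula to every pair of variables. The variable names and values below are hypothetical and chosen only to show the layout.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements on five individuals.
data = {
    "height": [150, 160, 165, 172, 180],
    "weight": [52, 58, 63, 70, 80],
    "age": [34, 25, 41, 29, 38],
}

names = list(data)
# Entry [i][j] is the correlation between variable i and variable j.
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:>8}", " ".join(f"{value:6.2f}" for value in row))
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only the entries above or below the diagonal carry new information.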