Bivariate Data Sets - Introduction to Statistics | STAT 100A, Study notes of Statistics

Material Type: Notes; Professor: Smith; Class: INTRO TO STATISTICS; Subject: Statistics; University: University of California-Riverside; Term: Spring 2010;

Uploaded on 05/11/2010

STAT 100A
Chapter 3: Describing Bivariate Data
Sections 3.1-3.2: Bivariate Data & Graphs for Qualitative Variables
Sometimes the data that are collected consist of
observations for two variables on the same
experimental unit. Special techniques that can be
used in describing these variables will help you
identify possible relationships between them.
Bivariate Data Sets
- qualitative, qualitative
- qualitative, quantitative
- quantitative, quantitative
Side-by-Side Bar Graph
A side-by-side bar graph is used to compare two or
more sets of qualitative data.
Note: Data sets should always be compared by using relative
frequencies, because different sample or population sizes make
comparisons using frequencies difficult.
Example: In a class survey, Penn State statistics students
were asked, “Regarding your weight, do you think you are:
About right? Overweight? Underweight?” The following
graph displays the results by sex.
[Figure: Gender and Perception of Weight. Side-by-side bar graph of
relative frequency (vertical axis, 0.0 to 0.8) against perception
(about right, overweight, underweight), with separate bars for female
and male.]
Side-by-Side Pie Charts
Side-by-side pie charts are used to compare two or
more sets of qualitative data.
Example: The color distributions for two snack-size
bags of M&M's candies, one plain and one peanut,
are displayed first in a contingency table and then in
side-by-side pie charts.
         Brown  Yellow  Red  Orange  Green  Blue
Plain       15      14   12       4      5     6
Peanut       6       2    2       3      3     5
[Figure: Pie Chart of Color by Type of M&M's. Panel variable: Type.
Peanut: Brown 28.6%, Yellow 9.5%, Red 9.5%, Orange 14.3%, Green 14.3%, Blue 23.8%.
Plain: Brown 26.8%, Yellow 25.0%, Red 21.4%, Orange 7.1%, Green 8.9%, Blue 10.7%.]
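The pie-chart percentages are the relative frequencies of each row of the contingency table. A minimal Python sketch of that calculation follows; the counts are copied from the table, while the names `relative_frequencies` and `rel_freq` are illustrative choices, not part of the notes.

```python
# Counts copied from the M&M's contingency table in the notes.
colors = ["Brown", "Yellow", "Red", "Orange", "Green", "Blue"]
counts = {
    "Plain":  [15, 14, 12, 4, 5, 6],
    "Peanut": [6, 2, 2, 3, 3, 5],
}

def relative_frequencies(row):
    """Divide each count by the row total so rows of different size are comparable."""
    total = sum(row)
    return [n / total for n in row]

rel_freq = {bag: relative_frequencies(row) for bag, row in counts.items()}

for bag, freqs in rel_freq.items():
    print(f"{bag} (n = {sum(counts[bag])})")
    for color, f in zip(colors, freqs):
        print(f"  {color:<7}{f:6.1%}")
```

The plain bag holds 15 brown candies and the peanut bag only 6, yet brown makes up a similar share of each bag (about 26.8% and 28.6%). This is the reason the notes recommend comparing relative frequencies rather than raw counts.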
STAT 100A
Section 3.3: Scatterplots for Two Quantitative Variables
To determine whether a linear relationship between y
and x is plausible, it is helpful to plot the sample data
in a scatterplot.
scatterplot – shows the relationship between two
quantitative variables measured on the same
experimental units with one variable’s values plotted
along the vertical axis and the other along the
horizontal axis. Each experimental unit in the data
appears as the point in the plot fixed by the values of
both variables for that experimental unit.
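To make the definition concrete, here is a rough Python sketch that places each experimental unit at the position fixed by its (x, y) values on a coarse character grid. The data (hours studied and exam score for six students) are hypothetical, and `text_scatter` is an illustrative helper, not a library function; it assumes the x values are not all equal and the y values are not all equal.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Render (x, y) pairs as a coarse character grid, one '*' per unit."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (horizontal axis) and a row (vertical axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["".join(line) for line in grid]

# Hypothetical data: one (hours, score) pair per student.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 57, 71, 78, 85]

plot_lines = text_scatter(hours, scores)
for line in plot_lines:
    print("|" + line)
print("+" + "-" * 20)
```

In practice a plotting library would draw the axes and scales, but the idea is the same: one point per experimental unit.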

Examining a Scatterplot

In any graph of data, look for the overall pattern and for striking deviations from that pattern.

You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.

An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.

positively correlated – an increase in one variable is generally associated with an increase in the second variable.

negatively correlated – one variable has a tendency to decrease as the other increases.

STAT 100A Section 3.4: Numerical Measures for Quantitative Bivariate Data

coefficient of correlation, r - a measure of the strength of the linear relationship between two variables x and y.

$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} $$

Characteristics of r:

  1. -1 ≤ r ≤ 1
  2. r < 0 implies a negative linear relationship
     r = 0 implies no linear relationship
     r > 0 implies a positive linear relationship
  3. The closer r is to 1 or -1, the stronger the linear relationship.
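The notes do not spell out the S terms, so the sketch below assumes the standard sums of squares and cross-products: S_xy is the sum of (x - x̄)(y - ȳ), S_xx the sum of (x - x̄)², and S_yy the sum of (y - ȳ)², with r = S_xy / sqrt(S_xx · S_yy). The data values are hypothetical and `correlation` is an illustrative name.

```python
from math import sqrt

def correlation(xs, ys):
    """Coefficient of correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data sets sharing the same x values.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [10, 8, 6, 4, 2]   # falls on an exact straight line

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
print(f"r (rising, scattered) = {r_up:.3f}")
print(f"r (falling, exact line) = {r_down:.3f}")
```

The first result is positive but below 1 because the points scatter around an upward trend; the second is -1 because the points fall exactly on a downward-sloping line.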

Values of r and their Implications

Note: High correlation does not imply causality. When a high correlation exists in the sample, the only safe conclusion is that a linear trend may exist between x and y. Another (lurking) variable may be the underlying cause of the high correlation between x and y.

Example:
x = # of casino employees
y = crime rate
r = 0.
LV (lurking variable) = # of tourists
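The casino example can be imitated with made-up numbers. In the sketch below every value is hypothetical: both casino employees and crime rate are constructed to follow the number of tourists plus a little noise, and neither is computed from the other. The helper `correlation` assumes the usual formula r = S_xy / sqrt(S_xx · S_yy).

```python
from math import sqrt

def correlation(xs, ys):
    """Coefficient of correlation r = S_xy / sqrt(S_xx * S_yy)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data for six cities (all values invented for illustration).
tourists = [100, 200, 300, 400, 500, 600]   # lurking variable, in thousands
employees = [12, 19, 32, 39, 52, 58]        # roughly tourists / 10, plus noise
crime = [5.5, 10, 14, 21, 24.5, 30.5]       # roughly tourists / 20, plus noise

r_xy = correlation(employees, crime)
r_tx = correlation(tourists, employees)
r_ty = correlation(tourists, crime)
print(f"employees vs crime:    r = {r_xy:.3f}")
print(f"tourists vs employees: r = {r_tx:.3f}")
print(f"tourists vs crime:     r = {r_ty:.3f}")
```

All three correlations come out high, even though employees and crime were each generated from tourists alone. A high r between x and y is therefore consistent with a lurking variable driving both.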