

Material Type: Notes; Professor: Smith; Class: INTRO TO STATISTICS; Subject: Statistics; University: University of California-Riverside; Term: Spring 2010;
Typology: Study notes


STAT 100A Chapter 3: Describing Bivariate Data
Sections 3.1-3.2: Bivariate Data & Graphs for Qualitative Variables
Sometimes the data that are collected consist of observations for two variables on the same experimental unit. Special techniques that can be used in describing these variables will help you identify possible relationships between them.
A side-by-side bar graph is used to compare two or more sets of qualitative data.
Note: Data sets should always be compared by using relative frequencies, because different sample or population sizes make comparisons using frequencies difficult.
Example: In a class survey, Penn State statistics students were asked, “Regarding your weight, do you think you are: About right? Overweight? Underweight?” The following graph displays the results by sex.
[Figure: side-by-side bar graph, "Gender and Perception of Weight". Horizontal axis: perception (about right, overweight, underweight); vertical axis: relative frequency; bars grouped by sex (female, male).]
Side-by-side pie charts are used to compare two or more sets of qualitative data.
Example: The color distributions for two snack-size bags of M&M’s candies, one plain and one peanut, are displayed first in a contingency table and then in side-by-side pie charts.
          Brown   Yellow   Red   Orange   Green   Blue
Plain       15      14      12      4       5       6
Peanut       6       2       2      3       3       5
[Figure: "Pie Chart of Color by Type of M&M's" (panel variable: Type).
Peanut: Brown 28.6%, Yellow 9.5%, Red 9.5%, Orange 14.3%, Green 14.3%, Blue 23.8%.
Plain: Brown 26.8%, Yellow 25.0%, Red 21.4%, Orange 7.1%, Green 8.9%, Blue 10.7%.]
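The percentages in the pie charts are relative frequencies: each count divided by the total for its own bag. A minimal Python sketch of that arithmetic, using the counts from the contingency table (the names counts, relative_frequencies, and rel_freq are illustrative and not part of the notes):

```python
# Counts copied from the M&M's contingency table in the notes.
counts = {
    "Plain":  {"Brown": 15, "Yellow": 14, "Red": 12, "Orange": 4, "Green": 5, "Blue": 6},
    "Peanut": {"Brown": 6, "Yellow": 2, "Red": 2, "Orange": 3, "Green": 3, "Blue": 5},
}


def relative_frequencies(row):
    """Return each category's share of the row total, as a percentage."""
    total = sum(row.values())
    return {color: 100 * n / total for color, n in row.items()}


# One set of relative frequencies per bag, so bags of different sizes
# can be compared on the same scale.
rel_freq = {bag: relative_frequencies(row) for bag, row in counts.items()}

for bag, shares in rel_freq.items():
    n = sum(counts[bag].values())
    line = ", ".join(f"{color} {pct:.1f}%" for color, pct in shares.items())
    print(f"{bag} (n = {n}): {line}")
```

The plain bag holds 56 candies and the peanut bag 21, so raw counts are not directly comparable. Dividing by each bag's own total is what makes the two charts comparable.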
STAT 100A Section 3.3: Scatterplots for Two Quantitative Variables
To determine whether a linear relationship between y and x is plausible, it is helpful to plot the sample data in a scatterplot.
scatterplot – shows the relationship between two quantitative variables measured on the same experimental units with one variable’s values plotted along the vertical axis and the other along the horizontal axis. Each experimental unit in the data appears as the point in the plot fixed by the values of both variables for that experimental unit.
Examining a Scatterplot
In any graph of data, look for the overall pattern and for striking deviations from that pattern.
You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.
positively correlated – an increase in one variable is generally associated with an increase in the second variable.
negatively correlated – one variable has a tendency to decrease as the other increases.
STAT 100A Section 3.4: Numerical Measures for Quantitative Bivariate Data
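coefficient of correlation, r – a measure of the strength of the linear relationship between two variables x and y.
$$ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} $$
where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum (x_i - \bar{x})^2$, and $S_{yy} = \sum (y_i - \bar{y})^2$.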
Characteristics of r:
r = 0 implies no linear relationship
r > 0 implies a positive linear relationship
r < 0 implies a negative linear relationship
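To make the sign of r concrete, here is a minimal Python sketch. It assumes the usual definition r = S_xy / sqrt(S_xx * S_yy), where S_xy, S_xx, and S_yy are sums of products of deviations from the means. The function name correlation and the three small data sets are made up for illustration and are not from the course.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data, chosen only to show the three cases.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x
y_down = [5, 4, 5, 4, 2]    # tends to fall as x rises
y_none = [2, 4, 3, 4, 2]    # no linear trend

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_none = correlation(x, y_none)
print(f"r_up = {r_up:.3f}, r_down = {r_down:.3f}, r_none = {r_none:.3f}")
```

In the third data set the positive and negative products of deviations cancel exactly, so S_xy = 0 and r = 0, even though y is not constant.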
Values of r and their Implications
Note: High correlation does not imply causality. When a high correlation exists in the sample, the only safe conclusion is that a linear trend may exist between x and y. Another (lurking) variable may be the underlying cause of the high correlation between x and y.
x = # of casino employees y = crime rate r = 0. LV = # of tourists
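A simulated illustration of the lurking-variable idea, as a sketch only. All numbers below are invented, not real casino or crime data. Both x and y are generated from the number of tourists plus independent noise, so neither one influences the other, yet their correlation comes out high.

```python
import random
from math import sqrt


def correlation(x, y):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical lurking variable: tourists per period in 200 made-up towns.
tourists = [random.uniform(1_000, 10_000) for _ in range(200)]

# Both variables depend on tourists only; the coefficients are arbitrary.
employees = [0.05 * t + random.gauss(0, 20) for t in tourists]
crime_rate = [0.002 * t + random.gauss(0, 1) for t in tourists]

r = correlation(employees, crime_rate)
print(f"r(casino employees, crime rate) = {r:.2f}")
```

By construction, changing the number of employees would do nothing to the crime rate here. The strong linear trend comes entirely from the shared dependence on tourists.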