Scatterplots and Correlation
Diana Mindrila, Ph.D.
Phoebe Balentyne, M.Ed.
Based on Chapter 4 of The Basic Practice of Statistics (6th ed.)
Concepts:
Displaying Relationships: Scatterplots
Interpreting Scatterplots
Adding Categorical Variables to Scatterplots
Measuring Linear Association: Correlation
Facts About Correlation
Objectives:
Construct and interpret scatterplots.
Add categorical variables to scatterplots.
Calculate and interpret correlation.
Describe facts about correlation.
References:
Moore, D. S., Notz, W. I., & Fligner, M. A. (2013). The basic practice of statistics (6th ed.). New York, NY: W. H. Freeman and Company.

Scatterplot  The most useful graph for displaying the relationship between two quantitative variables is a scatterplot.

 Many research projects are correlational studies because they investigate the relationships that may exist between variables. Prior to investigating the relationship between two quantitative variables, it is always helpful to create a graphical representation that includes both of these variables. Such a graphical representation is called a scatterplot.

A scatterplot shows the relationship between two quantitative variables measured for the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as a point on the graph.
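To make the construction concrete, here is a minimal Python sketch (an illustration, not part of the original notes) that places each individual as one mark on a text grid, with one variable on the horizontal axis and the other on the vertical axis. The function name `ascii_scatter`, its grid size, and the GPA and motivation numbers in the usage lines are hypothetical; in practice a plotting library would normally be used instead.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Return the rows of a text scatterplot: xs on the horizontal axis, ys on the vertical axis.

    Each (x, y) pair, i.e. each individual, becomes one mark on the grid.
    Individuals that land in the same grid cell share a single mark.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and have the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when all values of a variable are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is printed first (top), so larger y values get smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    # The left border stands in for the vertical axis, the bottom line for the horizontal axis.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical GPA (x) and achievement-motivation scores (y) for six students.
gpa = [2.0, 2.5, 3.0, 3.2, 3.6, 3.9]
motivation = [40, 48, 55, 60, 70, 75]
for line in ascii_scatter(gpa, motivation, width=30, height=8):
    print(line)
```

Each printed `*` corresponds to one student, so the rising band of marks is the same "as GPA increases, motivation increases" pattern a graphical scatterplot would show.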

• The purpose of a scatterplot is to provide a general illustration of the relationship between the two variables.

• In this example, as GPA increases, an individual’s motivation score generally increases as well.

• One student in this example does not follow the general pattern: Mary. She has one of the lowest GPAs, but she has the maximum score on the motivation scale. This makes her an exception, or outlier.

Interpreting Scatterplots

How to Examine a Scatterplot

As in any graph of data, look for the overall pattern and for striking departures from that pattern.

  • The overall pattern of a scatterplot can be described by the direction, form, and strength of the relationship.

  • An important kind of departure is an outlier, an individual value that falls outside the overall pattern of the relationship.

Interpreting Scatterplots: Form

• Another important component of a scatterplot is the form of the relationship between the two variables.

This example illustrates a linear relationship: the points on the scatterplot closely resemble a straight line. A relationship is linear if one variable changes by approximately the same amount each time the other variable increases by one unit.

This example illustrates a relationship that has the form of a curve rather than a straight line, because one variable does not change at a constant rate as the other increases and may even start decreasing after a certain point. The example describes a curvilinear relationship between the variable “age” and the variable “working memory”: working memory increases throughout childhood, remains steady in adulthood, and begins decreasing around age 50.
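The difference between the two forms can be checked with simple arithmetic: for a linear relationship, the change in y for each one-unit increase in x is constant; for a curved one, it is not. The sketch below uses two made-up functions (they are not the data from the examples above), and the helper name `changes_per_unit` is hypothetical; it assumes x increases in steps of exactly one.

```python
def changes_per_unit(ys):
    """Change in y between consecutive points, assuming x increases by exactly 1 each step."""
    return [later - earlier for earlier, later in zip(ys, ys[1:])]


xs = [0, 1, 2, 3, 4, 5]
linear = [3 + 2 * x for x in xs]      # straight line: y rises by 2 for every unit of x
curved = [-(x - 3) ** 2 for x in xs]  # rises, peaks at x = 3, then falls

print(changes_per_unit(linear))  # prints [2, 2, 2, 2, 2]
print(changes_per_unit(curved))  # prints [5, 3, 1, -1, -3]
```

The constant steps of the first list are what "linear" means; the shrinking and then negative steps of the second list describe a curve that rises, levels off, and declines, like the working-memory example.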

Interpreting Scatterplots: Strength

• Another important component of a scatterplot is the strength of the relationship between the two variables. Strength describes how closely the points follow the form of the relationship; for a linear relationship, it is how tightly the points cluster around a straight line.

• The slope of the line is not a measure of strength. A slope of 1 (a line at a 45-degree angle) only means that when one variable increases by one unit, the other variable increases by the same amount. The strongest possible linear relationship occurs when all points fall exactly on a straight line, whatever its slope (as long as the line is not horizontal).

• The strength of the relationship between two variables is a crucial piece of information, but relying on a visual interpretation of a scatterplot is too subjective. More precise evidence is needed, and this evidence is obtained by computing a coefficient that measures the strength of the relationship under investigation.

Correlations

Example: There is a moderate, positive, linear relationship between GPA and achievement motivation.

r = 0.62

• Based on the criteria listed on the previous page, the value of r in this case (r = 0.62) indicates that there is a positive, linear relationship of moderate strength between achievement motivation and GPA.
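A correlation such as r = 0.62 comes from the standard Pearson formula, which averages the products of the standardized values (z-scores) of the two variables, dividing by n - 1:

r = (1 / (n - 1)) Σ [(x_i - x̄) / s_x] · [(y_i - ȳ) / s_y]

where x̄ and ȳ are the means and s_x and s_y the sample standard deviations. The Python sketch below follows that formula step by step. The function name `pearson_r` is hypothetical, and the small data set is made up so the arithmetic can be checked by hand; it is not the GPA data from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: average product of z-scores, dividing by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Standardize each value, multiply the paired z-scores, and average over n - 1.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]
print(round(pearson_r(xs, ys), 3))  # prints 0.6
```

By hand: both variables have mean 2.5 and standard deviation √(5/3); each of the four z-score products equals 0.45, they sum to 1.8, and 1.8 / 3 = 0.6.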

Correlation  The images below illustrate what the relationships might look like at different degrees of strength (for different values of r).

• For a correlation coefficient of zero, the points have no direction, the cloud of points is roughly round, and no line fits the points on the graph.

• As the correlation coefficient moves farther from zero (toward 1 or -1), the observations cluster more closely around a straight line.

• The line is difficult to detect when the relationship is weak (e.g., r = -0.3) but becomes clearer as the relationship becomes stronger (e.g., r = -0.99).

Facts About Correlation

  1. The order of variables in a correlation is not important.

  2. Correlations provide evidence of association, not causation.

  3. r has no units and does not change when the units of measure of x, y, or both are changed.

  4. Positive r values indicate positive association between the variables, and negative r values indicate negative association.

  5. The correlation r is always a number between -1 and 1.

Pearson r: Assumptions

Assumptions:

• Correlation requires that both variables be quantitative.

• Correlation describes linear relationships. It does not describe curved relationships between variables, no matter how strong the relationship is.

Cautions:  Correlation is not resistant. r is strongly affected by outliers.  Correlation is not a complete summary of two-variable data.  For example:

• The correlation coefficient is based on means and standard deviations, so it is not robust to outliers; it is strongly affected by extreme observations. These individuals are sometimes called influential observations because they have a strong impact on the correlation coefficient.

• For instance, in the example above, the correlation coefficient is 0.62 (left) when the outlier is included in the analysis. When this outlier is removed, the correlation coefficient rises substantially, to 0.89.

• This one case, when included in the analysis, reduces a strong relationship to a moderate one.

• This case makes such a big difference here because the data set contains very few individuals. As a general rule, the influence of a single extreme observation decreases as the sample size increases.

• When describing the relationship between two variables, correlations are just one piece of the puzzle. This information is necessary but not sufficient, and other analyses should also be conducted to provide more information.