BIVARIATE DATA 7.1 Scatter plots | Slides Statistics

BIVARIATE DATA

7.1 Scatter plots: A scatter plot, scatterplot, or scattergraph is a type

of mathematical diagram using Cartesian coordinates to display values for

two variables for a set of data.

The data is displayed as a collection of points, each having the value of one

variable determining the position on the horizontal axis and the value of the other

variable determining the position on the vertical axis. This kind of plot is also

called a scatter chart, scatter gram, scatter diagram, or scatter graph.
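
As a rough illustration of how each data pair becomes one point, the sketch below maps the first variable to a column (horizontal position) and the second to a row (vertical position) of a small text grid. The function name ascii_scatter, the grid size, and the weight and height values are all made up for this example; a real plot would normally be drawn with a plotting library.

```python
def ascii_scatter(xs, ys, width=20, height=8):
    """Return a list of text rows with one '*' per (x, y) pair."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and equally long")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 when all values on an axis are equal.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Horizontal position comes from the first variable.
        col = round((x - x_min) / x_span * (width - 1))
        # Vertical position comes from the second variable.
        # Row 0 is the top of the grid, so larger y gets a smaller row index.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = "*"
    return ["".join(row) for row in grid]


# Illustrative (made-up) measurements: weight on x, height on y.
weights_kg = [52, 58, 63, 70, 80]
heights_cm = [150, 160, 165, 172, 180]
plot = ascii_scatter(weights_kg, heights_cm)
print("\n".join(plot))
```

With these values the stars run from the lower left of the grid to the upper right, since the larger weights are paired with the larger heights.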

Overview

A scatter plot is used when a variable exists that is under the control of the

experimenter. If a parameter exists that is systematically incremented and/or

decremented by the experimenter, it is called the control parameter or independent

variable and is customarily plotted along the horizontal axis. The measured

or dependent variable is customarily plotted along the vertical axis. If no dependent

variable exists, either type of variable can be plotted on either axis and a scatter

plot will illustrate only the degree of correlation (not causation) between two

variables.

A scatter plot can suggest various kinds of correlations between variables with a

certain confidence interval. For example, with weight and height, weight would be on

the x axis and height on the y axis. Correlations may be positive (rising),

negative (falling), or null (uncorrelated). If the pattern of dots slopes from lower

left to upper right, it suggests a positive correlation between the variables being

studied. If the pattern of dots slopes from upper left to lower right, it suggests a

negative correlation.
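
The direction of a correlation can also be checked numerically. The sketch below uses the Pearson correlation coefficient r, which is one common measure and is not named in the text above. The function names and the cut-off of 0.1 for calling a correlation "null" are assumptions chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r, threshold=0.1):
    """Label r as positive, negative, or null (threshold is an arbitrary choice)."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "null"


# Illustrative (made-up) data: points rising from lower left to upper right.
weights_kg = [52, 58, 63, 70, 80]
heights_cm = [150, 160, 165, 172, 180]
print(describe_correlation(pearson_r(weights_kg, heights_cm)))
```

A value of r near +1 corresponds to dots sloping from lower left to upper right, a value near -1 to dots sloping from upper left to lower right, and a value near 0 to no linear trend.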

A line of best fit (alternatively called 'trendline') can be drawn in order to study the

correlation between the variables. An equation for the correlation between the

variables can be determined by established best-fit procedures. For a linear

correlation, the best-fit procedure is known as linear regression and is guaranteed

to generate a correct solution in a finite time. No universal best-fit procedure is

guaranteed to generate a correct solution for arbitrary relationships. A scatter plot

is also very useful when we wish to see how two comparable data sets agree with

each other. In this case, an identity line, i.e., a y = x line or 1:1 line, is often

drawn as a reference. The more the two data sets agree, the more the points tend

to concentrate in the vicinity of the identity line; if the two data sets are

numerically identical, the points fall on the identity line exactly.
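
The two ideas in this paragraph can be sketched in a few lines. The code below assumes ordinary least squares as the linear best-fit procedure, and it measures agreement with the identity line by the largest vertical distance from y = x. Both function names and the sample values are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def max_identity_deviation(a, b):
    """Largest vertical distance of the points (a[i], b[i]) from the line y = x."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return max(abs(y - x) for x, y in zip(a, b))


# Illustrative (made-up) points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)

# Two made-up data sets that agree closely, e.g. two instruments
# measuring the same quantities.
first = [10.0, 12.5, 15.0, 17.5]
second = [10.5, 12.0, 15.5, 17.0]
print(max_identity_deviation(first, second))
```

If the two data sets were numerically identical, the deviation would be 0 and fitting a line to them would give a slope of 1 and an intercept of 0, which is the identity line itself.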
