Exploratory Data Analysis and Statistical Inference: Key Terms and Concepts
TERM 1
population
DEFINITION 1
the group that we want to study; the entire population can
usually never be measured
TERM 2
EDA
DEFINITION 2
Exploratory Data Analysis: summarizing the data using graphs
and numbers
TERM 3
sample
DEFINITION 3
subset of the population for which data is actually obtained
TERM 4
inference
DEFINITION 4
drawing conclusions about the population based on the data
collected in the sample
TERM 5
parameter
DEFINITION 5
a number summarizing some feature of the population,
typically an average or a proportion; it can usually never be
computed, since the entire population can usually never be
measured
TERM 6
statistic
DEFINITION 6
the corresponding number summarizing the feature of
interest for the sample
TERM 7
data
DEFINITION 7
pieces of information about individuals organized by
variables
TERM 8
individuals
DEFINITION 8
the people or objects described in the dataset
TERM 9
variables
DEFINITION 9
a characteristic that varies from person to person (or object
to object); a column in a spreadsheet of the data; either
quantitative or categorical
TERM 10
quantitative variable
DEFINITION 10
naturally numerical, usually some measurement or
magnitude; uses numbers directly
TERM 16
biased sampling method
DEFINITION 16
a sampling method that does not produce a sample that is
representative of the population
TERM 17
biased
DEFINITION 17
a sample that has a systematic tendency toward certain
outcomes different from what would be observed in the
population
TERM 18
sampling bias
DEFINITION 18
bias due to the sampling method
TERM 19
random sampling
DEFINITION 19
the only selection method that avoids sampling bias; methods
include: (1) simple random sampling, (2) stratified sampling,
(3) cluster sampling
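As a rough illustration of the first two methods listed above, the sketch below draws a simple random sample and a stratified sample from a small made-up sampling frame. The frame, the "urban"/"rural" strata, and the sample sizes are assumptions chosen only for the example; cluster sampling is not shown.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: (id, stratum) pairs; the labels are made up.
frame = [(i, "urban" if i % 4 else "rural") for i in range(200)]

# (1) Simple random sample: every subject has the same chance of selection.
srs = random.sample(frame, 20)

# (2) Stratified sample: split the frame into strata, then draw a
# simple random sample of 10 subjects from each stratum.
strata = {}
for subject in frame:
    strata.setdefault(subject[1], []).append(subject)
stratified = [s for group in strata.values() for s in random.sample(group, 10)]

print(len(srs), len(stratified))
```

The stratified draw guarantees that both strata appear in the sample, whereas the simple random sample may over- or under-represent the smaller stratum by chance.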
TERM 20
data
DEFINITION 20
the information we gather with experiments and with surveys
TERM 21
statistics
DEFINITION 21
the art and science of designing studies and analyzing the
data that those studies produce; the art and science of
learning from data
TERM 22
design
DEFINITION 22
planning how to obtain data to answer the questions of
interest
TERM 23
description
DEFINITION 23
summarizing the data
TERM 24
probability
DEFINITION 24
fundamental for developing statistical inference methods
TERM 25
population
DEFINITION 25
the total set of subjects in which we are interested
TERM 31
random sampling
DEFINITION 31
each subject in the population has the same chance of being
included in the sample; designed to make the sample
representative of the population
TERM 32
databases
DEFINITION 32
existing archived collections of data files
TERM 33
simulation
DEFINITION 33
using a computer to mimic what would actually happen if you
selected a sample and used statistics in real life
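A minimal sketch of such a simulation follows. It assumes a made-up population of 10,000 values and repeatedly draws samples of size 100, so the sample mean (a statistic) can be compared with the population mean (a parameter). The population, the sample size, and the helper name sample_mean are all hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up values (an assumption for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)  # population mean, usually unknown in practice

def sample_mean(pop, n):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    return statistics.mean(random.sample(pop, n))

# Repeat the sampling many times to see how the statistic varies around the parameter.
sample_means = [sample_mean(population, 100) for _ in range(1_000)]

print(round(parameter, 2), round(statistics.mean(sample_means), 2))
```

In real studies only one sample is drawn; the simulation is useful precisely because it shows what would happen across many samples.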
TERM 34
variable
DEFINITION 34
any characteristic that is observed for the subjects in a study
TERM 35
observations
DEFINITION 35
the data values that we observe for a variable
TERM 36
categorical
DEFINITION 36
each observation belongs to one of a set of categories
TERM 37
quantitative
DEFINITION 37
observations on it take numerical values that represent
different magnitudes of the variable
TERM 38
spread
DEFINITION 38
the variability of the data
TERM 39
discrete
DEFINITION 39
a quantitative variable is discrete if its possible values form
a set of separate numbers, such as 0, 1, 2, 3, ...
TERM 40
continuous
DEFINITION 40
a quantitative variable is continuous if its possible values
form an interval
TERM 46
frequency table
DEFINITION 46
a table that lists the possible values of a variable and their
frequencies and/or relative frequencies; a listing of possible
values for a variable, together with the number of
observations for each value
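One way to build such a table is sketched below, using a short made-up list of categorical observations. The data and the name freq_table are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical observations of one categorical variable.
observations = ["A", "B", "A", "C", "B", "A"]

counts = Counter(observations)  # frequency of each value
n = len(observations)

# Map each value to (frequency, relative frequency).
freq_table = {value: (count, count / n) for value, count in sorted(counts.items())}

for value, (count, rel) in freq_table.items():
    print(f"{value}\t{count}\t{rel:.2f}")
```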
TERM 47
histogram
DEFINITION 47
a graph that uses bars to portray the frequencies or the
relative frequencies of the possible outcomes for a
quantitative variable
TERM 48
distribution
DEFINITION 48
the values the variable takes and the frequency of occurrence
of each value
TERM 49
unimodal
DEFINITION 49
describes a distribution that has a single peak (one mode)
TERM 50
tails
DEFINITION 50
the parts of the curve for the lowest values and for the
highest values
TERM 51
skewed to the left
DEFINITION 51
if the left tail is longer than the right tail
TERM 52
skewed to the right
DEFINITION 52
if the right tail is longer than the left tail
TERM 53
skew
DEFINITION 53
to pull in one direction
TERM 54
mean
DEFINITION 54
the sum of the observations divided by the number of
observations
TERM 55
median
DEFINITION 55
the midpoint of the observations when they are ordered from
the smallest to the largest (or from the largest to the
smallest)
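The two definitions above can be written out directly, as in the sketch below. The data set is made up, and it includes one large value to show that the mean is pulled toward it while the median is not.

```python
def mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

def median(xs):
    """Midpoint of the ordered observations (average of the two middle values if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [2, 3, 5, 8, 42]  # hypothetical observations with one large value
print(mean(data), median(data))
```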
TERM 61
standard deviation
DEFINITION 61
the square root of the variance s^2, which is an average of
the squares of the deviations from the mean; a typical
distance, or type of average distance, of an observation from
the mean
TERM 62
sum of squares
DEFINITION 62
represents finding the deviation for each observation,
squaring each deviation, and then adding them up
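The sum of squares and the standard deviation built from it might be computed as in the sketch below. Dividing by n - 1 is the usual convention for the sample variance s^2 and is an assumption here, since the cards only describe the variance as "an average" of the squared deviations. The data are made up.

```python
import math

def sum_of_squares(xs):
    """Find each deviation from the mean, square it, and add the squares up."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def sample_sd(xs):
    """Square root of the sample variance s^2 (assumed divisor: n - 1)."""
    return math.sqrt(sum_of_squares(xs) / (len(xs) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
print(sum_of_squares(data), round(sample_sd(data), 3))
```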
TERM 63
the larger the standard deviation, s, ....
DEFINITION 63
the greater the spread of the data
TERM 64
empirical rule
DEFINITION 64
has this name because many distributions of data observed
in practice (empirically) are approximately bell shaped
TERM 65
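____ percent of the observations fall within 1
standard deviation of the mean
DEFINITION 65
about 68 percent, for approximately bell-shaped distributions
TERM 66
____ percent of the observations fall within 2
standard deviations of the mean
DEFINITION 66
about 95 percent, for approximately bell-shaped distributions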
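The empirical rule states that, for approximately bell-shaped data, about 68 percent of observations fall within 1 standard deviation of the mean and about 95 percent within 2. The sketch below checks this on simulated data; the simulated normal values and the helper name share_within are assumptions for illustration, and the exact shares vary from run to run.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical bell-shaped data: 10,000 simulated normal values.
data = [random.gauss(0, 1) for _ in range(10_000)]
m = statistics.mean(data)
s = statistics.stdev(data)

def share_within(xs, center, sd, k):
    """Proportion of observations within k standard deviations of the center."""
    return sum(abs(x - center) <= k * sd for x in xs) / len(xs)

within_1 = share_within(data, m, s, 1)
within_2 = share_within(data, m, s, 2)
print(round(within_1, 3), round(within_2, 3))
```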
TERM 67
normal distribution
DEFINITION 67
smooth, bell-shaped curves
TERM 68
first quartile Q1
DEFINITION 68
the median of the lower half of the observations
TERM 69
third quartile Q3
DEFINITION 69
the median of the upper half of the observations
TERM 70
five-number summary
DEFINITION 70
the minimum value, Q1, median, Q3, and maximum value
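Using the quartile definitions given above (medians of the lower and upper halves), a five-number summary could be computed as sketched below. Excluding the overall median from both halves when n is odd is an assumption; some textbooks and software use other quartile conventions and can give slightly different values. The data are made up.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 observations."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # lower half, excluding the median when n is odd
    upper = ordered[half + n % 2:]  # upper half, excluding the median when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # hypothetical observations
print(five_number_summary(data))
```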
TERM 76
association
DEFINITION 76
exists between 2 variables if a particular value for one variable
is more likely to occur with certain values of the other variable
TERM 77
contingency table
DEFINITION 77
a display for 2 categorical variables; shows how many subjects
are at each combination of categories of the 2 categorical
variables
TERM 78
cross-tabulation
DEFINITION 78
the process of taking a data file and finding the frequencies
for the cells of the contingency table
TERM 79
marginal proportion
DEFINITION 79
found using counts in the margin of the table; for example,
the proportion of all sampled produce items that contained
pesticide residues (this is not a conditional proportion)
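To make the difference between a marginal and a conditional proportion concrete, the sketch below cross-tabulates a small made-up data file of produce items. The food types and all counts are invented for illustration and are not taken from any real study.

```python
from collections import Counter

# Hypothetical records: (food type, pesticide residue status); counts are made up.
records = (
    [("organic", "present")] * 3 + [("organic", "absent")] * 7
    + [("conventional", "present")] * 6 + [("conventional", "absent")] * 4
)

# Cross-tabulation: count the subjects at each combination of categories.
table = Counter(records)
n = len(records)

# Marginal proportion: uses a margin total (all items with residues present).
marginal_present = sum(c for (_, status), c in table.items() if status == "present") / n

# Conditional proportion, for contrast: restricted to organic items only.
organic_total = sum(c for (food, _), c in table.items() if food == "organic")
conditional_present_given_organic = table[("organic", "present")] / organic_total

print(marginal_present, conditional_present_given_organic)
```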
TERM 80
correlation
DEFINITION 80
summarizes the direction of the association between 2
quantitative variables and the strength of its straight-line
trend; r lies between -1 and +1
TERM 81
positive r value
DEFINITION 81
positive association
TERM 82
negative r value
DEFINITION 82
negative association
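A sketch of how r might be computed from paired observations is given below, using the standard product-moment formula. The function name correlation and the three small data sets are assumptions for illustration; the function expects both variables to vary, since a constant variable would make the denominator zero.

```python
import math

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise with x: positive association
y_down = [10, 8, 6, 4, 2]   # falls on a straight line as x rises: r = -1

print(round(correlation(x, y_up), 3), correlation(x, y_down))
```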
TERM 83
stemplot
DEFINITION 83
a useful visual display of the distribution of a quantitative
variable, which is easy and quick to construct for small,
simple datasets, retains the actual data, and sorts the data
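A stemplot for a small data set might be built as in the sketch below. It assumes non-negative whole numbers, with the tens as the stem and the ones digit as the leaf; the scores and the function name stemplot are made up for the example.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first keeps the leaves in order
        leaves[v // 10].append(v % 10)
    lines = []
    # Include every stem in the range, even one with no leaves, to keep the scale intact.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines

scores = [62, 75, 71, 88, 93, 75, 68, 80]  # hypothetical observations
print("\n".join(stemplot(scores)))
```

Each original value can be read back from its stem and leaf, which is what the definition means by retaining the actual data.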
TERM 84
shape
DEFINITION 84
when describing the shape, we should consider (1)
symmetry/skewness and (2) peakedness (modality), that is,
the number of peaks (modes)