Exploratory Data Analysis and Statistical Inference: Key Terms and Concepts, Quizzes of Probability and Statistics

Definitions for essential terms and concepts in exploratory data analysis (EDA) and statistical inference, including population, sample, parameter, statistic, data, variables, quantitative and categorical variables, EDA for one and two variables, distribution, unbiased and biased sampling, random sampling, descriptive and inferential statistics, probability, and various statistical measures. It serves as a useful resource for students and researchers in statistics, data science, and related fields.

Typology: Quizzes

2011/2012

Uploaded on 02/11/2012

melalala123

TERM 1
population
DEFINITION 1
the group that we want to study; we can usually never measure the entire population
TERM 2
EDA
DEFINITION 2
Exploratory Data Analysis: summarizing the data using graphs and numbers
TERM 3
sample
DEFINITION 3
subset of the population for which data is actually obtained
TERM 4
inference
DEFINITION 4
drawing conclusions about the population based on the data
collected in the sample
TERM 5
parameter
DEFINITION 5
a number summarizing some feature of the population, such as an average or a proportion; it can usually never be computed, since the entire population can usually never be measured

TERM 6

statistic

DEFINITION 6

the corresponding number summarizing the feature of interest for the sample

TERM 7

data

DEFINITION 7

pieces of information about individuals, organized by variables

TERM 8

individuals

DEFINITION 8

the people or objects described in the dataset

TERM 9

variables

DEFINITION 9

a characteristic that varies from person to person (or object to object); a column in a spreadsheet of the data; either quantitative or categorical

TERM 10

quantitative variable

DEFINITION 10

naturally numerical, usually some measurement or magnitude; the numbers are used directly as values

TERM 16

biased sampling method

DEFINITION 16

a sampling method that does not produce a sample that is representative of the population

TERM 17

biased

DEFINITION 17

a sample that has a systematic tendency toward certain outcomes that differ from what would be observed in the population

TERM 18

sampling bias

DEFINITION 18

bias due to the sampling method

TERM 19

random sampling

DEFINITION 19

the only selection method that avoids sampling bias; methods include (1) simple random sampling, (2) stratified sampling, and (3) cluster sampling
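The three random sampling methods listed above can be sketched in Python as follows. This is a minimal illustration, not a procedure taken from the source: the population of 100 subject IDs, the odd/even strata, the blocks of 10 used as clusters, and the function names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum subjects from each stratum.

    strata: dict mapping a stratum label to the list of subjects in it.
    """
    sample = []
    for label in sorted(strata):
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include every subject in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [subject for cluster in chosen for subject in cluster]

rng = random.Random(2012)          # fixed seed so the sketch is reproducible
population = list(range(1, 101))   # hypothetical subject IDs 1..100
strata = {"odd": population[0::2], "even": population[1::2]}  # hypothetical strata
clusters = [population[i:i + 10] for i in range(0, 100, 10)]  # hypothetical clusters

print(simple_random_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```

The stratified version guarantees a fixed number of subjects from every stratum, while the cluster version trades some precision for convenience by sampling whole groups at once.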

TERM 20

data

DEFINITION 20

the information we gather with experiments and with surveys

TERM 21

statistics

DEFINITION 21

the art and science of designing studies and analyzing the data that those studies produce; the art and science of learning from data

TERM 22

design

DEFINITION 22

planning how to obtain data to answer the questions of interest

TERM 23

description

DEFINITION 23

summarizing the data

TERM 24

probability

DEFINITION 24

a framework for quantifying how likely various possible outcomes are; fundamental for developing statistical inference methods

TERM 25

population

DEFINITION 25

the total set of subjects in which we are interested

TERM 31

random sampling

DEFINITION 31

each subject in the population has the same chance of being included in the sample; designed to make the sample representative of the population

TERM 32

databases

DEFINITION 32

existing archived collections of data files

TERM 33

simulation

DEFINITION 33

using a computer to mimic what would actually happen if you selected a sample and used statistics in real life
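A simulation of this kind can also show how a statistic relates to a parameter. The sketch below is hypothetical: it assumes a population in which a proportion p = 0.40 of subjects have some trait, and it treats each sampled subject as an independent draw (a reasonable approximation when the population is much larger than the sample). The function name and all numbers are illustrative.

```python
import random

def simulate_sample_proportions(p, n, repetitions, seed=0):
    """Mimic drawing `repetitions` random samples of size n from a population in
    which a proportion p of subjects have a trait; return the sample proportion
    (a statistic) from each simulated sample."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        # Each sampled subject independently has the trait with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

props = simulate_sample_proportions(p=0.40, n=100, repetitions=1000)
average = sum(props) / len(props)
print(f"parameter p = 0.40, average of 1000 sample proportions = {average:.3f}")
print(f"smallest and largest sample proportions: {min(props):.2f}, {max(props):.2f}")
```

Individual sample proportions vary from sample to sample, but their average should land close to the parameter, which is the behavior inference relies on.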

TERM 34

variable

DEFINITION 34

any characteristic that is observed for the subjects in a study

TERM 35

observations

DEFINITION 35

the data values that we observe for a variable

TERM 36

categorical

DEFINITION 36

each observation belongs to one of a set of categories

TERM 37

quantitative

DEFINITION 37

observations on it take numerical values that represent different magnitudes of the variable

TERM 38

spread

DEFINITION 38

the variability of the data

TERM 39

discrete

DEFINITION 39

for quantitative variables: discrete if its possible values form a set of separate numbers, such as 0, 1, 2, 3, ...

TERM 40

continuous

DEFINITION 40

for quantitative variables: continuous if its possible values form an interval

TERM 46

frequency table

DEFINITION 46

a table that lists the possible values of a variable and their frequencies and/or relative frequencies; a listing of possible values for a variable, together with the number of observations for each value
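A frequency table can be built by counting each value and dividing by the number of observations to get relative frequencies. A minimal sketch using Python's `collections.Counter`; the eye-color data and the function name are hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (value, frequency, relative frequency) rows, one per observed
    value, in sorted order."""
    counts = Counter(observations)
    n = len(observations)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

# Hypothetical categorical data: eye color of 10 students.
eyes = ["brown", "blue", "brown", "green", "brown",
        "blue", "brown", "hazel", "blue", "brown"]
for value, freq, rel in frequency_table(eyes):
    print(f"{value:<6} {freq:>3} {rel:>6.2f}")
# Rows: blue 3 0.30, brown 5 0.50, green 1 0.10, hazel 1 0.10
```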

TERM 47

histogram

DEFINITION 47

a graph that uses bars to portray the frequencies or the relative frequencies of the possible outcomes for a quantitative variable
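For a quantitative variable, the histogram's bars come from counting observations in intervals. A minimal text-only sketch, assuming equal-width intervals and integer scores (so an interval starting at 50 can be labeled 50-59); the data and the name `histogram_counts` are hypothetical, and an actual graph would normally be drawn with a plotting library.

```python
def histogram_counts(values, start, width, n_bins):
    """Count observations in each interval [start + k*width, start + (k+1)*width)
    for k = 0..n_bins-1. Values outside the covered range are ignored here."""
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical quantitative data: exam scores.
scores = [52, 61, 64, 68, 70, 71, 73, 75, 77, 78, 81, 84, 88, 93]
counts = histogram_counts(scores, start=50, width=10, n_bins=5)
for k, c in enumerate(counts):
    low = 50 + 10 * k
    # One '#' per observation stands in for the height of each bar.
    print(f"{low}-{low + 9}: {'#' * c}")
```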

TERM 48

distribution

DEFINITION 48

the values the variable takes and the frequency of occurrence of each value

TERM 49

unimodal

DEFINITION 49

a distribution with a single peak (mode)

TERM 50

tails

DEFINITION 50

the parts of the curve for the lowest values and for the highest values

TERM 51

skewed to the left

DEFINITION 51

if the left tail is longer than the right tail

TERM 52

skewed to the right

DEFINITION 52

if the right tail is longer than the left tail

TERM 53

skew

DEFINITION 53

to pull in one direction

TERM 54

mean

DEFINITION 54

the sum of the observations divided by the number of observations

TERM 55

median

DEFINITION 55

the midpoint of the observations when they are ordered from the smallest to the largest (or from the largest to the smallest)
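The mean and median definitions translate directly into code, and a small skewed dataset shows how they differ. A minimal sketch; the income figures are hypothetical, and averaging the two middle values when the count is even is the usual convention, which the definition above does not spell out.

```python
def mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

def median(xs):
    """Midpoint of the ordered observations; with an even count,
    the average of the two middle values."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes (in $1000s); the last value creates a long right tail.
incomes = [30, 32, 35, 38, 40, 41, 45, 200]
print(mean(incomes))    # 57.625
print(median(incomes))  # 39.0
```

Because this distribution is skewed to the right, the single large value pulls the mean well above the median, while the median stays with the bulk of the data.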

TERM 61

standard deviation

DEFINITION 61

the square root of the variance s^2, which is an average of the squared deviations from the mean (the sum of squares divided by n - 1); a typical distance, or type of average distance, of an observation from the mean

TERM 62

sum of squares

DEFINITION 62

represents finding the deviation for each observation, squaring each deviation, and then adding them up
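The sum of squares leads directly to the variance and standard deviation. A minimal sketch, assuming the sample versions that divide by n - 1; the data values and function names are hypothetical. Python's standard-library `statistics.stdev` uses the same n - 1 divisor, so it serves as a cross-check.

```python
import math
import statistics

def sum_of_squares(xs):
    """Find each deviation from the mean, square it, and add them up."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def sample_standard_deviation(xs):
    """s = sqrt(s^2), where the variance s^2 is the sum of squares divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    variance = sum_of_squares(xs) / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations (mean 5)
print(sum_of_squares(data))             # 32.0
print(sample_standard_deviation(data))  # about 2.138
print(statistics.stdev(data))           # same value from the standard library
```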

TERM 63

the larger the standard deviation, s, ...

DEFINITION 63

the greater the spread of the data

TERM 64

empirical rule

DEFINITION 64

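for an approximately bell-shaped distribution, about 68% of the observations fall within 1 standard deviation of the mean, about 95% within 2, and nearly all (about 99.7%) within 3; the rule has this name because many distributions of data observed in practice (empirically) are approximately bell shaped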

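TERM 65

____ percent of the observations fall within 1 standard deviation of the mean

DEFINITION 65

about 68 percent (for approximately bell-shaped data)

TERM 66

____ percent of the observations fall within 2 standard deviations of the mean

DEFINITION 66

about 95 percent (for approximately bell-shaped data)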
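The empirical rule can be checked on simulated bell-shaped data. This sketch assumes 10,000 hypothetical draws from a normal distribution with mean 100 and standard deviation 15, generated with Python's `random.Random.gauss`. The function name is illustrative, and the printed percentages should come out close to, but not exactly, 68, 95, and 99.7.

```python
import random

def fraction_within(xs, k):
    """Fraction of observations within k standard deviations of the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = (sum((x - x_bar) ** 2 for x in xs) / (n - 1)) ** 0.5
    return sum(1 for x in xs if abs(x - x_bar) <= k * s) / n

rng = random.Random(1)
# Hypothetical bell-shaped data: normal with mean 100 and standard deviation 15.
values = [rng.gauss(100, 15) for _ in range(10_000)]
for k in (1, 2, 3):
    print(k, round(100 * fraction_within(values, k), 1))
```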

TERM 67

normal distribution

DEFINITION 67

a distribution described by a smooth, symmetric, bell-shaped curve

TERM 68

first quartile Q1

DEFINITION 68

the median of the lower half of the observations

TERM 69

third quartile Q3

DEFINITION 69

the median of the upper half of the observations

TERM 70

five-number summary

DEFINITION 70

the minimum value, Q1, median, Q3, and maximum value
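The five-number summary follows from the quartile definitions above. A minimal sketch, assuming one common convention: when the number of observations is odd, the overall median is left out of both halves. Other quartile rules exist (Python's `statistics.quantiles`, for instance, interpolates), so software can give slightly different Q1 and Q3 values. The data and function names are hypothetical.

```python
def median_of_sorted(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 == 1 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(observations):
    """Return (minimum, Q1, median, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered data;
    with an odd count, the overall median is excluded from both halves.
    """
    xs = sorted(observations)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median_of_sorted(lower), median_of_sorted(xs), median_of_sorted(upper), xs[-1]

print(five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15]))  # (1, 3.5, 7, 11.0, 15)
```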

TERM 76

association

DEFINITION 76

exists between 2 variables if a particular value for one variable is more likely to occur with certain values of the other variable

TERM 77

contingency table

DEFINITION 77

a display for 2 categorical variables; it shows how many subjects are at each combination of categories of the 2 categorical variables

TERM 78

cross-tabulation

DEFINITION 78

the process of taking a data file and finding the frequencies for the cells of the contingency table

TERM 79

marginal proportion

DEFINITION 79

a proportion found using the counts in the margins of the table, such as the proportion of all sampled produce items that contained pesticide residues; it is not a conditional proportion
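Cross-tabulation, the contingency table, and the marginal proportion can be illustrated together. The counts below are invented for illustration and are not the textbook's pesticide data; the variable names are hypothetical. A conditional proportion is computed at the end for contrast.

```python
from collections import Counter

# Hypothetical records: (food type, pesticide residue present?) for each sampled item.
records = (
    [("organic", "yes")] * 2 + [("organic", "no")] * 8
    + [("conventional", "yes")] * 12 + [("conventional", "no")] * 18
)

# Cross-tabulation: count the subjects in each combination of categories (each cell).
cells = Counter(records)
rows = sorted({food for food, _ in records})
cols = sorted({res for _, res in records})
total = len(records)

for food in rows:
    print(food, [cells[(food, res)] for res in cols])

# Marginal proportion: uses the "yes" column total from the margin of the table.
residue_total = sum(cells[(food, "yes")] for food in rows)
print("marginal proportion with residue =", residue_total / total)

# Conditional proportion, for contrast: computed within the organic row only.
organic_total = sum(cells[("organic", res)] for res in cols)
print("proportion with residue among organic =", cells[("organic", "yes")] / organic_total)
```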

TERM 80

correlation

DEFINITION 80

summarizes the direction of the association between 2 quantitative variables and the strength of its straight-line trend; r falls between -1 and +1

TERM 81

positive r value

DEFINITION 81

positive association

TERM 82

negative r value

DEFINITION 82

negative association
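The cards do not give a formula for r; a standard one is the sum of the products of the z-scores divided by n - 1, sketched below under that assumption. The study-hours data and the function name are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r: sum of products of z-scores divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]       # hypothetical hours studied
score = [55, 60, 70, 72, 83]  # hypothetical exam scores
print(round(correlation(hours, score), 3))   # positive r: positive association
print(correlation(hours, [10, 8, 6, 4, 2]))  # perfectly straight, downward: -1 up to rounding
```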

TERM 83

stemplot

DEFINITION 83

a useful visual display of the distribution of a quantitative variable that is easy and quick to construct for small, simple datasets, retains the actual data values, and sorts the data
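A stemplot splits each value into a stem and a leaf. A minimal sketch, assuming non-negative integers with the last digit as the leaf and all earlier digits as the stem (split stems are not handled); the scores and the function name are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers: stem = all digits but the
    last, leaf = last digit. Leaves are sorted, so the display also sorts the data."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Empty stems are kept so that gaps in the distribution stay visible.
        row = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>3} | {row}")
    return lines

scores = [52, 61, 64, 68, 70, 71, 73, 75, 77, 78, 81, 84, 88, 93]  # hypothetical
print("\n".join(stemplot(scores)))
```

Each original value can be read back from its stem and leaf, which is why a stemplot retains the actual data.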

TERM 84

shape

DEFINITION 84

when describing the shape, we should consider (1) symmetry/skewness and (2) peakedness (modality), that is, the number of peaks (modes)