






Typology: Exams







Data
the facts & figures collected, analyzed, and summarized for presentation and interpretation
Dataset
all the data collected for a particular analysis
Element
the entity on which data is collected
Variable
a characteristic of interest of an element
Observation
the variables associated with an individual element
Categorical data
use labels or names to identify categories; measured on a nominal or ordinal scale
Quantitative data
use numeric values that indicate how much or how many; measured on an interval or ratio scale
Cross-sectional data
data collected at the same, or approximately the same, point in time
Time series data
data collected over several time periods
Panel data
combination of cross-sectional and time series data
Descriptive statistics
tabular, graphical, and numerical summaries that describe data or variables
Population
the set of all elements of interest in a statistical analysis
Sample
is a subset of the population
Census
a survey to collect data on the entire population
Sample survey
A survey to collect data on a sample
Statistical Inference
uses data from a sample to make estimates and test hypotheses about the characteristics of a population
Analytics
the scientific process of transforming data into insight for making better decisions
Mean
the average value for a variable
Excel: Mean
=average(A:A)
Median
the value in the middle, when the data are arranged in ascending order
Excel: Median
=median(A:A)
Mode
the value that occurs with the greatest frequency. If two values tie for the greatest frequency, the data are bimodal; if more than two tie, the data are multimodal
Excel: Mode
=mode.sngl(B2:B13)
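The three measures of location above can be checked outside Excel. The following is a minimal Python sketch using the standard `statistics` module; the sample values are hypothetical and chosen only for illustration. `multimode` is used rather than `mode` because it returns every most-frequent value, which makes the bimodal case visible.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of the values / number of values
median_value = statistics.median(values)  # even count: average of the two middle values
modes = statistics.multimode(values)      # list of every most-frequent value

# A second hypothetical sample in which two values tie for most frequent.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_value, median_value, modes, bimodal_modes)
```

When `multimode` returns two values the data are bimodal; more than two means multimodal.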
Descriptive analytics
analytical techniques that describe what has happened in the past
Predictive analytics
uses statistical models built from past data to predict the future [forecasting] or assess the impact of one variable on another [inference]
Prescriptive analytics
uses models seeking to find a best (optimal) solution. Often these are some type of optimization model
Relative frequency
frequency of the class / n, where n is the total number of observations
Histogram
A visual display of a frequency, relative frequency or percent frequency distribution, where the variable of interest is on the horizontal axis and the frequency, relative frequency or percent frequency is on the vertical axis. Shows the shape of the distribution of the variable of interest. A distribution is skewed when one tail extends farther than the other: skewed left if the longer tail is on the left, skewed right if it is on the right
Cumulative frequency distribution
Presents the number of data items with values less than or equal to the upper class limit for each class.
Cumulative relative frequency distribution
shows the proportion of data items with values less than or equal to the upper limit of each class
Cumulative percent frequency distribution
shows the percentage of data items with values less than or equal to the upper limit of each class
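As a rough illustration of how the relative, cumulative, cumulative relative, and cumulative percent frequency distributions relate to one another, here is a short Python sketch. The class limits and frequencies are hypothetical, not taken from any dataset in these notes.

```python
from itertools import accumulate

# Hypothetical classes with upper class limits 10, 20, 30, 40.
upper_limits = [10, 20, 30, 40]
frequencies = [4, 8, 5, 3]
n = sum(frequencies)  # total number of observations

# Relative frequency of each class: frequency of the class / n.
relative = [f / n for f in frequencies]

# Cumulative frequency: number of items <= the upper limit of each class.
cumulative = list(accumulate(frequencies))

# Cumulative relative and cumulative percent frequency.
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, cr, cp in zip(upper_limits, cumulative,
                            cumulative_relative, cumulative_percent):
    print(f"<= {limit}: {c} items, proportion {cr:.2f}, {cp:.0f}%")
```

The last class always has a cumulative frequency of n, a cumulative relative frequency of 1, and a cumulative percent frequency of 100.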
Crosstabulation
a tabular summary of data for two variables (either categorical or quantitative)
Simpson's paradox
a situation in which conclusions drawn from two or more separate crosstabulations are reversed when the data are aggregated into a single crosstabulation
Scatter diagram
graphical display of the relationship between two quantitative variables
Trendline
a line that provides an approximation (i.e. an estimate) of the relationship between two variables; the relationship can be positive, negative, or absent
Dot plot
simple graph that summarizes data by the number of dots above each data value on the horizontal axis
Stem and leaf display
a graphical display used to show simultaneously the rank order and shape of a distribution of data
Side-by-side bar graph
depicts multiple bar charts on the same display
Stacked bar charts
has each bar broken into segments of different colors showing the relative frequency of each class
Weighted mean
used when observations have different weights (relative importance)
Geometric mean
is a measure of location by finding the nth root of the product of n values
Excel: Geometric Mean
=geomean(C2:C11)
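The weighted mean and geometric mean can be sketched in Python as follows. All numbers are hypothetical: the first example weights unit costs by the quantity purchased, and the second treats the values as yearly growth factors, a common use of the geometric mean.

```python
import statistics

# Hypothetical unit costs, weighted by hypothetical quantities purchased.
costs = [3.00, 3.40, 2.80]
quantities = [1200, 500, 2750]

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(quantities, costs)) / sum(quantities)

# Hypothetical yearly growth factors (1.10 means +10%, 0.95 means -5%).
growth_factors = [1.10, 0.95, 1.20]

# Geometric mean: the nth root of the product of n values.
geometric_mean = statistics.geometric_mean(growth_factors)

print(round(weighted_mean, 4), round(geometric_mean, 4))
```

`statistics.geometric_mean` requires Python 3.8 or later and plays the same role as `=geomean(...)` in Excel.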
Percentile
provides information about how the data is spread over the interval from the smallest to the largest value
Quartiles
specific percentiles (the 25th, 50th, and 75th) that divide the data into four parts, each containing approximately 25% of the observations
Excel: Quartiles
=percentile.exc(B2:B13,D2/100) or =quartile.exc(B2:B13,D5)
Range
largest value - smallest value
Interquartile range (IQR)
Q3 - Q1; the range of the middle 50% of the data
Variance
measures variability using all the data, since it is based on the difference between each observation x_i and the mean
Excel: Variance
=var.s(B2:B13)
Standard deviation
measure of variability computed by taking the positive square root of the variance
Excel: Standard deviation
=stdev.s(B2:B13)
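The measures of variability above can be reproduced with Python's `statistics` module. This is a sketch on hypothetical data. It assumes that `quantiles(..., method="exclusive")` follows the same (n + 1)p position rule as Excel's `.EXC` functions, which is my understanding but is worth verifying against a known worksheet.

```python
import statistics

# Hypothetical sample data.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Quartiles (assumed to correspond to =quartile.exc with 1, 2, 3).
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

data_range = max(data) - min(data)  # largest value - smallest value
iqr = q3 - q1                       # range of the middle 50% of the data

# Sample variance and standard deviation (divide by n - 1),
# the same convention as =var.s and =stdev.s.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(q1, q2, q3, data_range, iqr)
print(round(sample_variance, 4), round(sample_sd, 4))
```

The standard deviation is the positive square root of the variance, so `sample_sd ** 2` should match `sample_variance` up to rounding.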
Sample space
the set of all possible experimental outcomes
Sample point
represents an experimental outcome
Multiple-step experiment
an experiment that is a sequence of steps
Tree diagram
a diagram used to show the total number of possible outcomes in a probability experiment
Combination
determining the number of ways x objects may be selected from among n objects where order doesn't matter
Permutation
determining the number of ways x objects may be selected from among n objects where order is important
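The two counting rules differ only in whether order matters. A small Python sketch, with n = 5 and x = 2 chosen arbitrarily for illustration, compares the built-in `math.comb` and `math.perm` with the factorial formulas n! / (x!(n - x)!) and n! / (n - x)!.

```python
import math

n, x = 5, 2  # arbitrary illustrative values: select 2 objects from 5

# Combinations: order does not matter.
combinations = math.comb(n, x)
combinations_by_formula = math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

# Permutations: order is important.
permutations = math.perm(n, x)
permutations_by_formula = math.factorial(n) // math.factorial(n - x)

print(combinations, permutations)
```

Each combination of x objects can be ordered in x! ways, which is why the permutation count is x! times the combination count.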
Probability
a numerical measure of the likelihood that an event will occur
Classical method of assigning probabilities
used when an experiment has equally likely outcomes
Relative frequency method of assigning probabilities
used when data are available to estimate the proportion of time the experimental outcome will occur if the experiment is repeated a large number of times
Subjective method of assigning probabilities
used when outcomes are not equally likely and data is unavailable
Event
a collection of sample points
Complement of event A
all outcomes in which event A does not occur
Union of 2 events
denoted by A ∪ B, the event consisting of all outcomes in Event A, Event B, or both
Intersection of 2 events
denoted by A ∩ B, the event consisting of all outcomes in both A and B
Addition law
useful when we want to know the probability that at least one of two events occurs: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Mutually exclusive events
occur when two events have no sample points in common. Addition law for mutually exclusive events: P(A ∪ B) = P(A) + P(B)
Conditional probability
the probability that one event happens given that another event is already known to have happened; P(A|B)
Joint probability
the probability of two events occurring together
Independent events
two events that have no influence on each other, so that P(A|B) = P(A) and P(B|A) = P(B)
Multiplication law
used to compute the probability of the intersection of two events: P(A ∩ B) = P(B) P(A|B)
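The probability laws above can be verified on a small example. The sketch below assumes a fair six-sided die, so the classical method applies, and picks two illustrative events: A = "even number" and B = "greater than 3". Exact fractions are used so the identities hold without rounding.

```python
from fractions import Fraction

# Assumed experiment: one roll of a fair die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # illustrative event: even number
B = {4, 5, 6}  # illustrative event: greater than 3

def prob(event):
    """Classical method: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

p_union = prob(A | B)                # P(A or B), counted directly
p_intersection = prob(A & B)         # joint probability P(A and B)
addition_law = prob(A) + prob(B) - p_intersection

p_a_given_b = p_intersection / prob(B)     # conditional probability P(A|B)
multiplication_law = prob(B) * p_a_given_b

p_complement_a = prob(sample_space - A)    # complement of A
independent = p_a_given_b == prob(A)       # independence check

print(p_union, addition_law, p_a_given_b, multiplication_law, independent)
```

Here A and B are not independent, because knowing that the roll is greater than 3 raises the probability of an even number from 1/2 to 2/3.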
Random variable
a numerical description of the outcome of an experiment that is either discrete or continuous
Continuous random variables
random variables that may assume any numerical value in an interval or collection of intervals
Discrete random variable
a random variable that may assume either a finite number of values or an infinite sequence of values
Probability distribution
describes the possible values of a random variable and their probabilities
Expected value
a measure of the central location of a random variable
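For a discrete random variable, the expected value is the probability-weighted average of its values, E(x) = Σ x f(x). A minimal Python sketch, using a made-up probability distribution:

```python
from fractions import Fraction

# Hypothetical discrete probability distribution: value -> probability.
distribution = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}

# A valid probability distribution must sum to 1.
total_probability = sum(distribution.values())

# Expected value: each value weighted by its probability.
expected_value = sum(x * p for x, p in distribution.items())

print(total_probability, float(expected_value))
```

The expected value need not be a value the random variable can actually take; here it falls between 1 and 2.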
Exponential probability distribution
A continuous probability distribution that is useful in computing probabilities for the time it takes to complete a task.
Excel: Exponential distribution
Lower tail, P(x ≤ 18): =Expon.Dist(18,1/15,TRUE)
Interval, P(6 ≤ x ≤ 18): =Expon.Dist(18,1/15,TRUE)-Expon.Dist(6,1/15,TRUE)
Upper tail, P(x ≥ 8): =1 - Expon.Dist(8,1/15,TRUE)
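The Excel formulas above use a rate of 1/15, that is, a mean of 15. The cumulative exponential probability is P(x ≤ x0) = 1 - e^(-x0/mean), which the following Python sketch implements directly. The helper name `expon_cdf` is my own, not a library function.

```python
import math

def expon_cdf(x, rate):
    """Cumulative exponential probability P(X <= x) = 1 - exp(-rate * x)."""
    return 1.0 - math.exp(-rate * x)

rate = 1 / 15  # same rate as in the Excel formulas (mean of 15)

lower_tail = expon_cdf(18, rate)                     # P(x <= 18)
interval = expon_cdf(18, rate) - expon_cdf(6, rate)  # P(6 <= x <= 18)
upper_tail = 1 - expon_cdf(8, rate)                  # P(x >= 8)

print(round(lower_tail, 4), round(interval, 4), round(upper_tail, 4))
```

The upper tail simplifies to e^(-8/15), since the two 1s cancel.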
Normal probability distribution
The most widely used probability distribution for continuous random variables. Its probability density function is bell-shaped and determined by its mean and standard deviation. The highest point of the curve is at the mean, which is also the median and the mode.
Excel: Normal distribution
Lower Tail: =Norm.Dist(20000,36500,5000,TRUE)
Interval: =Norm.Dist(40000,36500,5000,TRUE)-Norm.Dist(20000,36500,5000,TRUE)
Upper Tail: =1-Norm.Dist(40000,36500,5000,TRUE)
Excel: Normal distribution if we know the probability
x value with 0.10 in the lower tail: =Norm.Inv(0.1,36500,5000)
x value with 0.025 in the upper tail: =Norm.Inv(0.975,36500,5000)
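The normal-distribution formulas above, with mean 36500 and standard deviation 5000, have a close counterpart in Python's `statistics.NormalDist` (Python 3.8+). This sketch assumes `cdf` plays the role of `Norm.Dist(..., TRUE)` and `inv_cdf` the role of `Norm.Inv`.

```python
from statistics import NormalDist

# Mean and standard deviation taken from the Excel formulas above.
dist = NormalDist(mu=36500, sigma=5000)

lower_tail = dist.cdf(20000)                   # P(x <= 20000)
interval = dist.cdf(40000) - dist.cdf(20000)   # P(20000 <= x <= 40000)
upper_tail = 1 - dist.cdf(40000)               # P(x >= 40000)

# Going the other way: find x from a known probability.
x_lower_10 = dist.inv_cdf(0.10)     # x with 0.10 in the lower tail
x_upper_025 = dist.inv_cdf(0.975)   # x with 0.025 in the upper tail

print(round(lower_tail, 4), round(interval, 4), round(upper_tail, 4))
print(round(x_lower_10), round(x_upper_025))
```

An upper-tail area of 0.025 is looked up as a cumulative probability of 0.975 because both `inv_cdf` and `Norm.Inv` work with the area to the left of x.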
Standard normal distribution
A normal distribution with a mean of 0 and a standard deviation of 1.
Excel: Standard normal distribution
Excel: Standard normal distribution if we know the probability
z-value with 0.025 in the lower tail: =Norm.S.Inv(0.025)
z-value with 0.025 in the upper tail: =Norm.S.Inv(0.975)
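Any normal random variable can be converted to the standard normal with z = (x - mean) / standard deviation. The Python sketch below reproduces the two z-values above and, reusing the mean 36500 and standard deviation 5000 from the earlier normal example, checks that a probability computed through z matches the one computed directly. It assumes `NormalDist()` with no arguments is the standard normal, which is the documented default.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults: mean 0, standard deviation 1

z_lower = standard_normal.inv_cdf(0.025)  # z with 0.025 in the lower tail
z_upper = standard_normal.inv_cdf(0.975)  # z with 0.025 in the upper tail

# Standardize x = 40000 from a normal distribution with mean 36500, sd 5000.
mean, sd, x = 36500, 5000, 40000
z = (x - mean) / sd

p_via_z = standard_normal.cdf(z)        # P(Z <= z)
p_direct = NormalDist(mean, sd).cdf(x)  # P(X <= x), computed directly

print(round(z_lower, 2), round(z_upper, 2), z)
```

The two z-values are mirror images because the standard normal distribution is symmetric about 0.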