Statistics: Descriptive and Inferential Methods and Distributions - Prof. Chung-Ching Wang, Study notes of Data Analysis & Statistical Methods

An introduction to descriptive and inferential statistics, including definitions of key terms such as population, sample, census, quantitative data, qualitative data, and measures of reliability. It also covers various statistical tools and summary measures, including the box plot, quantile-quantile plot, stem-and-leaf display, mean, median, mode, range, standard deviation, variance, IQR, Chebyshev's rule, and the empirical rule. Additionally, it discusses the concepts of symmetric, skewed, and mound-shaped distributions.

Typology: Study notes

Pre 2010

Uploaded on 11/08/2009

nmpatel

Chapter 1
Descriptive statistics: utilizes numerical and graphical methods to look for patterns in a data set, to summarize information revealed in a
data set, and to present that information in a convenient form
Inferential statistics: utilizes sample data to make estimates, decisions, predictions, and other generalizations about a large set of data
Population: a set of units that we are interested in studying
Sample: a subset of the units of a population
Census and Sample: Measurement is a process we use to assign numerical values to variables of individual population units. When we
measure a variable for every unit of a population, the result is a census of the population. If we only measure part of the units in a
population, the result is a sample of the population.
Elements of descriptive statistical problems:
• The population or sample of interest
• One or more variables that are to be investigated
• Tables, graphs, or numerical summary tools
• Identification of patterns in the data
Elements of inferential statistical problems:
• The population of interest
• One or more variables that are to be investigated
• The sample of population units
• The inference about the population based on information contained in the sample
• A measure of the reliability of the inference
Quantitative data: measurements that are recorded on a naturally occurring numerical scale
Qualitative data: measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories
Representative sample: exhibits characteristics typical of those possessed by the target population
Measure of reliability: a statement (usually quantitative) about the degree of uncertainty associated with a statistical inference
Chapter 2
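Box-Plot: a graph built on the quartiles; the box runs from the lower quartile to the upper quartile, and points lying more than 1.5*IQR beyond either quartile are flagged as potential outliers
Quantile-Quantile Plot: a dot chart that plots the quantiles of the data against the quantiles of a reference distribution; points close to a straight line suggest the data follow that distribution
Stem-and-Leaf Display: lays out all the numbers, splitting each one into a stem (leading digits) and a leaf (trailing digit)
Mean: average
Median: the middle number when the numbers are in order
Mode: number that occurs the most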
Range: largest number minus smallest number
Standard Deviation (s): defined as the positive square root of the sample variance (s^2), or
$s = \sqrt{s^2}$
Variance (s^2): equal to the sum of the squared distances from the mean, divided by (n-1), or
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
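As a rough illustration of the summary measures above, the sketch below computes them for a small made-up sample (the data values are hypothetical, chosen only so the arithmetic is easy to follow). It also checks that the "sum of squared distances from the mean" definition of the variance and the shortcut formula give the same number.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]
n = len(data)

mean = sum(data) / n
median = statistics.median(data)
mode = statistics.mode(data)
data_range = max(data) - min(data)

# Definition: sum of squared distances from the mean, divided by (n - 1).
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut form: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1).
var_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Standard deviation: positive square root of the sample variance.
std_dev = math.sqrt(var_definition)

print("mean:", mean, "median:", median, "mode:", mode, "range:", data_range)
print("variance (definition):", var_definition)
print("variance (shortcut):  ", var_shortcut)
print("standard deviation:   ", std_dev)
```

Both variance computations should agree up to floating-point rounding; the shortcut form is mainly convenient for hand calculation.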
Upper Quartile: the 75th percentile
Lower Quartile: the 25th percentile
IQR: distance between the lower and upper quartile
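A minimal sketch of how the quartiles, the IQR, and the 1.5*IQR distance from the Box-Plot entry fit together. The data are made up, and the quartiles come from Python's `statistics.quantiles` with its default method; textbooks and software use slightly different quartile conventions, so the exact quartile values here are an assumption of that method rather than the only possible answer.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

# Cut points that split the data into four equal parts:
# lower quartile, median, upper quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                   # distance between the quartiles
lower_fence = q1 - 1.5 * iqr    # points below this are flagged
upper_fence = q3 + 1.5 * iqr    # points above this are flagged

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("lower quartile:", q1, "upper quartile:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("potential outliers:", outliers)
```

Only the value 30 falls outside the fences for this sample.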
Chebyshev's Rule: For any number k greater than 1, at least 1 - 1/k^2 of the measurements will fall within k standard deviations of the mean,
regardless of the shape of the frequency distribution. (a) At least 3/4 of the measurements will fall within the interval (x̄-2s,
x̄+2s) for samples and (μ-2σ, μ+2σ) for populations. (b) At least 8/9 of the measurements will fall within the interval (x̄-3s, x̄+3s) for
samples and (μ-3σ, μ+3σ) for populations.
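Chebyshev's Rule can be checked numerically on any data set. The sketch below uses a deliberately skewed, made-up sample and a hypothetical helper `fraction_within` to compare the observed fraction of measurements within k standard deviations of the mean against the guaranteed minimum.

```python
import statistics

def fraction_within(data, k):
    """Fraction of measurements strictly inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    inside = [x for x in data if mean - k * s < x < mean + k * s]
    return len(inside) / len(data)

# Hypothetical, strongly right-skewed sample.
data = [1, 1, 2, 2, 3, 3, 4, 5, 9, 40]

for k in (2, 3):
    bound = 1 - 1 / k ** 2          # Chebyshev's guaranteed minimum
    observed = fraction_within(data, k)
    print(f"k={k}: observed {observed:.3f} >= bound {bound:.3f}")
```

The bound is conservative: it holds for every shape of distribution, so the observed fraction is usually well above it.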
Empirical Rule: a rule of thumb that applies to samples or populations with frequency distributions that are mound-shaped.
(a) Approximately 68% of the measurements will fall within the interval (x̄-s, x̄+s) for samples and (μ-σ, μ+σ) for populations.
(b) Approximately 95% of the measurements will fall within the interval (x̄-2s, x̄+2s) for samples and (μ-2σ, μ+2σ) for populations.
(c) Approximately 99.7% of the measurements will fall within the interval (x̄-3s, x̄+3s) for samples and (μ-3σ, μ+3σ) for populations.
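The percentages in the Empirical Rule match what an exactly normal (bell-shaped) distribution gives. The sketch below assumes a standard normal distribution as the mound-shaped model, which is an assumption made here for illustration; real mound-shaped data will only come close to these figures. The helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

The three printed proportions round to about 68%, 95%, and 99.7%.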
Z-Score: suppose x is a measurement from a sample with mean x̄ and standard deviation s. The sample z-score of x is
$z = \frac{x - \bar{x}}{s}$
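A short sketch of the sample z-score, using a made-up sample and a hypothetical helper `z_score`. The z-score says how many standard deviations a measurement lies above (positive) or below (negative) the sample mean.

```python
import statistics

def z_score(x, data):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - statistics.mean(data)) / statistics.stdev(data)

# Hypothetical sample.
data = [2, 4, 4, 5, 7, 8]

for x in (2, 5, 8):
    print(f"x = {x}: z = {z_score(x, data):.3f}")
```

A measurement equal to the mean has a z-score of 0, and measurements equally far on either side of the mean have z-scores of equal size and opposite sign.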
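Symmetric: the left and right sides of the distribution are mirror images of each other
Skewed: one tail of the distribution has more extreme observations than the other tail; if the longer tail extends to the right it's a right skew, and
vice versa
Mound-Shaped distribution: roughly symmetric with a single peak, so the mean, median, and mode are all about the same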
Chapter 3
Definitions:
Experiment: An experiment is an act or process of observation that leads to a single outcome that cannot be predicted with certainty.

Sample Point: A sample point is the most basic outcome of an experiment.

Sample Space: The sample space is the collection of all possible sample points of an experiment.

Event: An event is a specific collection of sample points.

Union: The union of two events A and B is the event that either A or B or both occur in a single trial of the experiment. We denote the union of A and B by the symbol A ∪ B.

Intersection: The intersection of A and B is the event that both A and B occur in a single trial of the experiment. We denote the intersection of A and B by the symbol A ∩ B.

Complementary Events: The complement of an event A is the event that A does not occur. This means the complement of A consists of all sample points that are not in event A. We denote the complement of event A by the symbol A^c.

Mutually Exclusive Events: Events A and B are mutually exclusive if event A and event B have no sample points in common.

Independent and Dependent: Events A and B are independent events if the occurrence of B does not alter the probability that A has occurred. That is, P(A|B) = P(A) and P(B|A) = P(B). Events A and B are dependent if they are not independent.

Additive Rule of Probability: The probability of the union of events A and B is the sum of the probability of event A and the probability of event B, minus the probability of the intersection of events A and B, i.e., P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
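The Additive Rule can be verified by counting sample points. The sketch below assumes a single roll of a fair six-sided die, so every sample point is equally likely; the events A and B are made up for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely sample points).
sample_space = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is at least 4

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))

union = A | B          # A ∪ B: A or B or both occur
intersection = A & B   # A ∩ B: both A and B occur

print("P(A ∪ B) by counting:       ", prob(union))
print("P(A) + P(B) - P(A ∩ B):     ", prob(A) + prob(B) - prob(intersection))
```

Subtracting P(A ∩ B) corrects for the sample points 4 and 6, which would otherwise be counted twice.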

Event A and Event B are mutually exclusive if A ∩ B contains no sample points, that is, A and B have no sample points in common. A and A^c have no sample points in common, i.e., A and A^c are mutually exclusive.

Note: P(A ∪ A^c) = P(A) + P(A^c) and P(A ∩ A^c) = 0
• The sum of the probabilities of complementary events equals one; that is, P(A) + P(A^c) = 1. This means that Event A and Event A^c together cover the entire sample space.
• Event A and Event A^c have no sample points in common.
• Event A and Event A^c are mutually exclusive.
• A Venn diagram is a good graphical tool for understanding the concepts of compound events, complementary events, and mutually exclusive events.

The multiplicative rule of probability is used to find the probability of the intersection of two events. Let A and B be two events; the multiplicative rule of probability is P(A ∩ B) = P(A|B) * P(B) = P(B|A) * P(A)
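As a sanity check on the complement and multiplicative rules, the sketch below again assumes one roll of a fair die with made-up events. The conditional probability is computed by restricting attention to the sample points of the given event and counting, which is valid here only because the sample points are equally likely; the helper names are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (equally likely sample points).
sample_space = {1, 2, 3, 4, 5, 6}

A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is at least 4

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))

def cond_prob(event, given):
    """P(event | given): count sample points of `event` inside `given`."""
    return Fraction(len(event & given), len(given))

complement_of_A = sample_space - A

print("P(A) + P(A^c) =", prob(A) + prob(complement_of_A))
print("P(A ∩ B)      =", prob(A & B))
print("P(A|B) * P(B) =", cond_prob(A, B) * prob(B))
print("P(B|A) * P(A) =", cond_prob(B, A) * prob(A))
```

The last three lines print the same value, as the multiplicative rule requires.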

We say two events A and B are independent events if the probability of the occurrence of one event (say A) does not alter the probability that another event (say B) has occurred. P(A|B) = P(A) and P(B|A) = P(B) when A and B are independent events.
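The definition of independence can be tested by comparing P(A|B) with P(A). The sketch below assumes two rolls of a fair die, so all 36 sample points are equally likely; the events and helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two rolls of a fair die: 36 equally likely sample points.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))

def cond_prob(event, given):
    """P(event | given), computed by counting inside `given`."""
    return Fraction(len(event & given), len(given))

def independent(a, b):
    """True when knowing that b occurred does not change the probability of a."""
    return cond_prob(a, b) == prob(a)

first_even = {pt for pt in sample_space if pt[0] % 2 == 0}
sum_is_7 = {pt for pt in sample_space if sum(pt) == 7}
sum_is_8 = {pt for pt in sample_space if sum(pt) == 8}

print("first roll even vs. sum is 7:", independent(first_even, sum_is_7))
print("first roll even vs. sum is 8:", independent(first_even, sum_is_8))
```

Learning that the sum is 7 leaves the chance of an even first roll at 1/2, so those events are independent; learning that the sum is 8 raises it to 3/5, so those events are dependent.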

If events A and B are independent, the multiplicative rule of probability becomes P(A ∩ B) = P(A) * P(B) = P(B) * P(A). We say events A and B are dependent events (def 3.9) if A and B are not independent.

• Note: (1) P(A ∪ B) = P(A) + P(B) if events A and B are mutually exclusive, because P(A ∩ B) = 0. (2) P(A ∩ B) = P(A) * P(B) if events A and B are independent events.

The Multiplicative Rule: You are drawing one element from each of k sets of elements with sizes n_1, n_2, …, n_k, respectively. The number of different sample points of this experiment is equal to the product n_1 × n_2 × … × n_k.

Permutations Rule: You are drawing n elements from a set of N elements and arranging these n elements in a distinct order. The number of different outcomes of this experiment is equal to N! / (N-n)!.

• Note: (1) N! = N * (N-1) * (N-2) * … * 2 * 1. (2) 5! = 5 * 4 * 3 * 2 * 1 = 120.

Combinations Rule: You are drawing n elements from a set of N elements. The number of different outcomes of this experiment is equal to N! / [(N-n)! * n!]. Here the order is not important, which is why the permutations count is divided by n!.

Partitions Rule (pg.160): You are partitioning a set of N elements into k groups consisting of n_1, n_2, …, n_k elements (N = n_1 + n_2 + ... + n_k). The number of different outcomes of this experiment is

$\frac{N!}{n_1! \times n_2! \times \cdots \times n_k!}$
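The four counting rules can be tried out with Python's standard `math` module (`math.prod`, `math.perm`, and `math.comb` need Python 3.8 or later). The set sizes below are made up for illustration, and `partitions_count` is a hypothetical helper written for this sketch.

```python
import math

def partitions_count(group_sizes):
    """Number of ways to split N = sum(group_sizes) elements into groups
    of the given sizes: N! / (n_1! * n_2! * ... * n_k!)."""
    count = math.factorial(sum(group_sizes))
    for size in group_sizes:
        count //= math.factorial(size)
    return count

# Multiplicative rule: one element from each of three sets of sizes 3, 4, 2.
multiplicative = math.prod([3, 4, 2])

# Permutations rule: arrange n = 3 of N = 5 elements in order, N! / (N-n)!.
permutations = math.perm(5, 3)

# Combinations rule: choose n = 3 of N = 5 elements, order ignored.
combinations = math.comb(5, 3)

# Partitions rule: split N = 5 elements into groups of sizes 2, 2, and 1.
partitions = partitions_count([2, 2, 1])

print("multiplicative:", multiplicative)
print("permutations:  ", permutations)
print("combinations:  ", combinations)
print("partitions:    ", partitions)
```

Dividing the permutations count by n! = 3! = 6 gives the combinations count, which reflects that order is ignored when counting combinations.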