A review of chapters 1-7 from a statistics textbook, covering concepts such as statistics, data, populations, samples, parameters, statistics, design, descriptive statistics, probability, and statistical inference. Examples, exercises, and formulas for calculating measures of central tendency and spread, as well as understanding probability distributions and confidence intervals.
Review of Chapters 1-
Chapter 1: Introduction
Concepts reviewed: statistics, data, population,
sample, parameter, statistic, design, descriptive
statistics, and inference.
Statistics is a field of study that deals with methods
for producing and analyzing data.
Data are the information we gather with experiments and
surveys.
Statistics involves three main aspects:
All these three aspects have been already covered by
Design refers to planning how to obtain the data (see
chapter 4).
Description refers to exploring and summarizing
patterns in the data (see chapter 2).
Inference refers to making decisions or predictions
based on the data (see chapters 7 & 8).
Populations and Samples:
The subjects are entities that we measure in a study
(people, individuals, schools, countries, days).
The population is the set of all subjects.
A sample is a part of the population.
Random sampling is very important for statistical
inference: each subject in the population has the
same chance of being included in the sample.
We distinguish between statistics and parameters.
Parameter: a numerical summary of the
population.
Statistic: a numerical summary of a sample
taken from the population.
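The distinction can be illustrated with a short simulation. This is only a sketch: the population below is made up (10,000 hypothetical students with a weekly drinking-day count drawn at random), and the names `population`, `sample`, `mu`, and `x_bar` are chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: days per week with at least one drink, 10,000 students
population = [random.randint(0, 7) for _ in range(10_000)]

# Parameter: numerical summary of the whole population (usually unknown in practice)
mu = statistics.mean(population)

# Simple random sample: every subject has the same chance of being included
sample = random.sample(population, 100)

# Statistic: numerical summary of the sample, used to estimate the parameter
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.3f}")
print(f"statistic (sample mean):     {x_bar:.3f}")
```

The two printed values are close but not equal: the statistic varies from sample to sample, while the parameter is fixed.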
Question/Example:
In a University of Wisconsin study about alcohol
abuse among students, 100 of the 40,858 members of
the student body in Madison were sampled and asked
to complete a questionnaire. One question asked was,
“On how many days in the past week did you
consume at least one alcoholic drink?”
Identify:
is 29- a statistic or a parameter?
Chapter 2: Summarizing Data or Descriptive
Statistics
Concepts reviewed: categorical or quantitative
variables, discrete and continuous variables; pie
chart, bar graph, dot plot, stem-and-leaf plot,
histogram.
The characteristics of interest that we measure are
called variables, because they vary from subject to
subject.
Types of variables:
Categorical: each observation belongs to one of a set of
categories: nominal, ordinal.
e.g. state of residence, marital status, gender,
occupation.
Quantitative: discrete or continuous: discrete, if the
possible values are integers; continuous, if the values
form an interval.
e.g. age, height, no. of people voting for Democrats,
no. of good bolts in a lot.
Exercise:
Are the following data quantitative or categorical?
neutral, opposes)
scores)
We have 2 types of summaries of the data:
Categorical - pie charts and bar graphs
Quantitative - histograms, stem-and-leaf plots, dot
plots, and box plots.
Common shapes (for categorical or for quantitative?):
unimodal
distribution (make sure you know the definitions
of these terms; if not use the office hours)
Important!!! Before starting any analysis always plot
your data to identify outliers (extreme values)!
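Center:
Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Median: the middle value when the observations are ordered
Mode: the most frequent value in the distribution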
Examples: Normal distribution, skewed right and left
distributions, identify the mean, median and mode.
Notice the importance of outliers.
Spread:
o Sample Variance
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
o Sample Standard Deviation
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
o Range: max-min
Learn to find the sample mean and sample standard
deviation using your calculator.
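As a check on calculator work, the measures of center and spread above can also be computed with Python's standard `statistics` module. This is a minimal sketch; the data set is hypothetical and was chosen so that the last value is an outlier.

```python
import statistics

# Hypothetical sample; the last value is an outlier
data = [2, 3, 3, 4, 5, 5, 5, 6, 30]

mean = statistics.mean(data)          # sum of the observations divided by n
median = statistics.median(data)      # middle value of the ordered data
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance, divides by n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance
data_range = max(data) - min(data)    # max - min

print(mean, median, mode, variance, round(std_dev, 3), data_range)
```

Here the outlier pulls the mean (7) above the median (5), which is why plotting the data first matters.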
Chapter 3: Relation between Two Variables
This is the topic of this entire course (STA 3024).
During the next 6 weeks we will focus on these
concepts: explanatory and response variables,
association, contingency table, correlation,
regression.
Chapter 4: Gathering Data
Through randomization (ALWAYS): if we do
not use randomization, we cannot talk
about inference.
We can distinguish 2 types of studies:
Experimental (assign treatments to subjects)
Observational (no treatment is applied to the
subjects, we just observe the data).
Types of errors:
Sampling error (ME, the margin of error: depends
on the size of the SRS, the simple random sample)
Non-sampling errors, due to different sources of
bias, e.g. undercoverage, nonresponse,
missing data
Chapter 5: Probability
With a randomized experiment or a random sample,
the probability of a particular outcome is the
proportion of times that the outcome would occur in
a long run of observations.
Basic Rules of Probability (long-run proportion)
General Rule: for any event A, 0 ≤ P (A) ≤ 1
Complementary rule: P (A^c) = 1 – P (A)
Addition Rule of Probability
P (A or B) = P (A) +P (B) – P (A and B)
The sum of the probabilities for all the possible
values of a random variable equals 1.
Multiplication Rule of Probability (ONLY for independent events)
P (A and B) = P (A) × P (B)
Conditional Probability (probability of A
given B)
P (A | B) = P (A and B)/ P (B)
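The rules above can be verified on a small sample space. The sketch below assumes one roll of a fair six-sided die; the events A and B and the helper `prob` are made up for illustration.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (a hypothetical example)
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

p_complement = 1 - prob(A)                   # complementary rule: P(A^c)
p_a_or_b = prob(A) + prob(B) - prob(A & B)   # addition rule
p_a_given_b = prob(A & B) / prob(B)          # conditional probability P(A | B)

# A and B are not independent here, so P(A and B) differs from P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)

print(p_complement, p_a_or_b, p_a_given_b, independent)
```

Because `independent` comes out false for these two events, the multiplication rule P (A) × P (B) would give the wrong value for P (A and B) here.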
Chapter 6: Probability Distributions
Probability distributions of random variables:
Binomial Distribution
It is a distribution of a discrete random variable.
Each of n independent trials has two possible
outcomes: “success” or “failure”.
Each trial has the same probability of success p.
The probability of failure is denoted by 1-p.
The total number of successes has a binomial
distribution, with parameters n and p.
Example: flipping a coin: take heads = success and
tails = failure. The total number of heads when we toss
a coin n times has a binomial distribution with
parameters n and p = .5.
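The notes do not write out the binomial probability formula, so the sketch below uses the standard one, P(X = k) = C(n, k) p^k (1-p)^(n-k), as an assumption. The choice of n = 10 tosses and the function name `binomial_pmf` are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # hypothetical: 10 tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_five_heads = pmf[5]   # probability of exactly 5 heads
total = sum(pmf)        # probabilities over all possible values sum to 1

print(round(p_five_heads, 4), total)
```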
Normal Distribution
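It is a distribution of a continuous random
variable.
$N(\mu, \sigma)$ has 2 parameters: $\mu$ (population mean)
and $\sigma$ (population standard deviation).
z-score for a single observation x:
$z = \frac{x - \mu}{\sigma}$, the number of standard deviations that x falls
from the mean.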
YOU NEED TO KNOW HOW TO FIND z-score
and t-score, using Tables A and B in
Appendix A of the book.
Example:
Scores on the verbal or math SAT are approximately
normally distributed with mean 500 and standard
deviation 100. The scores range from 200 to 800.
a. If one of your SAT scores was x=650, how many
standard deviations from the mean was it?
b. What percentage of SAT scores was higher than
yours? (From table A, the z-score of 1.5 has
cumulative probability .9332).
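The SAT example can be worked through in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library in place of Table A; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 500, 100   # SAT mean and standard deviation from the example
x = 650

z = (x - mu) / sigma                  # part a: standard deviations from the mean
cumulative = NormalDist().cdf(z)      # the cumulative probability Table A reports for z
pct_higher = 100 * (1 - cumulative)   # part b: percentage of scores above x

print(z, round(cumulative, 4), round(pct_higher, 2))
```

The score is 1.5 standard deviations above the mean, and about 6.68% of scores are higher, which is 1 - .9332 expressed as a percentage.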
Chapters 7 and 8: Statistical Inference
The following are some important concepts you
should be very familiar with:
A Parameter
It is a numerical summary of the population.
We calculate parameters using population data.
However, we usually (almost always) do not
have population data. Thus, the values of
population parameters are almost always
unknown.
So we estimate the population parameters using
data from a random sample.
A Statistic
It is a numerical summary of a sample.
We calculate the values of sample statistics using
data from random samples.
We use sample statistics to make statistical
inferences about the unknown population
parameters.
Statistical Inference is the process of making a
statement about one or more population parameters
using one or more sample statistics, obtained from a
random and representative sample.
Two major methods for making statistical inference:
Confidence intervals (ch.7)
Significance test (ch. 8)
Question for Friday’s quiz: Which one of the two
methods is more informative and why?
Confidence Interval
Def.: an interval of numbers within which the
parameter value is believed to fall. A confidence
interval is constructed by adding and subtracting a
margin of error to a given point estimate.
Point estimate: a single number that is our
“best guess” for the parameter.
Margin of Error: measures how accurate the
point estimate is likely to be in estimating the
parameter. It is based on the standard error of
the sampling distribution of that point estimate.
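Some Point Estimates:
Population Parameters (Unknown) and the corresponding Sample Statistics (Estimators)
Mean: parameter $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$; estimator $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Standard Deviation: parameter $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$; estimator $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Proportion: parameter $p = \frac{\sum_{i=1}^{N} X_i}{N}$, where $X_i$ = 0 or 1; estimator $\hat{p} = \frac{X}{n}$, where X = # of “Successes” in the sample.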
The Sample Mean is a point estimate of the
population mean.
The Sample Standard Deviation is a point
estimate of the population standard deviation.
The Sample Proportion is a point estimate of
the population proportion.
Properties of Estimators:
Unbiasedness: An estimator is said to be unbiased if
its sampling distribution is centered at the parameter.
The sample mean is an unbiased estimator
of the population mean, $\mu$, since
$\mu_{\bar{X}} = \mu$.
The sample proportion is an unbiased
estimator of the population proportion, p,
because $\mu_{\hat{p}} = p$.
The sample standard deviation, S, is not
unbiased; it has a small bias that decreases
as n (sample size) increases.
Small Standard Error: The standard error of a
statistic (an estimator) is the standard deviation of the
sampling distribution of the statistic. It describes the
variability in the possible values of the statistic. A
good estimator has a small standard error. The
estimators mentioned above all have small standard
errors.
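Both properties can be seen in a simulation. The sketch below assumes a hypothetical normal population with mean 0 and standard deviation 1, and draws many small samples; the sample size, the number of repetitions, and the variable names are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

mu, sigma = 0.0, 1.0    # hypothetical normal population
n, reps = 5, 20_000     # small samples make the bias of S visible

sample_means, sample_sds = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))
    sample_sds.append(statistics.stdev(sample))

center_of_means = statistics.fmean(sample_means)  # close to mu: sample mean is unbiased
center_of_sds = statistics.fmean(sample_sds)      # below sigma: S has a small bias
se_of_mean = statistics.pstdev(sample_means)      # SD of the sampling distribution

print(round(center_of_means, 3), round(center_of_sds, 3), round(se_of_mean, 3))
```

The sampling distribution of the sample mean is centered near the population mean, the average of S falls somewhat below the population standard deviation, and the spread of the sample means is the standard error.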
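Standard Errors of Estimators:
$SE(\bar{X}) = \sigma_{\bar{X}} = \sigma / \sqrt{n}$
Estimated by: Est. $SE(\bar{X}) = S / \sqrt{n}$
$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$
Estimated by: Est. $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$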
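The estimated standard errors are simple to compute once the sample summaries are known. In this sketch the function names `est_se_mean` and `est_se_prop` are made up, the values S = 12 and n = 36 are hypothetical, and the proportion figures (.86 from 1158 subjects) are the ones used in the GSS example in these notes.

```python
import math

def est_se_mean(s, n):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    return s / math.sqrt(n)

def est_se_prop(p_hat, n):
    """Estimated standard error of the sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

se_mean = est_se_mean(s=12.0, n=36)         # hypothetical sample SD and sample size
se_prop = est_se_prop(p_hat=0.86, n=1158)   # GSS example figures

print(se_mean, round(se_prop, 4))
```

The proportion result rounds to .01, which agrees with the standard error reported in the GSS example.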
Confidence interval:
o General Form: Estimator ± ME
o ME = (Table value) × SE(Estimator)
o CI for $\mu$ = $\bar{X} \pm t^* \frac{S}{\sqrt{n}}$
o CI for p = $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
o Make sure you remember how to use tables of
standard normal distribution and t-distribution.
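A confidence interval for a mean follows directly from the general form. The sketch below assumes a hypothetical sample of 10 exam scores and takes the t table value as given: 2.262 is the 95% value for df = n - 1 = 9.

```python
import math
import statistics

# Hypothetical sample of 10 exam scores
sample = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

n = len(sample)
x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation

t_star = 2.262                    # t table value, 95% confidence, df = n - 1 = 9
me = t_star * s / math.sqrt(n)    # ME = (table value) x SE(estimator)
ci = (x_bar - me, x_bar + me)     # estimator +/- ME

print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

For a different sample size or confidence level, only `t_star` changes; it has to be looked up again with the new degrees of freedom.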
Example:
When the 1998 GSS asked 1158 subjects, “Do you
believe in Heaven?”, the proportion who answered
yes was .86. The standard error of this estimate
was .01.
a. Find and interpret the margin of error for a 95%
confidence interval for the population proportion of
Americans who believe in heaven.
ME = z* × SE = 1.96 × .01 ≈ .02.
This means that with probability .95 the population
percentage of Americans who believe in Heaven is
no more than 2 percentage points lower or higher than
the reported sample percentage, 86%.
b. Construct the 95% confidence interval. Interpret it!
.86 ± .02 = (.84, .88). With 95% confidence, we estimate that the
population proportion of Americans who believe in
Heaven is at least .84 but no more than .88, that is,
between 84% and 88%.
c. Construct a 99% confidence interval. Discuss it!
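The intervals in this example can be checked numerically. This is a sketch, not an answer key: it uses `statistics.NormalDist` in place of the standard normal table, and the helper `proportion_ci` is a made-up name.

```python
from statistics import NormalDist

p_hat = 0.86   # sample proportion from the GSS example
se = 0.01      # standard error reported in the example

def proportion_ci(p_hat, se, confidence):
    """Return (margin of error, lower bound, upper bound) using a z table value."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    me = z_star * se
    return me, p_hat - me, p_hat + me

me95, lo95, hi95 = proportion_ci(p_hat, se, 0.95)
me99, lo99, hi99 = proportion_ci(p_hat, se, 0.99)

print(f"95%: ME = {me95:.4f}, CI = ({lo95:.3f}, {hi95:.3f})")
print(f"99%: ME = {me99:.4f}, CI = ({lo99:.3f}, {hi99:.3f})")
```

The 99% interval is wider than the 95% interval: more confidence requires a larger table value and therefore a larger margin of error.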