
Stats Review: Ch. 1-7 - Populations, Samples, Descriptive Stats, Probability, Inference, Study notes of Statistics

A review of chapters 1-7 from a statistics textbook, covering data, populations, samples, parameters, statistics, design, descriptive statistics, probability, and statistical inference. It includes examples, exercises, and formulas for measures of central tendency and spread, along with probability distributions and confidence intervals.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

koofers-user-a78

Review of Chapters 1-

Chapter 1: Introduction

Concepts reviewed: statistics, data, population, sample, parameter, statistic, design, descriptive statistics, and inference.

Statistics is a field of study that deals with methods for producing and analyzing data.

Data are the information we gather with experiments and surveys.

Statistics involves three main aspects:

  1. Design
  2. Description
  3. Inference

All three aspects were already covered in STA 2023.

Design refers to planning how to obtain the data (see chapter 4).

Description refers to exploring and summarizing patterns in the data (see chapter 2).

Inference refers to making decisions or predictions based on the data (see chapters 7 & 8).

Populations and Samples:

The subjects are the entities that we measure in a study (people, individuals, schools, countries, days).

The population is the set of all subjects.

A sample is a part of the population.

Very important for statistical inference is random sampling: each subject in the population has the same chance of being included in the sample.
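As a minimal sketch of random sampling, the Python snippet below draws a simple random sample from a made-up population. The population of 1,000 subject IDs, the sample size of 100, and the seed are all assumptions chosen for illustration; they do not come from any study in these notes.

```python
import random

# Hypothetical population: subject IDs 1..1000 (an assumption for illustration).
population = list(range(1, 1001))

# Fixed seed only so that the sketch is reproducible.
rng = random.Random(2023)

# Simple random sample of n = 100 subjects, drawn without replacement,
# so every subject has the same chance (100/1000) of being included.
sample = rng.sample(population, k=100)

print(len(sample), len(set(sample)))
```

Because `sample` draws without replacement, no subject can appear twice in the sample.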

We distinguish between statistics and parameters.

  • Parameter: a numerical summary of the population

  • Statistic: a numerical summary of a sample taken from the population

Question/Example:

In a University of Wisconsin study about alcohol abuse among students, 100 of the 40,858 members of the student body in Madison were sampled and asked to complete a questionnaire. One question asked was, “On how many days in the past week did you consume at least one alcoholic drink?”

Identify:

  • the population
  • the sample
  • For the 100 students sampled, 29 said “0”. Is 29 a statistic or a parameter?
  • Is this study an example of random sampling?

Chapter 2: Summarizing Data or Descriptive Statistics

Concepts reviewed: categorical or quantitative variables, discrete and continuous variables; pie chart, bar graph, dot plot, stem-and-leaf plot, histogram.

The characteristics of interest that we measure are called variables, because they vary from subject to subject.

Types of variables:

  • Categorical: each observation falls into one of a set of categories. Categorical variables can be nominal or ordinal.

e.g. state of residence, marital status, gender, occupation.

  • Quantitative: observations take numerical values. A quantitative variable can be discrete or continuous: discrete if the possible values are integers; continuous if the possible values form an interval.

e.g. age, height, no. of people voting for Democrats, no. of good bolts in a lot.

Exercise:

Are the following data quantitative or categorical?

  1. Attitude towards legalization of marijuana (favor, neutral, oppose)
  2. Scholastic Aptitude Test score (200-800 range for scores)
  3. Years of school completed (0, 1, 2, 3 ...)
  4. Employment status (employed, unemployed)
  5. Crime rate (50 per 1000 etc.)
  6. Population growth rate (in percentages)

We have 2 types of summaries of the data:

  1. Graphical
  2. Numerical

1. Graphical methods:

Categorical - pie charts and bar graphs

Quantitative - histograms, stem-and-leaf plot, dot plot, and box plot.

Common shapes (for categorical or for quantitative?):

  • mound shape (bell shaped), symmetric or unimodal
  • bimodal
  • skewed to the left, or to the right

2. Numerical summaries:

  • refer to the center and the spread of the distribution (make sure you know the definitions of these terms; if not, use the office hours)

Important!!! Before starting any analysis, always plot your data to identify outliers (extreme values)!

Center:

  • Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

  • Median: midpoint of the observations when they are ordered

  • Mode: value with the highest frequency, or the highest point in a distribution

Examples: normal distribution, skewed right and skewed left distributions; identify the mean, median and mode. Notice the importance of outliers.

Spread:

o Sample Variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

o Sample Standard Deviation: $S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

o Range: max - min

Learn to find the sample mean and sample standard deviation using your calculator.
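As a cross-check on calculator work, the measures of center and spread can be sketched with Python's standard `statistics` module. The six data values are made up for illustration. Note that `variance` and `stdev` use the n - 1 divisor, matching the sample formulas.

```python
import statistics

# Hypothetical sample of n = 6 observations (made up for illustration).
x = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(x)          # sum of the X_i divided by n
median = statistics.median(x)      # midpoint of the ordered observations
mode = statistics.mode(x)          # most frequent value
variance = statistics.variance(x)  # sample variance S^2, divisor n - 1
std_dev = statistics.stdev(x)      # sample standard deviation S = sqrt(S^2)
data_range = max(x) - min(x)       # range: max - min

print(mean, median, mode, variance, round(std_dev, 3), data_range)
```

Here the squared deviations from the mean of 5 add up to 24, so the sample variance is 24/5 = 4.8.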

Chapter 3: Relation between Two Variables

This is the topic of this entire course (STA 3024). During the next 6 weeks we will focus on these concepts: explanatory and response variables, association, contingency table, correlation, regression.

Chapter 4: Gathering Data

Through randomization (ALWAYS): if we do not use randomization, we cannot talk about inference.

We can distinguish 2 types of studies:

  • Experimental (treatments are assigned to the subjects)

  • Observational (no treatment is applied to the subjects; we just observe the data).

Types of errors:

  • Sampling error (ME, the margin of error: depends on the size of the SRS, the simple random sample)

  • Non-sampling errors, due to different sources of bias, e.g. undercoverage, nonresponse, missing data

Chapter 5: Probability

With a randomized experiment or a random sample, the probability of a particular outcome is the proportion of times that the outcome would occur in a long run of observations.

Basic Rules of Probability (long-run proportion)

General Rule: for any event A, 0 ≤ P(A) ≤ 1

Complementary Rule: $P(A^c) = 1 - P(A)$

Addition Rule of Probability

P(A or B) = P(A) + P(B) – P(A and B)

  • The sum of the probabilities for all the possible values of a random variable equals 1.

Multiplication Rule of Probability (ONLY FOR INDEPENDENT EVENTS)

P(A and B) = P(A) × P(B)

Conditional Probability (probability of A given B)

P(A | B) = P(A and B) / P(B)
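The rules above can be checked on a small example. The sketch below assumes one roll of a fair six-sided die, with two events chosen only for illustration. Exact fractions avoid rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (hypothetical example).
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # event A: the roll is even
B = {4, 5, 6}      # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_not_a = 1 - prob(A)                        # complementary rule
p_a_or_b = prob(A) + prob(B) - prob(A & B)   # addition rule
p_a_given_b = prob(A & B) / prob(B)          # conditional probability

# A and B are independent only if P(A and B) = P(A) * P(B).
independent = prob(A & B) == prob(A) * prob(B)

print(p_not_a, p_a_or_b, p_a_given_b, independent)
```

In this example P(A and B) = 1/3 while P(A) × P(B) = 1/4, so the events are not independent and the multiplication rule would give the wrong answer.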

Chapter 6: Probability Distributions

Probability distributions of random variables:

Binomial Distribution

  • It is a distribution of a discrete random variable (rv).

  • Each of n independent trials has two possible outcomes: “success” or “failure”.

  • Each trial has the same probability of success, p. The probability of failure is 1 - p.

  • The total number of successes has a binomial distribution with parameters n and p.

Example: flipping a coin: take head = success and tail = failure. The total number of heads when we toss a fair coin n times has a binomial distribution with parameters n and p = .5.
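To make the coin example concrete, the sketch below evaluates binomial probabilities with the standard formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). These notes do not state that formula, and n = 10 tosses is an assumption chosen for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin tossed n = 10 times; head = success, so p = 0.5.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])      # probability of exactly 5 heads
print(sum(pmf))    # probabilities over all possible values sum to 1
```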

Normal Distribution

  • It is a distribution of a continuous random variable.

  • N(μ, σ) has 2 parameters: μ (population mean) and σ (population standard deviation).

  • The z-score for a single observation x

    o is the number of standard deviations that x falls from the mean
    o z = (x - μ)/σ

YOU NEED TO KNOW HOW TO FIND the z-score and t-score, using tables A and B from Appendix A of the book.

Example:

Scores on the verbal or math SAT are approximately normally distributed with mean 500 and standard deviation 100. The scores range from 200 to 800.

a. If one of your SAT scores was x = 650, how many standard deviations from the mean was it?

R: z = (650 - 500)/100 = 1.5

b. What percentage of SAT scores was higher than yours? (From table A, the z-score of 1.5 has cumulative probability .9332.)

R: 1 - .9332 = .0668, that is, about 6.7% of scores.
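The SAT example can be reproduced without a table. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` provides the standard normal cumulative probability. It is a check on table lookups, not a replacement for knowing how to use table A.

```python
from statistics import NormalDist

mu, sigma = 500, 100     # SAT mean and standard deviation from the example
x = 650                  # the observed score

z = (x - mu) / sigma                 # number of standard deviations from the mean
cumulative = NormalDist().cdf(z)     # standard normal area to the left of z
upper_tail = 1 - cumulative          # proportion of scores higher than x

print(z, round(cumulative, 4), round(upper_tail, 4))
```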

Chapters 7 and 8: Statistical Inference

The following are some important concepts you should be very familiar with:

A Parameter

  • It is a numerical summary of the population.

  • We calculate parameters using population data. However, we usually (almost always) do not have population data. Thus, the values of population parameters are almost always unknown.

  • So we estimate the population parameters using data from a random sample.

A Statistic

  • It is a numerical summary of a sample.

  • We calculate the values of sample statistics using data from random samples.

  • We use sample statistics to make statistical inferences about the unknown population parameters.

Statistical inference is the process of making a statement about one or more population parameters using one or more sample statistics, obtained from a random and representative sample.

Two major methods for making statistical inferences:

  • Confidence intervals (ch. 7)

  • Significance tests (ch. 8)

Question for Friday’s quiz: Which one of the two methods is more informative and why?

Confidence Interval

Def.: an interval of numbers within which the parameter value is believed to fall. A confidence interval is constructed by adding a margin of error to, and subtracting it from, a given point estimate.

Point estimate: a single number that is our “best guess” for the parameter.

Margin of Error: measures how accurate the point estimate is likely to be in estimating the parameter. It is based on the standard error of the sampling distribution of that point estimate.

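Some Point Estimates:

Population parameters (unknown) and the sample statistics (estimators) that estimate them:

Mean
  Population parameter: $\mu_X = \frac{\sum_{i=1}^{N} X_i}{N}$
  Sample statistic: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Standard Deviation
  Population parameter: $\sigma_X = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu_X)^2}{N}}$
  Sample statistic: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Proportion
  Population parameter: $p = \frac{\sum_{i=1}^{N} X_i}{N}$, where $X_i$ = 0 or 1
  Sample statistic: $\hat{p} = \frac{X}{n}$, where X = # of “successes” in the sample

  • The Sample Mean is a point estimate of the population mean.

  • The Sample Standard Deviation is a point estimate of the population standard deviation.

  • The Sample Proportion is a point estimate of the population proportion.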

Properties of Estimators:

Unbiasedness: An estimator is said to be unbiased if its sampling distribution is centered at the parameter.

  • The sample mean is an unbiased estimator of the population mean, $\mu_X$, since $\mu_{\bar{X}} = E(\bar{X}) = \mu_X$ = Population Mean.

  • The sample proportion is an unbiased estimator of the population proportion, p, because $\mu_{\hat{p}} = E(\hat{p}) = p$ = Population Proportion.

  • The sample standard deviation, S, is not unbiased; it has a small bias that decreases as n (sample size) increases.
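Unbiasedness can be illustrated by listing an entire sampling distribution for a tiny case. The sketch below assumes a made-up population of six values and enumerates every ordered sample of size 2 drawn with replacement. Both the population and the sampling scheme are assumptions chosen so the enumeration stays small.

```python
from itertools import product
from statistics import mean, pstdev, stdev

# Hypothetical population of N = 6 values (made up for illustration).
population = [1, 2, 3, 4, 5, 6]
mu = mean(population)          # population mean
sigma = pstdev(population)     # population standard deviation (divisor N)

# Every possible ordered sample of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))

# The sampling distribution of the sample mean is centered at mu ...
avg_of_sample_means = mean(mean(s) for s in samples)

# ... but the sampling distribution of S is centered below sigma.
avg_of_sample_sds = mean(stdev(s) for s in samples)

print(mu, avg_of_sample_means)
print(round(sigma, 4), round(avg_of_sample_sds, 4))
```

With n = 2 the bias in S is large. Per the statement above, it shrinks as the sample size grows.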

Small Standard Error: The standard error of a statistic (an estimator) is the standard deviation of the sampling distribution of the statistic. It describes the variability in the possible values of the statistic. A good estimator has a small standard error. The estimators mentioned above all have small standard errors.

Standard Errors of Estimators:

$SE(\bar{X}) = \frac{\sigma_X}{\sqrt{n}}$

Estimated by: Est. $SE(\bar{X}) = \frac{S}{\sqrt{n}}$

$SE(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$

Estimated by: Est. $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Confidence interval:

o General Form: Estimator ± ME

o ME = (Table value) × SE(Estimator)

o CI for $\mu_X$ = $\bar{X} \pm t^{*} \left( \frac{S}{\sqrt{n}} \right)$

o CI for p = $\hat{p} \pm z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

o Make sure you remember how to use tables of the standard normal distribution and the t-distribution.
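The general form "Estimator ± (Table value) × SE(Estimator)" can be sketched for a mean as follows. The summary statistics are hypothetical, and the function name `mean_ci` is made up for this sketch. The table value t* is passed in by hand because it still has to be read from the t table.

```python
from math import sqrt

def mean_ci(x_bar, s, n, t_star):
    """Confidence interval for a population mean: x_bar +/- t* * S / sqrt(n)."""
    se = s / sqrt(n)        # estimated standard error of the sample mean
    me = t_star * se        # margin of error = (table value) x SE
    return x_bar - me, x_bar + me

# Hypothetical summary statistics: n = 25, sample mean 50, S = 10.
# t* = 2.064 is the t-table value for 95% confidence with df = n - 1 = 24.
low, high = mean_ci(x_bar=50, s=10, n=25, t_star=2.064)
print(round(low, 3), round(high, 3))
```

Here the estimated standard error is 10/5 = 2, so the margin of error is 2.064 × 2 = 4.128.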

Example:

When the 1998 GSS asked 1158 subjects, “Do you believe in Heaven?”, the proportion who answered yes was .86. The standard error of this estimate was .01.

a. Find and interpret the margin of error for a 95% confidence interval for the population proportion of Americans who believe in Heaven.

ME = 1.96 × .01 = .0196 ≈ .02

This means that, with 95% confidence, the population percentage of Americans who believe in Heaven is no more than 2% lower or 2% higher than the reported sample percentage of 86%.

b. Construct the 95% confidence interval. Interpret it!

[.86 - (1.96 × .01), .86 + (1.96 × .01)] = [.8404, .8796]

With 95% confidence, we estimate that the population proportion of Americans who believe in Heaven is at least .84 but no more than .88, that is, between 84% and 88%.

c. Construct a 99% confidence interval. Discuss it!
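The calculations in parts a and b can be checked with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the function name `proportion_ci` is made up for this example. It computes the standard error from p-hat and n instead of using the rounded value .01, so the endpoints differ slightly in the fourth decimal place from those in part b.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, level=0.95):
    """Large-sample confidence interval for a population proportion."""
    # z* leaves (1 - level)/2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    me = z_star * se                      # margin of error
    return p_hat - me, p_hat + me, me

# GSS example from the notes: p-hat = .86, n = 1158.
low, high, me = proportion_ci(0.86, 1158)
print(round(low, 2), round(high, 2), round(me, 2))
```

Passing a different `level`, for example 0.99, changes only z*, which is a convenient way to check a hand calculation for part c.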