Statistics Lecture #2: Normal Distribution & Median in Production Process Examples, Study notes of Statistics

A lecture note from Statistics 528, Data Analysis I, focusing on the use of the median in a production process, an overview of the normal distribution, and the concept of density curves. The lecture's example involves two groups of five workers, each group trained using a different method and monitored for five days to determine which method results in more output. The document also covers the strategy for exploring data, the importance of mathematical models, and density curves, including their facts, types, and numerical summaries.

Typology: Study notes

Pre 2010

Uploaded on 07/23/2009

koofers-user-1jo

Statistics 528
Data Analysis I
Lecture #2
June 22, 2006

Overview of Today’s Lecture
- Example of use of median in a production process
- IPS Sections 1.3 and 2.1
  - The Normal Distribution
  - Scatterplots

Production Process: Median Example
- Two groups of workers, each with five people.
- Group A is trained using one method, and Group B is trained using another method.
- Over the next 5 days, we monitor how many completed products each worker makes on each day.
- We want to know if the method used to train Group A results in more output than the method used to train Group B.
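
One way to make this comparison concrete is to compare the median daily output of the two groups. The Python sketch below does that under stated assumptions: the counts are invented placeholders (the lecture's actual data are not reproduced here), and the names `group_a`, `group_b`, and `group_median` are hypothetical.

```python
from statistics import median

# Hypothetical daily counts: one row per worker, one column per day.
# These are NOT the lecture's data; they only illustrate the calculation.
group_a = [
    [12, 14, 13, 15, 14],
    [11, 13, 12, 14, 13],
    [15, 16, 14, 17, 16],
    [10, 12, 11, 13, 12],
    [13, 15, 14, 16, 40],  # one unusually high day
]
group_b = [
    [11, 12, 12, 13, 12],
    [10, 11, 12, 12, 11],
    [13, 14, 13, 15, 14],
    [9, 11, 10, 12, 11],
    [12, 13, 12, 14, 13],
]


def group_median(rows):
    """Median of all worker-day counts in a group."""
    counts = [count for worker in rows for count in worker]
    return median(counts)


median_a = group_median(group_a)
median_b = group_median(group_b)
print("Group A median:", median_a)
print("Group B median:", median_b)
```

Because the median depends only on the middle of the sorted counts, the single unusually high day in the hypothetical Group A data does not move it, which is one reason to consider the median rather than the mean for a comparison like this.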

[Figure: Production Process, Median Example; panels labeled Group A and Group B]

Review: Strategy for Exploring Data
- Plot the data
  - Categorical data: bar chart or pie chart
  - Quantitative data: stem-and-leaf, histogram, time plot, etc.
- Look for the overall pattern and deviations from that pattern
- Calculate numerical summaries to describe the center and spread

Next step: Apply a Mathematical Model
- Why would you do this?
  - Most obvious: it is easier to summarize the information this way than reporting all the values
  - More useful: if the data are representative of a larger group, the mathematical model is useful for describing the larger group.

Density Curves
- Density curves are the mathematical models used to represent the distribution of data.
- The link between the curve and the histogram is the proportion of data that falls between two values.
[Figure: shaded regions, each labeled "Area = ." (values not recoverable)]
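
The link between area and proportion can be checked numerically. The sketch below is only an illustration: it assumes a standard normal curve as the density (chosen for convenience, not taken from the slide), simulates observations from it, and compares the area under the curve between two values with the proportion of simulated data in the same interval. The helper names `area_between` and `proportion_between` are hypothetical.

```python
from statistics import NormalDist


def area_between(dist, a, b):
    """Area under the density curve of `dist` between a and b."""
    return dist.cdf(b) - dist.cdf(a)


def proportion_between(data, a, b):
    """Proportion of observations that fall between a and b (inclusive)."""
    return sum(1 for x in data if a <= x <= b) / len(data)


model = NormalDist(mu=0.0, sigma=1.0)   # assumed density curve
data = model.samples(10_000, seed=1)    # simulated observations

area = area_between(model, -1.0, 1.0)
proportion = proportion_between(data, -1.0, 1.0)
print("area under curve:", area)
print("proportion of data:", proportion)
```

With a large sample the two numbers should be close, which is the sense in which the density curve models the histogram.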

Standard Deviation of a Density Curve
- The concept: approximately the average distance from the mean.
- Difficult to approximate by eye, but can be calculated mathematically.

Notation: Observation summaries vs. Density properties
- For observations of a variable:
  - Mean = $\bar{x}$
  - Standard Deviation = $s$
- For a density curve:
  - Mean = $\mu$
  - Standard Deviation = $\sigma$

Normal Distribution (Density)
- The normal density is a symmetric, bell-shaped curve that is useful for describing many types of data.

$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: two plots of $f(x \mid \mu, \sigma)$]
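
As a quick check on the normal density formula, the following Python sketch evaluates it directly and compares the result with the standard library's `statistics.NormalDist.pdf`. The function name `normal_density` is a hypothetical helper, not something from the lecture.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Normal density f(x | mu, sigma) evaluated term by term."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Compare with the library implementation at a few points.
for x in (-2.0, 0.0, 1.5):
    by_hand = normal_density(x, mu=0.0, sigma=1.0)
    by_library = NormalDist(mu=0.0, sigma=1.0).pdf(x)
    print(x, by_hand, by_library)
```

The curve is symmetric about the mean, so `normal_density(mu - d, mu, sigma)` and `normal_density(mu + d, mu, sigma)` agree for any distance `d`.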

Why is it important?
1. Good description of real data
2. Good approximation to the results of chance outcomes
3. Statistical inference procedures rely heavily on the normal distribution.

68-95-99.7 rule

Example: Heights of women age 18 to 24
- The distribution is approximately normal. Measurements are in inches.
- How tall would a woman (18-24) need to be to be in the top 5% of heights?

$X \sim N(\mu, \sigma^2) = N(64.5, 6.25)$
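
The heights example can be worked numerically. The sketch below assumes the slide's distribution is read as mean 64.5 inches and variance 6.25, that is, a standard deviation of 2.5 inches. It first checks the 68-95-99.7 rule and then finds the 95th percentile with `statistics.NormalDist.inv_cdf`. The helper `within_k_sd` is a hypothetical name.

```python
from statistics import NormalDist

# Assumed reading of the slide: mean 64.5 in, standard deviation 2.5 in.
heights = NormalDist(mu=64.5, sigma=2.5)


def within_k_sd(dist, k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


for k in (1, 2, 3):
    print(f"within {k} SD:", round(within_k_sd(heights, k), 4))

# The rule alone says about 2.5% of women are taller than mean + 2 SD.
rule_of_thumb_cutoff = heights.mean + 2 * heights.stdev

# The top 5% starts at the 95th percentile, which is below that mark.
top_5_percent_cutoff = heights.inv_cdf(0.95)
print("mean + 2 SD:", rule_of_thumb_cutoff)
print("95th percentile:", top_5_percent_cutoff)
```

The rule only locates the top 2.5% exactly; the top 5% cutoff falls between the 1 SD and 2 SD marks, which is what motivates working with the standard normal distribution.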

Standard Normal
- If a variable follows a normal distribution, then $z = (x - \mu)/\sigma$ follows a standard normal distribution: $z \sim N(0, 1)$
- This fact is very useful for finding areas under a normal curve other than the ones exactly at the 1, 2, and 3 SD marks.
- When an observation is transformed by subtracting the mean and dividing by the standard deviation, the resulting value is called the z-score.

Example: IQ scores
- IQ scores are normally distributed with a mean of 100 and a standard deviation of 10: $X \sim N(100, 100)$, where the second parameter is the variance.
- What fraction of people have an IQ score under 85?
  1. Draw a picture.
  2. Shade the region of interest.
  3. Look up the areas you need in Table A.
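
The IQ question can also be answered in code rather than with Table A. This is a minimal sketch using the mean of 100 and standard deviation of 10 given in the example; `z_score` is a hypothetical helper name, and the standard normal cumulative area comes from `statistics.NormalDist().cdf`.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


z = z_score(85, mu=100, sigma=10)
fraction_below = NormalDist().cdf(z)  # area to the left of z under N(0, 1)

print("z-score:", z)
print("fraction with IQ under 85:", fraction_below)
```

Standardizing first and then using the standard normal curve gives the same area as asking `NormalDist(100, 10).cdf(85)` directly; the z-score route mirrors the table lookup described in the steps.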

Normal Quantile Plot: Examples
[Figure: Histogram of C7 and Probability Plot of C7 (Normal, 95% CI); Mean -0.03750, StDev 1.339, N 1000, AD 6.605, P-Value <0.005]
[Figure: Histogram of C8 and Probability Plot of C8 (Normal, 95% CI); Mean 2.850, StDev 2.311, N 1000, AD 32.046, P-Value <0.005]
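
To show what a normal quantile plot is built from, the sketch below pairs each sorted observation with the standard normal quantile at the same rank. It is an assumption-laden illustration: the plotting position (i + 0.5)/n is one common convention and may differ from the one used by the software that produced the plots above, the data are invented, and `normal_quantile_points` is a hypothetical name.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (theoretical z quantile, observed value) pairs for a quantile plot."""
    n = len(data)
    standard_normal = NormalDist()
    ordered = sorted(data)
    # Plotting position (i + 0.5) / n keeps every probability strictly inside (0, 1).
    return [
        (standard_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]


# Hypothetical observations, for illustration only.
sample = [4.1, 5.0, 3.7, 6.2, 5.5, 4.8, 5.1]
for z, value in normal_quantile_points(sample):
    print(round(z, 3), value)
```

If the data are close to normal, plotting these pairs gives points that lie near a straight line; systematic curvature suggests skewness or heavy tails.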

Relationships Between Variables

                     1 Variable (Chapter 1)      2 Variables (Chapter 2)
Graphical Summaries  Histograms, dotplot, etc.   Scatterplots
Numerical Summaries  Center, spread              Correlation
Models               Density Curve               Regression

Exploring the Relationship
- Generically, we call two variables X and Y
- Are the variables associated?
  - When the value of one increases, does the other increase?
  - When the value of one increases, does the other decrease?

Scatterplot
- ODJFS child care data:
  - X: Full-time weekly rate for infants
  - Y: Full-time weekly rate for toddlers
[Figure: Scatterplot of Toddler_FTW vs Infant_FTW]

Association or Explanation
- In some cases, we are only interested in understanding whether the variables are associated (ODJFS is a good example)
- In some cases, one variable is thought to explain another
  - Example: Pressure treatment on plastic
  - Response Variable (dependent variable): Migration of chemical after 24 hours
  - Explanatory Variable (independent variable): Pressure level for treatment
- Note: Do not equate explanation with causation!

Examples:
- Time spent studying vs. grade on exam
- Height of husband vs. height of wife
- Percent of districts voting majority Republican in 2000 vs. percent of districts voting majority Republican in 2004.

What to look for in a scatterplot
- Overall pattern and deviations from the pattern
- Form of relationship (linear, curved, etc.)
- Direction and strength of relationship
  - Positively associated: increase in X is seen with increase in Y
  - Negatively associated: increase in X is seen with decrease in Y
  - Do the points closely follow this pattern, or loosely?
- Outliers

Pattern: Linear
[Figure: scatterplot illustrating a linear pattern]

Outliers
[Figure: scatterplot illustrating outliers]

Adding categorical variables
- Use different colors or symbols to add a categorical variable to a scatterplot. Don’t forget to label.
[Figure: Scatterplot of Toddler_FTW vs Infant_FTW, with symbols by PROGRAM TYPE (COMBINATION, FULL-TIME)]
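
The direction and strength of a linear association, which the scatterplot slides describe visually, can be summarized with the correlation coefficient listed as the two-variable numerical summary. The sketch below computes it from its definition. The weekly rates are invented placeholders, not the ODJFS data, and `correlation` is a hypothetical helper name.

```python
import math


def correlation(xs, ys):
    """Pearson correlation of paired observations xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weekly rates in dollars, for illustration only.
infant_rate = [150, 165, 180, 200, 215, 240]
toddler_rate = [135, 150, 160, 185, 190, 220]

r = correlation(infant_rate, toddler_rate)
print("correlation:", r)
print("direction:", "positive" if r > 0 else "negative")
```

A value near +1 or -1 means the points follow a line closely; a value near 0 means a loose linear pattern. Correlation says nothing about curved relationships or about outliers, so it complements the plot rather than replacing it.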

A note of caution: lurking variables
- Factors other than the main ones of interest may have an effect.

Categorical Explanatory Variables
- Side-by-side boxplots
  - For just a few measurements, we could plot the actual values (previous example)
- Back-to-back stem plots
- For nominal variables, it makes no sense to talk about positive or negative associations.
- For ordinal variables, we can make a statement about positive or negative associations.

Example: Boxplots for an ordinal variable vs. a continuous variable.
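
Side-by-side boxplots are drawn from a five-number summary computed separately for each category. The sketch below shows that calculation under stated assumptions: the ordinal categories and measurements are invented, `five_number_summary` is a hypothetical helper, and the quartiles use the "inclusive" method of `statistics.quantiles`, which may differ slightly from the quartile rule used in the course text.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


# Hypothetical measurements grouped by an ordinal variable, for illustration only.
response_by_level = {
    "low": [2.1, 2.4, 2.8, 3.0, 3.3],
    "medium": [3.0, 3.4, 3.9, 4.1, 4.6],
    "high": [4.2, 4.8, 5.1, 5.5, 6.0],
}

for level, values in response_by_level.items():
    print(level, five_number_summary(values))
```

Because the categories are ordered, comparing the medians from low to high is a reasonable way to describe a positive or negative association; that comparison would not be meaningful for a nominal variable.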