






A lecture note from a Statistics and Data Analysis I course, focusing on the use of the median in a production process, an overview of the normal distribution, and the concept of density curves. The lecture includes an example of two groups of workers, each with five people, trained using different methods and monitored for five days to determine which method results in more output. The document also covers the strategy for exploring data, the importance of mathematical models, and density curves, including their facts, types, and numerical summaries.
Typology: Study notes







[Figure: Group A and Group B]
Next step: Apply a Mathematical Model
- Why would you do this?
  - Most obvious: it is easier to summarize the information this way than by reporting all the values.
  - More useful: if the data are representative of a larger group, the mathematical model is useful for describing the larger group.

Density Curves
- Density curves are the mathematical models used to represent the distribution of data.
- The link between the curve and the histogram is the proportion of data that falls between two values.

[Figure: area under the histogram and area under the density curve between two values]
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

[Figure: two plots of the density f(x | μ, σ)]
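The idea that an area under the density curve equals a proportion of the data can be checked numerically. The sketch below evaluates the normal density f(x | μ, σ) and approximates areas with a simple midpoint rule; the helper names, the step count, and the choice μ = 0, σ = 1 are assumptions made for illustration, not part of the lecture.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma, evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, mu, sigma, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# Illustrative parameters (assumed): mu = 0, sigma = 1.
total = area_under_curve(-10, 10, 0, 1)        # essentially the whole curve
within_one_sd = area_under_curve(-1, 1, 0, 1)  # proportion within 1 SD of the mean
print(round(total, 4), round(within_one_sd, 4))
```

The total area comes out very close to 1, as it must for any density curve, and the area within one standard deviation of the mean is roughly 0.68.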
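68-95-99.7 rule
- In a normal distribution, about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.

Example: Heights of women age 18 to 24
- The distribution is approximately normal. Measurements are in inches.
- How tall would a woman (18-24) need to be to be in the top 5% of heights?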
$$X \sim N(\mu, \sigma^{2}) = N(64.5,\ 6.25)$$
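One way to answer the top 5% question is sketched below, reading the example's distribution as mean 64.5 inches and variance 6.25, so a standard deviation of 2.5 inches. It uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

# Heights in inches: mean 64.5, SD 2.5 (variance 6.25), as read from the example.
heights = NormalDist(mu=64.5, sigma=2.5)

# Top 5% means 95% of women are shorter, so we need the 0.95 quantile.
cutoff = heights.inv_cdf(0.95)

# The 68-95-99.7 rule brackets the answer: about 16% lie above mu + 1 SD
# and about 2.5% lie above mu + 2 SD, so the cutoff falls between the two.
one_sd_above = 64.5 + 1 * 2.5
two_sd_above = 64.5 + 2 * 2.5

print(round(cutoff, 2), one_sd_above, two_sd_above)
```

The cutoff sits between 67 and 69.5 inches, which is what the rule alone can tell us; the exact quantile needs the standard normal distribution.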
Standard Normal
- If a variable follows a normal distribution, then z = (x - μ)/σ follows a standard normal distribution: z ~ N(0, 1)
- This fact is very useful for finding areas under a normal curve other than the ones exactly at the 1, 2, and 3 SD marks.
- When an observation is transformed by subtracting the mean and dividing by the standard deviation, the resulting value is called the z-score.

Example: IQ scores
- IQ scores are normally distributed with a mean of 100 and a standard deviation of 10.
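As a small illustration of standardizing, the sketch below converts an IQ score to a z-score using the mean and standard deviation stated in the example, then looks up the area below it on the standard normal curve. The score of 115 is a hypothetical value chosen for the illustration, and the function name is an assumption.

```python
from statistics import NormalDist

MU, SIGMA = 100, 10  # mean and SD of IQ scores, as stated in the example

def z_score(x, mu=MU, sigma=SIGMA):
    """Subtract the mean and divide by the standard deviation."""
    return (x - mu) / sigma

score = 115                       # hypothetical observation
z = z_score(score)                # how many SDs the score lies above the mean
prop_below = NormalDist().cdf(z)  # area to the left of z under N(0, 1)

# Same area computed directly on the original scale, as a consistency check.
prop_direct = NormalDist(MU, SIGMA).cdf(score)

print(z, round(prop_below, 4), round(prop_direct, 4))
```

Both routes give the same proportion, which is the point of standardizing: one standard normal curve serves for every normal distribution.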
Normal Quantile Plot: Examples
[Figure: Histogram of C7 and Probability Plot of C7 (Normal - 95% CI). Mean -0.03750, StDev 1.339, N 1000, AD 6.605, P-Value <0.005.]
[Figure: Histogram of C8 and Probability Plot of C8 (Normal - 95% CI). Mean 2.850, StDev 2.311, N 1000, AD 32.046, P-Value <0.005.]
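The coordinates behind a normal quantile plot can be computed by hand, as in the rough sketch below. It pairs each sorted observation with the z-value expected at its rank and uses the correlation of those pairs as a crude measure of how straight the plot is. The plotting position (i - 0.5)/n, the simulated samples, and the helper names are all assumptions; this is not the procedure behind the plots in the lecture, and it does not compute the Anderson-Darling statistic those plots report.

```python
import random
from math import sqrt
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted observation with the z-value expected at its rank."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # N(0, 1)
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly a straight line."""
    n = len(points)
    mz = sum(z for z, _ in points) / n
    mx = sum(x for _, x in points) / n
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z, _ in points)
    sxx = sum((x - mx) ** 2 for _, x in points)
    return szx / sqrt(szz * sxx)

rng = random.Random(0)  # fixed seed so the simulated samples are reproducible
normal_sample = [rng.gauss(0, 1) for _ in range(1000)]       # roughly normal data
skewed_sample = [rng.expovariate(1.0) for _ in range(1000)]  # right-skewed data

r_normal = straightness(normal_quantile_points(normal_sample))
r_skewed = straightness(normal_quantile_points(skewed_sample))
print(round(r_normal, 3), round(r_skewed, 3))
```

Data that are close to normal give points that hug a straight line, while skewed data bend away from it, so the first value should be noticeably closer to 1 than the second.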
Relationships Between Variables

                        1 Variable (Chapter 1)       2 Variables (Chapter 2)
Graphical Summaries     Histograms, dotplot, etc.    Scatterplots
Numerical Summaries     Center, spread               Correlation
Models                  Density Curve                Regression
Exploring the Relationship
- Generically, we call two variables X and Y
- Are the variables [...] one increases does the other decrease?

Scatterplot
- ODJFS child care data:
  - X: Full-time weekly rate for infants
  - Y: Full-time weekly rate for toddlers
[Figure: Scatterplot of Toddler_FTW vs Infant_FTW]
Association or Explanation
- In some cases, we are only interested in understanding whether the variables are associated (ODJFS is a good example)
- In some cases, one variable is thought to explain another
  - Example: Pressure treatment on plastic
  - Response Variable (dependent variable): Migration of chemical after 24 hours
  - Explanatory Variable (independent variable): Pressure level for treatment
- Note: Do not equate explanation with causation!

Examples:
- Time spent studying vs. grade on exam
- Height of husband vs. height of wife
- Percent of districts voting majority Republican in 2000 vs. percent of districts voting majority Republican in 2004.
What to look for in a scatterplot
- Overall pattern, and deviations from the pattern
- Form of relationship (linear, curved, etc.)
- Direction and strength of relationship
  - Positively associated: increase in X is seen with increase in Y
  - Negatively associated: increase in X is seen with decrease in Y
  - Do the points closely follow this pattern, or loosely?
- Outliers

[Figure: Pattern - Linear]
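Direction and strength of a linear pattern can also be put into a number with the correlation, listed earlier as the numerical summary for two variables. The sketch below computes Pearson's r from its definition; the weekly rates are made-up values for illustration only, not the ODJFS data, and the function name is an assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives strength."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical weekly rates (not the ODJFS data).
infant = [120, 140, 160, 180, 200, 220]
toddler = [105, 125, 150, 160, 185, 200]

r = pearson_r(infant, toddler)               # positive: Y rises as X rises
r_reversed = pearson_r(infant, toddler[::-1])  # negative: Y falls as X rises
print(round(r, 3), round(r_reversed, 3))
```

A value of r near +1 or -1 means the points follow a line closely; values near 0 mean they follow it loosely. Correlation only describes linear patterns, so it complements the scatterplot rather than replacing it.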
[Figure: Outliers]

Adding categorical variables
- Use different colors or symbols to add a categorical variable to a scatterplot; don’t forget to label.
[Figure: Scatterplot of Toddler_FTW vs Infant_FTW, with points marked by PROGRAM TYPE (COMBINATION, FULL-TIME)]
A note of caution: lurking variables
- Factors other than the main ones of interest may have an effect.

Categorical Explanatory Variables
- Side-by-side boxplots
  - For just a few measurements, we could plot the actual values (previous example)
- Back-to-back stem plots
- For nominal variables, it makes no sense to talk about positive or negative associations.
- For ordinal variables, we can make a statement about positive or negative associations.
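Side-by-side boxplots compare the five-number summary of the response across the categories of the explanatory variable. The sketch below computes those summaries per group; the group labels echo the program types in the earlier scatterplot, but the rates are invented for illustration, and the "inclusive" quartile method is only one of several conventions, so software may report slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a boxplot draws."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Hypothetical toddler weekly rates by program type (not the ODJFS data).
rates_by_type = {
    "COMBINATION": [150, 160, 170, 180, 190],
    "FULL-TIME": [120, 130, 140, 150, 160],
}

summaries = {group: five_number_summary(rates)
             for group, rates in rates_by_type.items()}

for group, summary in summaries.items():
    print(group, summary)
```

Because program type is nominal, the comparison is about how the distributions differ between groups, not about a positive or negative association.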