





























Chapter 1: Stats Starts Here
What is statistics? A way of reasoning, along with a collection of tools and methods, designed to help us understand the world.
Statistics is about variation
Things vary
− People are different.
− We can't see everything or measure it all.
− What we can measure might be inaccurate.
How do we make sense of an imperfect picture of an imperfect world?
When you plan a statistical study or explore data from someone else's work, ask yourself Why, Who, What and How (and, if possible, When and Where).
Who
Individual Cases (Individuals) are the objects described by a set of data. Cases can be people, animals, things.
What
– Why? What purpose do the data have? Do we hope to answer some specific questions? Do we want to draw conclusions about individuals other than the ones we actually have data for?
•Categorical and Quantitative variables
– A categorical variable (or qualitative variable) places an individual into one of several groups or categories. These categories are sometimes called levels.
The set of categories for a categorical variable is called a nominal scale.
Example: for the categorical variable "mode of transportation to work", we might use the nominal scale {bus, subway, car, bicycle, walk}.
Because their scale consists of a set of categories, variables with an ordinal scale are often treated as qualitative and analyzed using methods for qualitative variables.
However, in some respects, ordinal scales closely resemble interval scales.
Thus, in some situations, it is beneficial to treat ordinal data as quantitative in the analysis.
Example
Airlines monitored for safety and customer service. For each flight, carriers must report:
Chapter 2: Displaying and describing categorical data
Distribution of STAB22 grades
Note: These are hypothetical values.
Grade Summer 2010 Summer 2011
A 40 45
B 75 70
C 50 58
D 20 19
F 15 8
Total 200 200
− StatCrunch: graphics, bar plot, with summary
B is the most common grade in both years (75 of 200 in Summer 2010, 70 of 200 in Summer 2011).
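The distribution above can also be expressed as percentages of each year's total, which is what a bar plot "with summary" would display. The following is a minimal Python sketch using the hypothetical counts from the table; the function name `relative_frequencies` is chosen here for illustration and is not part of StatCrunch.

```python
# Hypothetical STAB22 grade counts, copied from the table above.
grades = ["A", "B", "C", "D", "F"]
counts = {
    "Summer 2010": [40, 75, 50, 20, 15],
    "Summer 2011": [45, 70, 58, 19, 8],
}


def relative_frequencies(values):
    """Return each count as a percentage of the total of all counts."""
    total = sum(values)
    return [100 * v / total for v in values]


# Relative frequency distribution of grades for each term.
percents = {term: relative_frequencies(vals) for term, vals in counts.items()}

for term, pcts in percents.items():
    print(term)
    for grade, pct in zip(grades, pcts):
        print(f"  {grade}: {pct:.1f}%")
```

Because both terms have 200 students, comparing counts and comparing percentages lead to the same conclusion here; percentages matter more when the totals differ.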
Bar charts
Hard to see much difference.
Pie charts
Again, not much difference apparent. Better: make bar chart for each year, but put bars side by side:
Contingency tables: two (or more) categorical variables
         Accepted  Rejected  Total
Males         490       210    700
Females       280       220    500
Total         770       430   1200
− 280 of the applicants were females who were accepted.
− How many of the applicants were males who were rejected? 210.
− How many females applied altogether? 500.
Joint distribution (percentage of total)
         Accepted  Rejected  Total
Males         490       210    700
Females       280       220    500
Total         770       430   1200
− More males than females applied (and more people were accepted than not), so the raw counts are difficult to compare.
− Compute percentages instead: divide everything by the grand total, 1200.
− The result is the joint distribution.
− StatCrunch: Stat, Tables, Contingency, With Summary.
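As a rough illustration of "divide everything by 1200", the sketch below computes the joint distribution in Python from the four cell counts. The dictionary layout and variable names are assumptions made for this example.

```python
# Cell counts from the admissions contingency table.
table = {
    "Males": {"Accepted": 490, "Rejected": 210},
    "Females": {"Accepted": 280, "Rejected": 220},
}

# Grand total: every applicant, regardless of sex or outcome.
grand_total = sum(sum(row.values()) for row in table.values())

# Joint distribution: each cell as a percentage of the grand total.
joint = {
    sex: {outcome: 100 * n / grand_total for outcome, n in row.items()}
    for sex, row in table.items()
}

for sex, row in joint.items():
    for outcome, pct in row.items():
        print(f"{sex}, {outcome}: {pct:.1f}% of all applicants")
```

The four percentages add up to 100%, since every applicant falls into exactly one cell.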
Conditional distribution
− The joint distribution is "out of everything".
− It doesn't answer the question "is a larger share of males than of females accepted?"
− For that: out of males, what % were accepted? These are row percents.
− The row percents for males and for females each add up to 100%.
− 70% of male applicants were accepted, but only 56% of female applicants.
− Discrimination?
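The row percents can be checked with a short calculation: each cell is divided by its own row total rather than by the grand total. This is a minimal Python sketch; the names `table` and `row_percents` are illustrative.

```python
# Cell counts from the admissions contingency table.
table = {
    "Males": {"Accepted": 490, "Rejected": 210},
    "Females": {"Accepted": 280, "Rejected": 220},
}

# Conditional distribution of outcome given sex (row percents):
# divide each cell by the total for its own row.
row_percents = {}
for sex, row in table.items():
    row_total = sum(row.values())
    row_percents[sex] = {outcome: 100 * n / row_total for outcome, n in row.items()}

for sex, row in row_percents.items():
    print(f"{sex}: {row['Accepted']:.0f}% accepted, {row['Rejected']:.0f}% rejected")
```

Conditioning on sex removes the effect of more males having applied, so the two acceptance rates can be compared directly.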
Column percents
− 63.6% of people accepted were males (490 of 770).
− 51% of people rejected were females (220 of 430).
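Column percents condition the other way: each cell is divided by its column total, so the question becomes "out of those accepted (or rejected), what % were male or female?" A minimal Python sketch, with illustrative names:

```python
# Cell counts from the admissions contingency table.
table = {
    "Males": {"Accepted": 490, "Rejected": 210},
    "Females": {"Accepted": 280, "Rejected": 220},
}
outcomes = ["Accepted", "Rejected"]

# Column totals: everyone accepted, and everyone rejected.
column_totals = {o: sum(row[o] for row in table.values()) for o in outcomes}

# Conditional distribution of sex given outcome (column percents):
# divide each cell by the total for its own column.
column_percents = {
    o: {sex: 100 * row[o] / column_totals[o] for sex, row in table.items()}
    for o in outcomes
}

for o in outcomes:
    for sex, pct in column_percents[o].items():
        print(f"Of those {o.lower()}, {pct:.1f}% were {sex.lower()}")
```

Row percents and column percents answer different questions, so which one to compute depends on which variable is being conditioned on.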