

























































































































































































Module 1 – Looking at Data: Distributions
François Nielsen
University of North Carolina Chapel Hill
Fall 2009
[1] Adapted in part from slides from courses by Robert Andersen (University of Toronto) and by John Fox (McMaster University)
What is Statistics? The Challenger Disaster
- Statistics may be defined as the science of learning from data (IPS6e).
- The Challenger accident was a tragic example of the consequences of poor statistical analysis. [2]
- On 28 January 1986 the U.S. space shuttle Challenger exploded shortly after blastoff, killing the seven astronauts.
- The cause of the explosion was the failure of rubber O-rings sealing two sections of one of the booster rockets attached to the shuttle.
- This failure, in turn, was caused by the low temperature at the time of launch, which made the O-rings lose their elasticity.

[2] Edward Tufte. Visual Explanations.
What is Statistics? The Challenger Disaster
- On the day before launch, engineers at Morton Thiokol, the company that built the boosters, recommended that the launch be postponed because of the low temperature forecast for the following day. Officials at NASA and Thiokol examined data on O-ring damage that had occurred on previous launches.
- Engineers plotted a measure of damage to O-rings against temperature at launch time, including only launches with non-negligible damage.
- The plot showed no association of damage with temperature.
- Had they included all the cases, they would have seen a clear association (lower temperature → greater damage) and would have postponed the launch, avoiding the accident.
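The selection effect described here can be illustrated with a short R sketch. The numbers are invented for illustration only; they are not the actual O-ring records, and the variable names are hypothetical. The sketch shows only that dropping the zero-damage launches can hide an association that is visible when all cases are used.

```r
# Hypothetical launches (NOT the real O-ring data): temperature at launch
# and a damage score, where 0 means no damage was observed.
launches <- data.frame(
  temp   = c(50, 60, 70, 80, 65, 70, 75, 75, 80, 80),
  damage = c( 4,  6,  6,  4,  0,  0,  0,  0,  0,  0)
)

# Correlation using only the launches with damage (the selected cases)
damaged   <- subset(launches, damage > 0)
r_damaged <- cor(damaged$temp, damaged$damage)

# Correlation using all launches, including those with no damage
r_all <- cor(launches$temp, launches$damage)

cat("damaged launches only:", round(r_damaged, 2), "\n")
cat("all launches:         ", round(r_all, 2), "\n")
```

In this made-up example the correlation among the damaged launches is zero, while the correlation over all launches is clearly negative.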
Data Sets
- A data set is a collection of facts assembled for a particular purpose.
- We will mainly use rectangular data sets where information is organized in an individual (row) by variable (column) format
  - an individual is a unit of observation – e.g. a person, an organization, a country
  - a variable is a characteristic of the individual – e.g. a depression score, score on a scale of centralization of decision-making, Gross Domestic Product per capita
  - a case is the information on all variables for one individual (corresponding to one row of the data set)
  - an observation is the value of a single variable for a given individual
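As a minimal sketch of these terms in R, the data frame below holds four rows copied from the Leinhardt excerpt that appears in the infant mortality example of these slides. The object name `nations` is an assumption made for illustration.

```r
# Four rows copied from the Leinhardt excerpt in these slides
nations <- data.frame(
  income = c(3426, 3350, 3346, 4751),
  infant = c(26.7, 23.7, 17.0, 16.8),
  region = c("Asia", "Europe", "Europe", "Americas"),
  oil    = c("no", "no", "no", "no"),
  row.names = c("Australia", "Austria", "Belgium", "Canada")
)

print(dim(nations))                 # individuals (rows) by variables (columns)
print(nations$infant)               # one variable: its value for every individual
print(nations["Canada", ])          # one case: all variables for one individual
print(nations["Canada", "infant"])  # one observation: a single value
```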
Levels of Measurement
- The level of measurement determines the kinds of analysis that can be carried out with a variable
- In practice one can simplify the four-fold typology into two categories:
  - Qualitative variables:
    - Include categorical variables + ordinal variables treated as categorical – e.g. age in years recoded into YOUNG, ADULT, SENIOR categories
    - Analyzed using contingency tables (tabular analysis)
  - Quantitative variables:
    - Include interval variables + ratio variables + ordinal variables treated as interval variables – e.g. “How well do you speak Spanish?” coded from 1 to 5
    - Analyzed using scatterplots & regression analysis
- There are advanced analytical techniques for ordinal data that are beyond the scope of this class
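The age recoding mentioned above might look like the following in R. This is only a sketch: the ages are hypothetical, and the cut points (30 and 65) are assumptions, since the slides do not say where the categories begin and end.

```r
# Hypothetical ages in years (a quantitative variable)
age <- c(19, 34, 52, 71, 27, 66)

# Recode into categories; the cut points 30 and 65 are assumed, and
# right = FALSE makes each interval closed on the left: [0,30), [30,65), [65,Inf)
age_group <- cut(age, breaks = c(0, 30, 65, Inf), right = FALSE,
                 labels = c("YOUNG", "ADULT", "SENIOR"))

print(table(age_group))  # qualitative treatment: counts per category
print(mean(age))         # quantitative treatment: arithmetic is meaningful
```

Once recoded, the variable is summarized by counts per category; in its original form it supports numerical summaries such as the mean.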
Three Central Aspects to Statistics
There are three central aspects (tasks) of statistics:
Counts and Percentages
- The example below shows voting intentions for the 1988 Chilean Plebiscite from a survey conducted by FLACSO/Chile.
- Respondents were asked whether they intended to support Pinochet

Intended vote                     Count   Percent
A (Abstain)                         187   6.
N (Vote ‘No’ against Pinochet)      889   32.
U (Undecided)                       588   21.
Y (Vote ‘Yes’ for Pinochet)         868   32.
Total                              2700   100
Percent for ‘Yes’ vote = 100 × 868 / 2700 ≈ 32.15
Simple Tabulation in Stata
. * use the File menu to find the Chile.dta file
. use "D:\soci708\data\data_from_car_Stata\Chile.dta", clear
. tab vote

       vote |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |        187        6.93        6.
          N |        889       32.93       39.
         NA |        168        6.22       46.
          U |        588       21.78       67.
          Y |        868       32.15      100.
------------+-----------------------------------
      Total |      2,700      100.
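For comparison, the same frequencies, percentages and cumulative percentages can be computed in R directly from the counts in the Stata output. This is a sketch that types the counts in by hand rather than reading Chile.dta; the object names are assumptions.

```r
# Counts copied from the Stata output above
freq <- c(A = 187, N = 889, "NA" = 168, U = 588, Y = 868)

percent <- 100 * freq / sum(freq)  # share of each category in the total
cum     <- cumsum(percent)         # running total of the shares

print(round(cbind(Freq = freq, Percent = percent, Cum = cum), 2))
```

Each percentage is 100 times the category count divided by the total number of cases (2,700), which is the calculation behind the ‘Yes’ percentage of 32.15.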
Bar Chart
- The distribution of a categorical variable can also be represented as a bar graph or a pie chart.
- In R, a bar graph is created simply by
- Note that in a bar chart the bars do not touch each other.
[Figure: bar chart of intended vote; horizontal axis: categories A, N, U, Y; vertical axis: count, 0 to 800]
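A minimal sketch of one way to draw such a bar graph in base R, assuming `barplot()` and the counts from the survey table; this is not necessarily the call used on the original slide, and the object names are hypothetical.

```r
# Counts from the survey table; the NA category is left out,
# matching the four categories shown on the chart's axis
vote_counts <- c(A = 187, N = 889, U = 588, Y = 868)

# barplot() draws one bar per category and leaves gaps between the bars;
# it returns the horizontal midpoint of each bar
mids <- barplot(vote_counts, xlab = "Intended vote", ylab = "Count",
                ylim = c(0, 1000))
```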
Pie Chart
[Figure: pie chart of intended vote; slices A, N, NA, U, Y]
- A histogram is a bar graph that shows the count or percentage of cases falling in each of the bins
  - Horizontal axis: The range of the variable
  - Vertical axis: The count or percent of cases in each bin
- Histograms are easily made:
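To make the idea of bins concrete, the sketch below counts cases per bin for the eight infant mortality values that appear in the Leinhardt excerpt in these slides. The bin width of 50 is an assumed choice for illustration.

```r
# Infant mortality values from the eight rows of the Leinhardt excerpt
infant <- c(26.7, 23.7, 17.0, 16.8, 180.0, 80.0, 50.0, 104.0)

# Bins of width 50 (an assumed choice); by default each bin is closed
# on the right, so a value of exactly 50 falls in the first bin
h <- hist(infant, breaks = seq(0, 200, by = 50), plot = FALSE)

print(data.frame(lower   = head(h$breaks, -1),
                 upper   = tail(h$breaks, -1),
                 count   = h$counts,
                 percent = 100 * h$counts / sum(h$counts)))
```

The `count` and `percent` columns are the two quantities a histogram can put on its vertical axis.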
Infant mortality example using the Leinhardt data
- Because infant mortality takes on so many values and there are many cases, we must construct a histogram to view its distribution
                 income  infant  region    oil
Australia          3426    26.7  Asia      no
Austria            3350    23.7  Europe    no
Belgium            3346    17.0  Europe    no
Canada             4751    16.8  Americas  no
...
Upper.Volta          82   180.0  Africa    no
Southern.Yemen       96    80.0  Asia      no
Yemen                77    50.0  Asia      no
Zaire               118   104.0  Africa    no
Histogram in R
> # In R; the Leinhardt data ship with the car package (carData in recent versions)
> library(car)
> data(Leinhardt)
> attach(Leinhardt)
> hist(infant, nclass=14,
+      main="Distribution of Infant Mortality Rates",
+      xlab="Infant Mortality Rate", ylab="Count", col="red")
[Figure: histogram titled "Distribution of Infant Mortality Rates"; horizontal axis: Infant Mortality Rate, 0 to 500; vertical axis: Count, 0 to 40]
Examining a Histogram