Statistical Analysis: Mortality Rates & Cholesterol - Normal Distributions & Quantiles, Slides of Statistics

Statistical analysis of various datasets using normal distributions and quantiles. It includes histograms, time plots, and calculations of standard deviations and z-scores for infant mortality rates, male theft conviction rates, and divorce rates. Additionally, it covers the comparison of normal distributions in the context of cholesterol levels and public debates.

Typology: Slides

2011/2012

Uploaded on 12/29/2012

Soci708 – Statistics for Sociologists^1
Module 1 – Looking at Data: Distributions
François Nielsen
University of North Carolina, Chapel Hill
Fall 2009

(^1) Adapted in part from slides from courses by Robert Andersen (University of Toronto) and by John Fox (McMaster University)


Introduction

What is Statistics? The Challenger Disaster

- Statistics may be defined as the science of learning from data (IPS6e)
- The Challenger accident was a tragic example of the consequences of poor statistical analysis.^2
- On 28 January 1986 the U.S. space shuttle Challenger exploded shortly after blastoff, killing the seven astronauts.
- The cause of the explosion was the failure of rubber O-rings sealing two sections of one of the booster rockets attached to the shuttle.
- This failure, in turn, was caused by the low temperature at the time of launch, which made the O-rings lose their elasticity.

(^2) Edward Tufte. Visual Explanations.

Introduction

What is Statistics? The Challenger Disaster

- On the day before launch, engineers at Morton Thiokol, the company that built the boosters, recommended that the launch be postponed because of the low forecast temperature for the following day. Officials at NASA and Thiokol examined data on O-ring damage that had occurred on previous launches.
- Engineers plotted a measure of damage to O-rings against temperature at launch time, including only launches with non-negligible damage.
- The plot showed no association of damage with temperature.
- Had they included all the cases they would have seen a clear association: lower temperature → greater damage and would have postponed the launch, avoiding the accident.

Introduction

Data Sets

- A data set is a collection of facts assembled for a particular purpose.
- We will mainly use rectangular data sets where information is organized in an individual (row) by variable (column) format
  - an individual is a unit of observation – e.g. a person, an organization, a country
  - a variable is a characteristic of the individual – e.g. a depression score, score on a scale of centralization of decision-making, Gross Domestic Product per capita
  - a case is the information on all variables for one individual (corresponding to one row of the data set)
  - an observation is the value of a single variable for a given individual
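
The vocabulary above can be sketched in a few lines of Python. This is only an illustration: the values are four rows from the Leinhardt listing shown later in these slides, while the name `data_set` and the dictionary-of-dictionaries layout are assumptions made for the example, not part of the course materials.

```python
# A rectangular data set: one row per individual, one column per variable.
# Values are four rows of the Leinhardt listing shown later in these slides;
# the dictionary layout itself is an illustrative assumption.
data_set = {
    "Australia": {"income": 3426, "infant": 26.7, "region": "Asia", "oil": "no"},
    "Austria": {"income": 3350, "infant": 23.7, "region": "Europe", "oil": "no"},
    "Belgium": {"income": 3346, "infant": 17.0, "region": "Europe", "oil": "no"},
    "Canada": {"income": 4751, "infant": 16.8, "region": "Americas", "oil": "no"},
}

individuals = list(data_set)                 # units of observation (rows)
variables = list(data_set["Australia"])      # characteristics (columns)
case = data_set["Canada"]                    # all variables for one individual
observation = data_set["Canada"]["infant"]   # one variable for one individual

print(individuals)
print(variables)
print(case)
print(observation)
```

Here a case is a whole row and an observation is a single cell of the table.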

Introduction

Levels of Measurement

- The level of measurement determines the kinds of analysis that can be carried out with a variable
- In practice one can simplify the four-fold typology into two categories:
  - Qualitative variables:
    - Includes categorical variables + ordinal variables treated as categorical – e.g. age in years recoded into YOUNG, ADULT, SENIOR categories
    - Analyzed using contingency tables (tabular analysis)
  - Quantitative variables:
    - Includes interval variables + ratio variables + ordinal variables treated as interval variables – e.g. “How well do you speak Spanish?” coded from 1 to 5
    - Analyzed using scatterplots & regression analysis
- There are advanced analytical techniques for ordinal data that are beyond the scope of this class
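
As a rough sketch of the recoding mentioned above (age in years turned into YOUNG, ADULT, SENIOR), the Python snippet below may help. The cutpoints 18 and 65, the function name `recode_age`, and the list of ages are all hypothetical; the slides do not specify them.

```python
from collections import Counter

def recode_age(age_years):
    """Recode age in years into a category (the cutpoints are assumed)."""
    if age_years < 18:
        return "YOUNG"
    if age_years < 65:
        return "ADULT"
    return "SENIOR"

# Hypothetical ages, for illustration only.
ages = [12, 17, 18, 34, 51, 64, 65, 80]
categories = [recode_age(a) for a in ages]

# Once recoded, the variable is summarized by counts per category,
# which is the starting point for a contingency table.
counts = Counter(categories)
print(categories)
print(dict(counts))
```

After recoding, the variable is treated as qualitative: it is summarized by counts per category rather than by means or regression.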

Introduction

Three Central Aspects to Statistics

There are three central aspects (tasks) of statistics:

  1. Data Production
     - Designing research (e.g., a survey or an experiment) so that it produces data that help answer important questions
     - These issues will be the topic of Module 3 – Producing Data
  2. Data Analysis
     - Describing data with graphs and numerical summaries
     - Displaying patterns and trends
     - Measuring differences
  3. Statistical Inference
     - Using information about a sample of individuals, drawn at random from a larger population, to establish conclusions about characteristics of the population

Displaying Qualitative Data

Counts and Percentages

- The example below shows voting intentions for the 1988 Chilean Plebiscite from a survey conducted by FLACSO/Chile; data are from a dataset called Chile in the car package for R
- Respondents were asked whether they intended to support Pinochet

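    Intended vote                      Count   Percent
    A (Abstain)                          187      6.93
    N (Vote ‘No’ against Pinochet)       889     32.93
    U (Undecided)                        588     21.78
    Y (Vote ‘Yes’ for Pinochet)          868     32.15
    Total                               2700    100

The total of 2700 includes 168 respondents with a missing answer (NA), who are not listed as a separate row.

Percent for ‘Yes’ vote = 100 × 868 / 2700 ≈ 32.15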
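
The percentage calculation can be checked with a short Python sketch. The counts are those of the table above plus the 168 NA responses reported in the Stata tabulation of the same variable; the variable names are illustrative.

```python
# Counts from the table above; the 168 NA responses come from the Stata
# tabulation of the same variable and are needed to reach the total of 2700.
counts = {"A": 187, "N": 889, "NA": 168, "U": 588, "Y": 868}

total = sum(counts.values())

# Percent for a category = 100 * (count in category) / (total count)
percents = {vote: 100 * n / total for vote, n in counts.items()}

for vote, pct in percents.items():
    print(f"{vote:>2}  {counts[vote]:>4}  {pct:6.2f}")
print(f"Total {total}")
```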

Displaying Qualitative Data

Simple Tabulation in Stata

. * use the File menu to find the Chile.dta file
. use "D:\soci708\data\data_from_car_Stata\Chile.dta", clear
. tab vote

           vote |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              A |        187        6.93        6.93
              N |        889       32.93       39.85
             NA |        168        6.22       46.07
              U |        588       21.78       67.85
              Y |        868       32.15      100.00
    ------------+-----------------------------------
          Total |      2,700      100.00
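
To make the three columns concrete, here is a minimal Python sketch of a one-way tabulation with frequencies, percentages, and cumulative percentages. It is not how Stata implements `tab`; the helper `tabulate` is a hypothetical name, and the raw responses are rebuilt from the frequencies in the output above.

```python
from collections import Counter
from itertools import accumulate

def tabulate(values):
    """Return (category, freq, percent, cumulative percent) rows, sorted by category."""
    freq = Counter(values)
    total = sum(freq.values())
    categories = sorted(freq)
    running = accumulate(freq[c] for c in categories)  # cumulative counts
    return [
        (c, freq[c], 100 * freq[c] / total, 100 * cum / total)
        for c, cum in zip(categories, running)
    ]

# Rebuild the raw responses from the frequencies in the Stata output above.
votes = ["A"] * 187 + ["N"] * 889 + ["NA"] * 168 + ["U"] * 588 + ["Y"] * 868

rows = tabulate(votes)
for category, n, pct, cum in rows:
    print(f"{category:>5} | {n:>6,} {pct:>8.2f} {cum:>8.2f}")
```

The cumulative percentage is computed from cumulative counts rather than by adding rounded percentages, which avoids accumulating rounding error.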

Displaying Qualitative Data

Bar Chart

- The distribution of a categorical variable can also be represented as a bar graph or a pie chart.
- In R, a bar graph is created simply by the plot function

> plot(vote)

- Note that in a bar chart the bars do not touch each other.

[Figure: bar chart of vote with bars for A, N, U, Y; vertical axis from 0 to 800]

Displaying Qualitative Data

Pie Chart

- In Stata, it is simple to create a pie chart with the graph pie command

. graph pie, over(vote)

[Figure: pie chart of vote with slices for A, N, NA, U, Y]

Histograms

- A histogram is a bar graph that shows the count or percentage of cases falling in each of the bins
  - Horizontal axis: The range of the variable
  - Vertical axis: The count or percent of cases in each bin
- Histograms are easily made:

  1. Divide the range of the variable into intervals of equal width (called bins)
     - Each case must fit into only one bin
  2. Count the number of individuals falling in each bin and draw a bar representing them
     - Unlike with a bar graph, the bars of a histogram touch each other – i.e., there are no spaces between them – reflecting the fact that we are displaying information about a quantitative variable
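
The two steps above can be sketched in Python as follows. This is a simplified illustration, not the algorithm used by R's `hist` (which chooses rounded break points): the helper `histogram_counts` is a hypothetical name, it assumes the values are not all equal, and the data are only the eight infant mortality rates visible in the Leinhardt excerpt in these slides.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins  # assumes high > low
    counts = [0] * n_bins
    for v in values:
        # Each value falls in exactly one bin; the maximum goes in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Infant mortality rates for the eight countries listed in the Leinhardt excerpt.
infant = [26.7, 23.7, 17.0, 16.8, 180.0, 80.0, 50.0, 104.0]

edges, counts = histogram_counts(infant, n_bins=4)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"[{left:6.1f}, {right:6.1f})  {'#' * n}")
```

Bins are closed on the left and open on the right, except the last one, which also includes the maximum, so every case lands in exactly one bin.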

Histograms

Infant mortality example using the Leinhardt data

- Below are a few cases from the Leinhardt dataset in the car package for R (there are 105 cases in total)
- Because it takes on so many values and there are many cases, we must construct a histogram to view the distribution of infant mortality (infant)

                   income infant   region oil
    Australia        3426   26.7     Asia  no
    Austria          3350   23.7   Europe  no
    Belgium          3346   17.0   Europe  no
    Canada           4751   16.8 Americas  no
    ...
    Upper.Volta        82  180.0   Africa  no
    Southern.Yemen     96   80.0     Asia  no
    Yemen              77   50.0     Asia  no
    Zaire             118  104.0   Africa  no

Histograms

Histogram in R

> # In R
> library(car)  # the Leinhardt data come with the car package
> data(Leinhardt)
> attach(Leinhardt)
> hist(infant, nclass=14, main="Distribution of Infant Mortality Rates",
+      xlab="Infant Mortality Rate", ylab="Count", col="red")

[Figure: histogram "Distribution of Infant Mortality Rates"; horizontal axis: Infant Mortality Rate (0 to 500); vertical axis: Count (0 to 40)]

Histograms

Examining a Histogram

  1. Look for the overall pattern of the data described by the shape, center, and spread of the distribution:
     - In a symmetric distribution
       - Most cases are near the center
       - Half the cases will be on one side, the other side will be its mirror image
     - In a skewed distribution one tail of the distribution is longer than the other:
       - Positive skew: the tail to the right is longer
       - Negative skew: the tail to the left is longer
  2. Look for departures from the overall pattern, such as outliers:
     - individual values that fall outside the general pattern of the data
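
The direction of skew can also be checked numerically. The Python sketch below uses the moment coefficient of skewness, one common summary that these slides do not themselves introduce; it is offered as an assumption-laden complement to reading the histogram, not as the course's method. The data are again the eight infant mortality rates visible in the Leinhardt excerpt.

```python
from statistics import mean, median

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetric batch)."""
    n = len(values)
    center = mean(values)
    m2 = sum((v - center) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - center) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Infant mortality rates for the eight countries listed in the Leinhardt excerpt.
infant = [26.7, 23.7, 17.0, 16.8, 180.0, 80.0, 50.0, 104.0]

g1 = skewness(infant)
direction = "positive" if g1 > 0 else "negative" if g1 < 0 else "none"
print(f"skewness = {g1:.2f} ({direction} skew)")

# With a long right tail, the mean is pulled above the median.
print(f"mean = {mean(infant):.2f}, median = {median(infant):.2f}")
```

A positive value corresponds to a longer right tail and a negative value to a longer left tail; a perfectly symmetric batch gives zero.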