Basic Social Statistics-Lecture02A-Sociology, Study notes of Social Statistics and Data Analysis

Would you say that astrology is very scientific, sort of scientific, or not at all scientific? Variables, Frequency Distributions, Discrete Variables, Continuous variable, Relative Frequencies, Grouped Distributions, Cumulative Distributions, Cumulative frequency, Basic Social Statistics, Lecture Slides, Sociology, David Knoke, Minnesota State University (MN), United States of America (USA)

Typology: Study notes

2011/2012

Uploaded on 01/23/2012

Chapter 2
Describing Variables
2.1 Frequency Distributions for Discrete and Continuous Variables
2.2 Grouped and Cumulative Distributions
2.3 Graphing Frequency Distributions

Frequency Distributions

Frequency distribution: a table of outcomes (response categories) of a variable and the number of times [tally or count] each outcome is observed.

A frequency distribution shows the total number of persons responding to each of the variable's K categories.

  • Tally (count) frequencies by hand or by calculator; or
  • Use SPSS on the GSS data to tally frequencies and print a table

Relative f.d. (= proportion): divide tally by total N of cases

Percentage f.d. shows proportions multiplied by 100%

Sum of all the percents = 100.0%

Calculating Relative Frequencies

Should you include or exclude cases with missing values when calculating a relative frequency distribution?

  • SPSS “Percent” column includes all cases
  • SPSS “Valid Percent” excludes any “Missing” [0 = IAP; 8 = DK; 9 = NA]

For a variable with K categories, the valid N is the sum of the frequencies, f_i, across all K categories (where the subscript i indicates changing index values, from 1 to K):

$$f_1 + f_2 + f_3 + \dots + f_K = N$$

For ASTROSCI (exclude all Missing categories): N = 74 + 434 + 935 = ______

To find the proportion (relative frequency) in the ith category, just divide f_i by valid N:

$$p_i = \frac{f_i}{N}$$

p_1 = 74 / 1443 = ______

p_2 = 434 / 1443 = ______

p_3 = 935 / 1443 = ______

Total: 1443 / 1443 = ______

Usually no more than four “significant digits” will be needed when calculating proportions; use rounding.
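The ASTROSCI exercise can be checked with a few lines of Python. This is only a sketch: the list name `freqs` and the other variable names are illustrative assumptions, and the three frequencies are the valid counts quoted in the notes.

```python
# Valid ASTROSCI frequencies from the notes (Missing categories excluded).
freqs = [74, 434, 935]            # f_1, f_2, f_3

# Valid N is the sum of the frequencies across all K categories.
n_valid = sum(freqs)

# Relative frequency p_i = f_i / N, rounded to four decimal places.
proportions = [round(f / n_valid, 4) for f in freqs]

for i, (f, p) in enumerate(zip(freqs, proportions), start=1):
    print(f"p{i} = {f} / {n_valid} = {p:.4f}")

# The unrounded proportions sum to 1; rounded values may be off by 0.0001.
print(f"sum of unrounded proportions = {sum(f / n_valid for f in freqs):.4f}")
```

Multiplying each proportion by 100 gives the corresponding percentage.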

Grouped Distributions

Grouped data: continuous measures that have been collapsed into fewer categories

Measurement interval: treats all cases that fall between the lower and upper limits as equal values

  • SSDA : “Generally, between 6 and 20 intervals should be used…”

Fewer than 10 intervals are preferable for simplicity

  • Use SPSS RECODE to group adjacent categories together
  • Label new category by the lower & upper limits of that interval

Use mutually exclusive & exhaustive limits:

  • Each case falls into only one interval
  • Every case is assigned somewhere

AGE in the 2008 GSS

Respondent's AGE is coded in years, 72 categories from 18 to 89 (and 10 cases with missing data, coded = 99).

Let's use these SPSS commands to collapse AGE into eight decades, by creating a new variable called AGE10:

* Copy AGE so the original variable is left unchanged.
COMPUTE age10 = age.

* Collapse years into decade codes; any other value (including 99) becomes system-missing.
RECODE age10 (18 thru 19=1) (20 thru 29=2) (30 thru 39=3) (40 thru 49=4) (50 thru 59=5) (60 thru 69=6) (70 thru 79=7) (80 thru 89=8) (ELSE=SYSMIS).

VARIABLE LABELS age10 'AGE IN DECADES'.

VALUE LABELS age10 1 '18-19' 2 '20-29' 3 '30-39' 4 '40-49' 5 '50-59' 6 '60-69' 7 '70-79' 8 '80-89'.

* Tabulate both the original and the grouped variable.
FREQUENCIES VARIABLES = age age10.
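For readers without SPSS, the same collapsing of AGE into decade codes can be sketched in Python. The function name `age_to_decade` and the `LABELS` dictionary are hypothetical helpers, not part of the GSS or SPSS; the intervals and labels follow the RECODE and VALUE LABELS commands in these notes.

```python
# Value labels for the grouped variable, matching the SPSS VALUE LABELS.
LABELS = {
    1: "18-19", 2: "20-29", 3: "30-39", 4: "40-49",
    5: "50-59", 6: "60-69", 7: "70-79", 8: "80-89",
}


def age_to_decade(age):
    """Return the AGE10 code (1 to 8) for an age in years, or None.

    None plays the role of SYSMIS: any value outside 18-89,
    such as the missing-data code 99, is left unassigned.
    """
    if age is None or not 18 <= age <= 89:
        return None
    if age <= 19:
        return 1          # the short first interval, 18-19
    return age // 10      # 20-29 -> 2, 30-39 -> 3, ..., 80-89 -> 8


# The intervals are mutually exclusive and exhaustive over 18-89:
# every valid age receives exactly one code.
for age in (18, 29, 30, 89, 99):
    code = age_to_decade(age)
    print(age, code, LABELS.get(code, "missing"))
```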

AGE10 Age in Decades

Which decade(s) has the most cases? ___________

Which has the largest percentage? ______________

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid    1.00 18-19            37       1.8             1.8                  1.8
         2.00 20-29           322      15.9            16.0                 17.8
         3.00 30-39           373      18.4            18.5                 36.4
         4.00 40-49           381      18.8            18.9                 55.3
         5.00 50-59           371      18.3            18.4                 73.7
         6.00 60-69           272      13.4            13.5                 87.2
         7.00 70-79           165       8.2             8.2                 95.4
         8.00 80-89            92       4.5             4.6                100.0
         Total               2013      99.5           100.0
Missing  System                10        .5
Total                        2023     100.0
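The Percent, Valid Percent, and Cumulative Percent columns can all be recomputed from the frequencies alone. The sketch below uses the AGE10 counts from the table; the variable names are illustrative assumptions, and SPSS may round intermediate values differently than Python's formatting does.

```python
# AGE10 frequencies from the table, plus the 10 cases with missing data.
counts = {
    "18-19": 37, "20-29": 322, "30-39": 373, "40-49": 381,
    "50-59": 371, "60-69": 272, "70-79": 165, "80-89": 92,
}
n_missing = 10

n_valid = sum(counts.values())      # denominator for Valid Percent
n_total = n_valid + n_missing       # denominator for Percent

rows = []
running = 0                         # cumulative frequency so far
for label, f in counts.items():
    running += f
    percent = 100 * f / n_total             # includes missing cases
    valid_percent = 100 * f / n_valid       # excludes missing cases
    cumulative = 100 * running / n_valid    # cumulative valid percent
    rows.append((label, f, percent, valid_percent, cumulative))

for label, f, pct, valid, cum in rows:
    print(f"{label:>5} {f:>5} {pct:>6.1f} {valid:>6.1f} {cum:>6.1f}")
```

Cumulative percent is built from the running total of frequencies rather than by adding the rounded valid percents, which avoids accumulating rounding error.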

Another Type of Grouped Data

Ordered frequency distributions may be tabled without collapsing any categories. Although each score doesn't involve a range from lower to upper limits, I also refer to such tabular displays as “grouped data” because each category represents numerous respondents:

NEWS  HOW OFTEN DOES R READ NEWSPAPER

                          Frequency   Valid %
1  EVERYDAY                     431      30.4
2  FEW TIMES A WEEK             300      21.1
3  ONCE A WEEK                  297      20.9
4  LESS THAN ONCE WK            200      14.1
5  NEVER                        191      13.5
   Total                       1419     100.0

Note the poor GSS practice of assigning higher numbers to lower-level activity! You should recode to reverse their order.
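Reversing a 1-to-5 code so that higher numbers mean more frequent reading is simple arithmetic: the new code is (lowest + highest) minus the old code. A minimal Python sketch follows; `reverse_code` and `NEWS_LABELS` are hypothetical names, and the labels are those shown in the NEWS table.

```python
# Original GSS coding of NEWS: 1 = most frequent, 5 = least frequent.
NEWS_LABELS = {
    1: "EVERYDAY",
    2: "FEW TIMES A WEEK",
    3: "ONCE A WEEK",
    4: "LESS THAN ONCE WK",
    5: "NEVER",
}


def reverse_code(value, low=1, high=5):
    """Flip a code on the low..high scale: low <-> high, middle unchanged."""
    return low + high - value


# After reversing, 5 = EVERYDAY and 1 = NEVER.
reversed_labels = {reverse_code(code): label for code, label in NEWS_LABELS.items()}

for code in sorted(reversed_labels):
    print(code, reversed_labels[code])
```

Only the codes change; the frequencies attached to each label stay the same.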

Graphing Frequency Distributions

A Graph or Diagram visually summarizes the numbers in a frequency distribution or other table.

Three basic types of graphs:

  • BAR CHART for nonordered discrete variables
  • HISTOGRAM for ordered discrete variables
  • POLYGON for continuous variables

On the following slides, how do bar charts and histograms differ in the spaces between their bars? Why? How does a histogram differ from a polygon?

[Figure: Bar Chart of REGION. Horizontal axis: REGION OF INTERVIEW (NEW ENGLAND, MIDDLE ATLANTIC, E. NOR. CENTRAL, W. NOR. CENTRAL, SOUTH ATLANTIC, E. SOU. CENTRAL, W. SOU. CENTRAL, MOUNTAIN, PACIFIC). Vertical axis: Count, 0 to 500.]

[Figure: Polygon of AGE. Horizontal axis: AGE IN DECADES (18-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89). Vertical axis: Count, 0 to 400.]
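When no graphing software is at hand, the AGE IN DECADES counts can be inspected as a rough text chart. The sketch below is only a stand-in for a proper SPSS graph: `text_bars` is a hypothetical helper, the scale of one mark per 10 cases is an arbitrary assumption, and the counts are the AGE10 frequencies tabled earlier in these notes.

```python
# AGE10 frequencies from the frequency table in these notes.
counts = {
    "18-19": 37, "20-29": 322, "30-39": 373, "40-49": 381,
    "50-59": 371, "60-69": 272, "70-79": 165, "80-89": 92,
}


def text_bars(counts, per_mark=10):
    """Return one text line per category, one '#' per `per_mark` cases."""
    lines = []
    for label, f in counts.items():
        bar = "#" * (f // per_mark)     # integer division drops the remainder
        lines.append(f"{label:>5} | {bar} {f}")
    return lines


for line in text_bars(counts):
    print(line)
```

The bar lengths trace the same rise and fall across the ordered age intervals that the polygon shows with connected points.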

Variations on Basic Graphs

Two histograms: Age pyramids by sex

Two polygons: Approval over time

12 bar charts: Opinion by nation