
SPSS syntax for descriptive statistics

Johan A. Elkink

February 28, 2013

1 Opening data

First, you need to open the data file. This might be an SPSS file with all the data properly defined, or it can be a file in another format that requires a bit more work. Most example data in this course is saved in Stata’s .dta format, which is useful because it can be easily opened in a variety of statistics packages, but it does require a bit more manipulation to be fully usable. To open a Stata file in SPSS, you can use:

GET STATA FILE = 'asiabaro.dta'.

Often you will need to specify the location of the file, for example:

GET STATA FILE = 'c:\Users\11020101\Downloads\asiabaro.dta'.

Or you can use the menus by using “File”, then “Open”, then select .dta files, then browse to the file you need, and then click “Paste”. After that, you select this command, and run it using the green arrow button. This should open the data file.
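
If all data files sit in one folder, the full path does not have to be repeated in every command. A minimal sketch, assuming the folder from the example above (adjust it to your own machine) and a hypothetical output file name asiabaro.sav that is not part of the course material:

```spss
* Set the working directory once, then refer to files by name only.
CD 'c:\Users\11020101\Downloads'.
GET STATA FILE = 'asiabaro.dta'.
* Optionally save a copy in SPSS format for later sessions.
SAVE OUTFILE = 'asiabaro.sav'.
```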

1.1 Selecting cases

For the example below on democracy and development, we want to select only one year from the sample, e.g. 1990.^1 You can use the following code here:

GET STATA FILE = 'demdev.dta'.
SELECT IF (year = 1990).
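
Note that SELECT IF removes the non-selected cases from the active data set (the file on disk is unchanged unless you save over it). If the other years are needed later in the same session, a filter is a possible alternative. A minimal sketch, where in1990 is a hypothetical helper variable that is not part of the original data:

```spss
GET STATA FILE = 'demdev.dta'.
* in1990 equals 1 for observations from 1990 and 0 for all other years.
COMPUTE in1990 = (year = 1990).
* Procedures now only use the cases where in1990 equals 1.
FILTER BY in1990.
FREQUENCIES VARIABLES = polity2.
* Switch the filter off to return to the full sample.
FILTER OFF.
```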

(^1) In the example we look at the variable laggdppc. What this measures is the Gross Domestic Product per Capita, lagged by one year, i.e. the observations in the data set for 1990 are really the GDPpc levels of 1989. This handout ignores that detail.

2 Univariate graphical descriptives

Graphs in SPSS can be generated in two different ways: using what is called the “Legacy Dialogs”, which are the menus that were common in older versions of SPSS, or the “Graph Builder”. The latter is more flexible, but the syntax code it generates is much more complicated. In this handout, I will use the old-fashioned code, but you are free to use either. In the Graph Builder, it is primarily a matter of dragging plot types and variables to the center screen and then clicking “Paste” to generate the syntax code. For a pie chart, you can use:

GRAPH /PIE = religion.

This will create a pie chart of the religion variable in the asiabaro.dta data set, which represents religious denomination, a nominal variable. If you get problems because the variable is not properly defined as nominal, you can use:

VARIABLE LEVEL religion (NOMINAL).

For the bar plot, you can use:

GRAPH /BAR = religion.

For scale level variables (interval or ratio measurement level), the histogram is more useful, for example using GDP per capita in the demdev.dta data set:

GRAPH /HISTOGRAM = laggdppc.

Or you can use a box plot:

EXAMINE laggdppc /PLOT boxplot.

Computing logarithms

For “money variables”, such as GDP per capita, it is often useful to look at the logarithmic transformation. This transformation “stretches” the range of lower values and “squeezes” the range of higher values, so that the distribution of a very skewed variable with a long tail on the higher values will more closely approximate a normal distribution or bell curve. You can calculate such a variable and reproduce the plots with:

COMPUTE logGDPpc = ln(laggdppc).
GRAPH /HISTOGRAM = logGDPpc.
EXAMINE logGDPpc /PLOT boxplot.
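
To make the “stretching” and “squeezing” concrete: on the natural log scale, the step from 100 to 1,000 and the step from 10,000 to 100,000 are both ln(10) ≈ 2.30, although the second step is a hundred times larger in raw terms. A minimal sketch for checking the effect in the data, assuming laggdppc contains only positive values (LN returns system-missing for zero or negative values):

```spss
COMPUTE logGDPpc = LN(laggdppc).
* Compare range and skewness of the raw and the logged variable.
DESCRIPTIVES VARIABLES = laggdppc logGDPpc
  /STATISTICS = MEAN STDDEV MIN MAX SKEWNESS.
```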

3 Univariate numerical descriptives

The frequency table, which just lists the number of occurrences of each category of a categorical variable, can be generated with:

FREQUENCIES VARIABLES = polity2.

This is always a good starting point to look at your data, but for scale variables, the table will be unwieldy. To calculate the mode, median, and mean, you can use the following code:
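
The handout's own code for this step is not part of this copy. One possible way, offered as a sketch rather than as the author's version, is the STATISTICS subcommand of FREQUENCIES, with the table itself suppressed for the scale variable:

```spss
* Mode, median and mean without printing the long frequency table.
FREQUENCIES VARIABLES = laggdppc
  /FORMAT = NOTABLE
  /STATISTICS = MODE MEDIAN MEAN.
```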

Two categorical variables

Here a cross-table is the most common visualisation of the relationship, whereby a key decision is the correct calculation of the percentages. They should be calculated within the categories of the independent variable, so that you can compare the distribution of the dependent variable across those categories. Example code would be:

CROSSTABS /TABLES = cwar BY democracy /CELLS = COUNT COLUMN.

or, if the row variable (here cwar) is the independent one, with row percentages:

CROSSTABS /TABLES = cwar BY democracy /CELLS = COUNT ROW.

Recoding a variable

Often, you might want to recode a variable into fewer or different categories. Here is the example as used in the exercise in the slides:

RECODE polity2 (MISSING=SYSMIS) (Lowest thru -7=1) (-6 thru 6=2) (7 thru Highest=3) INTO regime.

Note that you should always also add proper labels for both the variable and the respective values, and set the right level of measurement:

VARIABLE LABELS regime "Political regime classification".
VALUE LABELS regime 1 "Autocracy" 2 "Anocracy" 3 "Democracy".
VARIABLE LEVEL regime (ORDINAL).

And finally, check whether it worked the way you expect:

CROSSTABS /TABLES = polity2 BY regime /CELLS = COUNT.
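
When the original variable has many values, this cross-table becomes long. A more compact check, sketched here under the assumption that the RECODE command above has been run, lists the lowest and highest polity2 score within each category of regime; these should match the cut-offs used in the recoding:

```spss
* Smallest and largest polity2 score within each regime category.
MEANS TABLES = polity2 BY regime
  /CELLS = COUNT MIN MAX.
```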

Saving a standardized variable

z-scores of a variable can be saved using:

DESCRIPTIVES VARIABLES = polity2 laggdppc /SAVE.

This will create two new variables, Zpolity2 and Zlaggdppc.
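
A z-score is the value minus the variable's mean, divided by its standard deviation, so each saved variable should have a mean of 0 and a standard deviation of 1. A quick check, assuming the DESCRIPTIVES command above has been run once and the new variables received the default names:

```spss
* Mean should be (very close to) 0 and standard deviation 1 for both.
DESCRIPTIVES VARIABLES = Zpolity2 Zlaggdppc
  /STATISTICS = MEAN STDDEV.
```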