This tutorial was written as an introductory guide
to SPSS for social scientists and social science students, including scholars
performing quantitative research and undergraduates working on their senior
theses. A more general guide is
provided with the Windows version of SPSS.
(Once you have opened a data set, simply click on Help and then Tutorials,
and then the purple book with Tutorials
next to it; if you do not see the purple book, click on Contents on the top right.)
For more information on SPSS, you can also go to the SPSS
Homepage.
This tutorial will take you through the steps of
testing a simple research question: Do voters’ opinions on how the president is
handling the economy influence which party they will vote for in House
elections? If a voter believes that the
Democratic president is handling the economy poorly, for example, is she more
likely to vote for the Republican House candidate?
We can divide this research question into two
variables. The dependent variable, or the
variable we are trying to explain, is the vote in House elections. The independent variable, or the variable that is
supposed to be influencing the dependent variable, is voter opinion on how the president
is handling the economy. However, a
relationship between opinions on the economy and the vote might be
“spurious.” Maybe one’s party
identification (whether one calls oneself a Republican or Democrat) drives both
the vote in House elections and opinions on how the president is handling the
economy. In this case, party identification is a control variable: a third variable that
might be driving the relationship and therefore needs to be “held constant.”
We will test this research question using the American National Election Study of 1998. The American National Election Study (often known by the acronyms “ANES” or just “NES”) is a telephone survey of voting age Americans conducted every two years by the University of Michigan. NES data sets are widely used by political scientists for studying American elections, more often than Gallup polls, for example.
(The Harvard-MIT Data Center is located in the Government Department at Harvard University. Please check out our homepage.)
This tutorial is divided into a few sections. It begins with an explanation of why to use SPSS on page
3. Then, it outlines how to read a data set and use it to produce basic statistics. It does so by
looking at actual data, the American National Election Study of 1998, in an attempt to answer a
real research question.
There are four main steps to manipulating data with
SPSS:
1. Reading Data, or how to translate raw data or data in another form into SPSS (page 6);
2. Transforming Data, or how to either create new variables or change the values of existing variables (page 17);
3. Defining Variables, or how to put labels onto data so that people can understand it, and how to structure data so that SPSS knows how to read it properly (page 27);
4. Creating Tables (page 46).
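As a rough sketch, the four steps correspond to syntax commands along the following lines. The file path is hypothetical, and IMPORT is only one way of reading a portable file; the variable name and labels are borrowed from this tutorial's House vote example.

```spss
* Step 1, Reading Data: read an SPSS portable file from a hypothetical path.
IMPORT FILE='C:\data\nes1998.por'.

* Step 2, Transforming Data: copy an existing variable under an easier name.
COMPUTE hvote = v980313.

* Step 3, Defining Variables: add a label and declare the missing values.
VARIABLE LABELS hvote "Vote in House Election".
MISSING VALUES hvote (7 THRU 9, 0).

* Step 4, Creating Tables: produce a frequency table.
FREQUENCIES VARIABLES = hvote.
```

Each command ends with a period, and each comment line starts with an asterisk and also ends with a period.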
Most SPSS users prefer to use its Windows graphic interface, that is, pointing with the mouse and clicking on the options they want. At Harvard, those who want the greater control of typing in commands tend to use other statistical packages. Nonetheless, SPSS provides a way to not only type commands but also switch between this syntax editor (see page 4) and the Windows point-and-click method. While this tutorial will focus on the latter, the command code will be mentioned briefly as well.
SPSS is the statistical package most widely used by political scientists. There seem to be several reasons why:
1. Force of habit: SPSS has been around since the late 1960s. (Political scientist Norman Nie, who co-authored The Changing American Voter with Sidney Verba, developed it. “SPSS” originally stood for “Statistical Package for the Social Sciences,” but the name has since been changed to reflect the marketing of SPSS outside the academic community);
2. Of the major packages, it seems to be the easiest to use for the most widely used statistical techniques;
3. One can use it with either a Windows point-and-click approach or through syntax (i.e., writing out SPSS commands). Each has its own advantages, and the user can switch between the approaches;
4. Many of the widely used social science data sets come with an easy method to translate them into SPSS; this significantly reduces the preliminary work needed to explore new data.
There are also two important limitations that
deserve mention at the outset:
1. SPSS users have less control over statistical output than, for example, Stata or Gauss users. For novice users, this hardly causes a problem. But, once a researcher wants greater control over the equations or the output, she or he will need to either choose another package or learn techniques for working around SPSS’s limitations;
2. SPSS has problems with certain types of data manipulations, and it has some built-in quirks that seem to reflect its early creation. The best known limitation is its weak lag functions, that is, how it transforms data across cases. For new users working off of standard data sets, this is rarely a problem. But, once a researcher begins wanting to significantly alter data sets, he or she will have to either learn a new package or develop greater skills at manipulating SPSS.
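To make “transforming data across cases” concrete, here is a minimal sketch of the LAG function. It assumes a hypothetical data set with one case per election year and two variables, year and turnout; neither is part of the NES data used in this tutorial.

```spss
* Hypothetical variables: year and turnout, one case per election year.
SORT CASES BY year.
* LAG returns the value that turnout had in the previous case of the file.
COMPUTE prevturn = LAG(turnout).
* The first case has no previous case, so prevturn is system-missing there.
COMPUTE change = turnout - prevturn.
EXECUTE.
```

Because LAG depends entirely on the order of the cases, the file has to be sorted before it is used.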
Overall, SPSS is a good first statistical package
for people wanting to perform quantitative research in social science because
it is easy to use and because it can be a good starting point to learn more
advanced statistical packages.
Originally, SPSS was written like a programming language. Users wrote SPSS syntax (often on a mainframe computer and even with key-punch cards) that performed the tasks they wanted. In SPSS-Windows, users can still use syntax by using the syntax editor. They would
c) Click on Run and drag down to All. (Alternatively, if they want to run only a few commands, they would highlight those commands, click on Run, and drag down to Selection.)
We will introduce the syntax for every step that we take in this tutorial, except the last one. We have included the complete syntax below as an example. (Please notice that every command ends with a period.)
If you want to learn more about SPSS syntax, your computer lab might have SPSS syntax guides; try the base guide first. You can also find online syntax guides by clicking on Help, dragging down to Syntax Guide, and choosing Base. Finally, by clicking on Help, dragging down to Topics, and then choosing Index, you can find the syntax for particular tasks.
Compute clinteco = v980219.
Recode v980336 (1=1)(2=3)(3=2)(4=7)(5=2)(8=8)(9=9) into party3.
Compute hvote = v980313.
Variable Label clinteco "Approval of Clinton, Economy".
Value Labels clinteco
  1 "Strongly Approve"
  2 "Not Strongly Approve"
  4 "Not Strongly Disapprove"
  5 "Strongly Disapprove"
  8 "DK"
  9 "NA"
  0 "Inappropriate".
Missing Values clinteco (0,8,9).
Variable Label party3 "Party Identification".
Value Labels party3
  1 "Democrat"
  2 "Ind./ No Pref."
  3 "Republican"
  7 "Other Party"
  8 "Don't Know"
  9 "No Answer Given".
Missing Values party3 (7, 8, 9).
Variable Label hvote "Vote in House Election".
Value Labels hvote
  1 "Democratic Candidate"
  2 "Republican Candidate"
  3 "Third Party/Independent Cand."
  7 "Name given not on candidate list"
  8 "Don't Know or Refuse"
  9 "No Answer Given"
  0 "Inappropriate, Didn't Vote".
Missing Values hvote (7 thru 9, 0).
Frequencies vars = clinteco hvote party3.
Crosstab tables = hvote by clinteco
  / cells = count column.
Crosstab tables = hvote by clinteco by party3
  / cells = count column.
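The last Crosstab command above holds party identification constant by producing a separate table of hvote by clinteco for each category of party3. As an illustration only, and not a step this tutorial takes, a single one of those tables could be requested by temporarily selecting one category. The sketch assumes that clinteco, hvote and party3 have already been created by the syntax above.

```spss
* Assumes clinteco, hvote and party3 were created by the syntax above.
* TEMPORARY makes the selection apply to the next procedure only.
TEMPORARY.
SELECT IF (party3 = 1).
* Table of House vote by economic approval, for Democrats only.
CROSSTABS TABLES = hvote BY clinteco
  /CELLS = COUNT COLUMN.
```

After the table is produced, the full set of cases is available again for later commands.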
Under most circumstances, a data set is not simply
handed to you. You would have to search
through an archive and then download the data that will most likely help you
complete your research project. The Harvard-MIT Data Center has such an
archive, but because of licensing agreements with other organizations, we can
give access to most data sets only to Harvard and MIT users. Therefore, we have separate directions on
how to download data for Harvard and MIT users (page 7) and for others (page 13).
(Special thanks to the Center for
Political Studies at the University of
Michigan, which has generously made the American National Election Study of
1998 publicly available, so that people outside of Harvard and MIT can also use
this tutorial.)*
* Sapiro, Virginia, Steven J. Rosenstone, and the National Election Studies. NATIONAL ELECTION STUDIES, 1998: POST-ELECTION STUDY [dataset]. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor], 1999.
These materials are based on work supported by the National Science Foundation under Grant Nos. SBR-9707741, SBR-9317631, SES-9209410, SES-9009379, SES-8808361, SES-8341310, SES-8207580, and SOC77-08885.
Any opinions, findings and conclusions or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect those of the National Science Foundation.
Downloading Data:
Harvard-MIT Users:
Our goal is to download the “American National Election Study, 1998.” We can accomplish this in several easy steps:
1. Go to the Harvard-MIT Data Center home page;
2. Double-click [Search Holdings] on the top left;
3. In the page that opens, there will be a search option in the top, left window under “Harvard-MIT Data Center Catalog.” (a) Type some part of the study title in the box after “Search for,” for example, “1998.” (b) Click “go”.

4. In the middle-left window, click the title of the study (“American National Election Study, 1998: Post-Election Survey”). DO NOT choose the cumulative data file or the pilot study;
5. The study’s description will appear on the right. Click [Data];

6. Scroll down until you see “da2684_LREC.por” and next to it “Subset/Crosstabs”. (“da” means that it is data; 2684 is the study number; LREC means the length of each record, or line of data; and “por” means that it can be imported into SPSS.) Click on “Subset.”
7. (a) Under “Choose an output format:” choose “SPSS Portable File”. (b) Then click [Create Subset]. (Do not worry about the options of selecting cases or variables. It is usually better to download the entire data set. That way, if you later decide that you need a variable that you hadn’t originally considered important, you won’t have to download the data set a second time.)
8. After a few minutes, a dialogue box with the title “Unknown File Type” will appear. It will ask what you want to do with the file that you are downloading. Click [Save File].