Harvard-MIT Data Center’s

Guide to SPSS
 

 

 

 

 

 


This tutorial was written as an introductory guide to SPSS for social scientists and social science students, including scholars performing quantitative research and undergraduates working on their senior theses.  A more general guide is provided with the Windows version of SPSS.  (Once you have opened a data set, simply click on Help and then Tutorials, and then the purple book with Tutorials next to it; if you do not see the purple book, click on Contents on the top right.)  For more information on SPSS, you can also go to the SPSS Homepage.

 

This tutorial will take you through the steps of testing a simple research question: Do voters’ opinions on how the president is handling the economy influence which party they will vote for in House elections?  If a voter believes that the Democratic president is handling the economy poorly, for example, is she more likely to vote for the Republican House candidate? 

 

We can divide this research question into two variables.  The dependent variable, or the variable we are trying to explain, is the vote in House elections.  The independent variable, or the variable that is supposed to be influencing the dependent variable, is voter opinion on how the president is handling the economy.  However, a relationship between opinions on the economy and the vote might be “spurious.”  Maybe one’s party identification (whether one calls oneself a Republican or Democrat) drives both the vote in House elections and opinions on how the president is handling the economy.  In this case, party identification is a control variable; it is another variable that might be influencing this relationship that needs to be “held constant.” 

 

We will test this research question using the American National Election Study of 1998.  The American National Election Study (often known by the acronyms “ANES” or just “NES”) is a telephone survey of voting-age Americans conducted every two years by the University of Michigan.  NES data sets are widely used by political scientists for studying American elections, more often than Gallup data, for example.

 

(The Harvard-MIT Data Center is located in the Government Department at Harvard University.  Please check out our homepage.)

 

 

 

 

 


This tutorial is divided into a few sections.  It begins with an introduction to why one would use SPSS (page 3).  Then, it outlines how to read a data set and use it to produce basic statistics.  It does so by looking at actual data, the American National Election Study of 1998, in an attempt to answer a real research question.

 

There are four main steps to manipulating data with SPSS:

1.      Reading Data, or how to translate raw data or data in another form into SPSS (page 6);

2.      Transforming Data, or how to either create new variables or change the values of existing variables (page 17);

3.      Defining Variables, or how to put labels onto data so that people can understand it, and how to structure data so that SPSS knows how to read it properly (page 27);

4.      Creating Tables (page 46).

 

Most SPSS users prefer its Windows graphical interface, that is, pointing with the mouse and clicking on the options they want.  At Harvard, those who want the greater control of typing in commands tend to use other statistical packages.  Nonetheless, SPSS lets you not only type commands in its syntax editor (see page 4) but also switch between the syntax editor and the Windows point-and-click method.  While this tutorial will focus on the point-and-click method, the command syntax will be mentioned briefly as well.

 

 

 

 



Why Use SPSS? 

 

 

 

 

 


SPSS is the statistical package most widely used by political scientists. There seem to be several reasons why:

1.      Force of habit: SPSS has been around since the late 1960s.  (Political scientist Norman Nie, who co-authored The Changing American Voter with Sidney Verba, developed it.  “SPSS” originally stood for “Statistical Package for the Social Sciences,” but the name has since been changed to reflect the marketing of SPSS outside the academic community.);

2.      Of the major packages, it seems to be the easiest to use for the most widely used statistical techniques;

3.      One can use it with either a Windows point-and-click approach or through syntax (i.e., writing out SPSS commands).  Each has its own advantages, and the user can switch between the approaches;

4.      Many of the widely used social science data sets come with an easy method to translate them into SPSS; this significantly reduces the preliminary work needed to explore new data.

 

There are also two important limitations that deserve mention at the outset:

1.      SPSS users have less control over statistical output than, for example, Stata or Gauss users.  For novice users, this hardly causes a problem.  But, once a researcher wants greater control over the equations or the output, she or he will need to either choose another package or learn techniques for working around SPSS’s limitations;

2.      SPSS has problems with certain types of data manipulations, and it has some built-in quirks that seem to reflect its early creation.  The best-known limitation is its weak lag functions, that is, how it transforms data across cases (for example, computing a value for one case from the value in the preceding case).  For new users working from standard data sets, this is rarely a problem.  But once a researcher wants to alter data sets significantly, he or she will have to either learn a new package or develop greater skills at manipulating SPSS.
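To make the idea of a transformation across cases concrete, here is a minimal sketch using the LAG function in SPSS syntax.  The variable names (income, previnc, change) are hypothetical and are not part of the NES data; the sketch also assumes the cases are already sorted in the order you care about.

```spss
* Hypothetical variables: income, previnc and change are not in the NES data.
* LAG(income) returns the value of income from the preceding case in file order.
* For the first case in the file the result is system-missing.
Compute previnc = LAG(income).
Compute change = income - previnc.
Execute.
```

Anything more elaborate than looking back a fixed number of cases is where the limitation described above starts to matter.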

 

Overall, SPSS is a good first statistical package for people wanting to perform quantitative research in social science because it is easy to use and because it can be a good starting point to learn more advanced statistical packages.

 

 

 

----------------------- 




Syntax Editor 

 

 

 

 

 

 


Originally, SPSS was written like a programming language.  Users wrote SPSS syntax (often on a mainframe computer and even with key-punch cards) that performed the tasks they wanted.  In SPSS-Windows, users can still write syntax in the syntax editor.  To do so, they would:

a)      Open the syntax window by clicking on File, dragging down to New, and choosing Syntax;

b)      Type the SPSS syntax that they want to run;

c)      Click on Run and drag down to All.  (Alternatively, if they want to run only a few commands, they would highlight those commands, click on Run, and drag down to Selection.)
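As a first thing to try in the syntax window, the steps above could be carried out with a single short command.  This is only a sketch: it assumes the 1998 NES data set is already open, and it uses v980219, the raw variable that this tutorial later copies into clinteco.

```spss
* A one-command example to type into the syntax window.
* Assumes the 1998 NES data set is open and contains v980219.
Frequencies variables = v980219.
```

Highlighting the command and choosing Run and then Selection should produce a frequency table in the output window.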

 

 



We will introduce the syntax for every step that we take in this tutorial, except the last one.  We have included the complete syntax below as an example.  (Please notice that every command ends with a period.) 

 

If you want to learn more about SPSS syntax, your computer lab might have SPSS syntax guides; try the base guide first.  You can also find online syntax guides by clicking on Help, dragging down to Syntax Guide, and choosing Base.  Finally, by clicking on Help, dragging down to Topics, and then choosing Index, you can find the syntax for particular tasks.

 

 


Compute clinteco = v980219.

Recode v980336 (1=1)(2=3)(3=2)(4=7)(5=2)(8=8)(9=9) into party3.

Compute hvote = v980313.

 

Variable label clinteco "Approval of Clinton, Economy".

Value Labels clinteco

1 "Strongly Approve"  2 "Not Strongly Approve"  4 "Not Strongly Disapprove"

            5 "Strongly Disapprove"  8 "DK"  9 "NA"  0 "Inappropriate".

Missing Values clinteco (0,8,9).

 

Variable Label party3 "Party Identification".

Value Labels party3   1 "Democrat"  2  "Ind./ No Pref."  3  "Republican"  7  "Other Party"        

            8  "Don't Know"   9  "No Answer Given".                    

Missing Values party3 (7, 8, 9).

 

Variable Label hvote "Vote in House Election".

Value Labels hvote

            1  "Democratic Candidate"

            2  "Republican Candidate"

            3  "Third Party/Independent Cand."

            7  "Name given not on candidate list"

            8  "Don't Know or Refuse"

            9  "No Answer Given"

            0  "Inappropriate, Didn't Vote".

Missing Values hvote (7 thru 9, 0).

 

Frequencies vars = clinteco hvote party3.

 

Crosstab tables = hvote by clinteco

   / cells = count column.

Crosstab tables = hvote by clinteco by party3

   / cells = count column.
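One optional check, not part of the original syntax, is to confirm that the Recode command produced the party3 categories you intended.  A minimal sketch, assuming the commands above have already been run, is to cross-tabulate the original item against the recoded one:

```spss
* Optional check: compare the original item with the recoded variable.
* The missing subcommand keeps codes 7, 8 and 9 in the table even though they are declared missing.
Crosstabs tables = v980336 by party3
   / missing = include
   / cells = count.
```

Each row of the resulting table (one per original code) should have counts in only one column, the column for the new code it was mapped to.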

 

 


Reading Data

 

Step 1a: Downloading the Data 

 

 

 

 

 

 

 


Under most circumstances, a data set is not simply handed to you.  You have to search through an archive and then download the data that will most likely help you complete your research project.  The Harvard-MIT Data Center has such an archive, but because of licensing agreements with other organizations, we can give access to most data sets only to Harvard and MIT users.  Therefore, we have separate directions on how to download data for Harvard and MIT users (page 7) and for others (page 13).  (Special thanks to the Center for Political Studies at the University of Michigan, which has generously made the American National Election Study of 1998 publicly available, so that people outside of Harvard and MIT can also use this tutorial.)*

 

 

Harvard-MIT Users

 

Others

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 


* Sapiro, Virginia, Steven J. Rosenstone, and the National Election Studies. NATIONAL ELECTION STUDIES, 1998: POST-ELECTION STUDY [dataset]. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor], 1999.

 

These materials are based on work supported by the National Science Foundation under Grant Nos. : SBR-9707741, SBR-9317631, SES-9209410, SES-9009379, SES-8808361, SES-8341310, SES-8207580, and SOC77-08885.

 

Any opinions, findings and conclusions or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect those of the National Science Foundation.


Downloading Data: Harvard-MIT Users:

 

Our goal is to download the “American National Election Study, 1998.”   We can accomplish this in several easy steps:

1.      Go to the Harvard-MIT Data Center home page; 

2.      Double click [Search Holdings] on the top left;

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

3.      In the page that opens, there will be a search option in the top-left window under “Harvard-MIT Data Center Catalog.”  (a) Type some part of the study title in the box after “Search for,” for example, “1998.”  (b) Click “go”.

 

 

 



4.      In the middle-left window, click the title of the study (“American National Election Study, 1998: Post-Election Survey”).  DO NOT choose the cumulative data file or the pilot study;

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.      The study’s description will appear on the right.  Click [Data];

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 


6.      Scroll down until you see “da2684_LREC.por” and next to it “Subset/Crosstabs”.  (“da” means that it is data; 2684 is the study number; LREC means the length of each record, or line of data; and “por” means that it can be imported into SPSS.)  Click on “Subset.”

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

7.      (a) Under “Choose an output format:” choose “SPSS Portable File”.  (b) Then click [Create Subset].  (Do not worry about the options of selecting cases or variables.  It is usually better to download the entire data set.  That way, if you later decide that you need a variable that you hadn’t originally considered important, you won’t have to download the data set a second time.)

 

 

 


 

8.      After a few minutes, a dialogue box with the title “Unknown File Type” will appear.  It will ask what you want to do with the file that you are downloading.  Click [Save File].
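Once the portable file has been saved, it could be read into SPSS from the syntax window.  The following is a minimal sketch, not the only way to open the file: both paths are hypothetical and should be replaced with the folder where you saved the download, and nes1998.sav is simply an illustrative name for the resulting SPSS data file.

```spss
* Hypothetical paths: replace them with the folder where you saved the download.
* Import reads an SPSS portable (.por) file into the working data set.
Import file = 'C:\data\da2684_LREC.por'.
* Saving in SPSS format means the portable file does not have to be imported again.
Save outfile = 'C:\data\nes1998.sav'.
```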