Data Analysis with IBM SPSS Statistics

Demo - first look at the data - frequencies

The GSS 2014 data extract has 2,538 rows. You can inspect the data by scrolling through the data window, but with this many rows it is difficult to grasp all of the data at once. For example, how many unique values are there in a given variable? Do the values in a given variable occur with about the same frequency, or do certain values predominate? Running Frequencies on the data can serve as a useful first look because it produces summary tables that show every data value that occurs in the specified variables.

To run Frequencies from the menus, choose the following:

Analyze | Descriptive Statistics | Frequencies

This opens the Frequencies dialog box.

Move all variables except ID from the left-hand side to the right-hand side variable list. Why leave out ID? Because ID has so many unique values, its frequency table would be quite lengthy.

Variables such as ID, or INCOME measured in actual amounts, can have many unique values. For this reason, you might choose not to include them in Frequencies, as the resulting table can be very lengthy.
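Outside SPSS, the idea of screening variables by their number of unique values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are hypothetical toy data rather than GSS values, and the helper names distinct_counts and variables_to_tabulate and the max_distinct threshold are invented for this sketch.

```python
# Hypothetical toy rows standing in for a data extract; not GSS values.
rows = [
    {"ID": 1, "MARITAL": 1, "SEX": 2},
    {"ID": 2, "MARITAL": 5, "SEX": 1},
    {"ID": 3, "MARITAL": 1, "SEX": 2},
    {"ID": 4, "MARITAL": 3, "SEX": 1},
]


def distinct_counts(rows):
    """Return {variable name: number of distinct values} across all rows."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in seen.items()}


def variables_to_tabulate(rows, max_distinct):
    """Keep variables whose distinct-value count does not exceed max_distinct."""
    counts = distinct_counts(rows)
    return [name for name, n in counts.items() if n <= max_distinct]


print(distinct_counts(rows))
print(variables_to_tabulate(rows, max_distinct=3))
```

In this toy data, ID has a different value on every row, so it is the variable that gets screened out. What counts as too many values is a judgment call.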

Press Paste to paste the syntax to a syntax window. Here is the syntax:

FREQUENCIES VARIABLES=MARITAL AGE HAPPY SEX
/ORDER=ANALYSIS.

Mark and run the command.

To illustrate data inspection, consider the frequency table for MARITAL:

The frequency table shows all data codes that occur in the variable. For each code, the table shows the following:

  • Frequency: The number of occurrences of the code
  • Percent: The percentage of cases having a particular value
  • Valid Percent: The percentage of cases having a particular value when only cases with non-missing values are considered
  • Cumulative Percent: The percentage of cases with non-missing data that have values less than or equal to a particular value
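The arithmetic behind the four columns can be sketched in Python. This is a minimal illustration of the definitions above, not SPSS's own implementation. The function name frequency_table, its missing parameter, and the toy values are assumptions made for this sketch, and the choice to give declared-missing codes a frequency and a percent only is likewise an assumption.

```python
from collections import Counter


def frequency_table(values, missing=()):
    """Build one row per code: frequency, percent, valid percent, cumulative percent.

    Codes listed in `missing` get a frequency and a percent only (an assumption
    about how declared-missing codes are reported).
    """
    counts = Counter(values)
    total = len(values)
    valid_total = sum(n for code, n in counts.items() if code not in missing)
    rows = []
    cumulative = 0.0
    for code in sorted(counts):
        n = counts[code]
        row = {"code": code, "frequency": n, "percent": 100.0 * n / total}
        if code not in missing:
            valid = 100.0 * n / valid_total
            cumulative += valid  # running total over non-missing codes only
            row["valid_percent"] = valid
            row["cumulative_percent"] = cumulative
        rows.append(row)
    return rows


# Hypothetical toy data: 10 cases, with code 9 standing for "not known".
values = [1, 1, 1, 1, 2, 3, 3, 5, 5, 9]
for row in frequency_table(values, missing={9}):
    print(row)
```

Percent divides by all cases, while Valid Percent and Cumulative Percent divide by the non-missing cases only, so the cumulative figure reaches 100 at the last non-missing code.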

This data follows the survey research convention for categorical data: there are as many data codes as there are response categories, and there are also data codes to represent different types of non-responses. The data codes of 1 through 5 correspond to various marital statuses such as married, single, and so on, and the data code of 9 corresponds to responses in which marital status was not known.

It would be nice if the table showed the marital category names instead of or in addition to the data codes. It turns out that IBM SPSS Statistics gives us a way to do this.

In this instance, note that the Percent column and the Valid Percent column are identical because the code of 9 has not been declared as missing, so it is treated as a valid value just like marital codes 1 through 5. The MISSING VALUES command gives us a way to declare the 9 data code as a missing value.
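To see why the two columns coincide until a code is declared missing, consider the following Python sketch. The counts are hypothetical and are not the GSS results. The function percent_and_valid_percent is invented for illustration and only mimics the arithmetic, not the MISSING VALUES command itself.

```python
def percent_and_valid_percent(counts, code, missing=()):
    """Return (percent, valid percent) for one code, given {code: frequency}."""
    total = sum(counts.values())
    valid_total = sum(n for c, n in counts.items() if c not in missing)
    percent = 100.0 * counts[code] / total
    valid_percent = 100.0 * counts[code] / valid_total
    return percent, valid_percent


# Hypothetical counts, not the GSS results; code 9 means "not known".
counts = {1: 45, 2: 10, 3: 15, 4: 5, 5: 20, 9: 5}

# Code 9 counted as a valid value: both denominators are the same.
print(percent_and_valid_percent(counts, 1))

# Code 9 declared missing: the valid-percent denominator shrinks.
print(percent_and_valid_percent(counts, 1, missing={9}))
```

With no code declared missing, both percentages use the same denominator and therefore agree. Once 9 is excluded from the valid cases, the Valid Percent for every remaining code rises slightly.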

Finally, the Frequency column shows the category counts, which, in this instance, vary widely across the categories.