R function that recodes GSS religion from three variables (relig, denom, other) to 12 categories, according to Darren E. Sherkat and Derek Lehman, “After The Resurrection: The Field of the Sociology of Religion in the United States”.1


UPDATE

Function is turned into package resurrectionr resurrectionr package logo

The package is not on CRAN yet. It is located on github.com/mdjeric/resurrectionr, and can be installed using devtools::install_github("mdjeric/resurrectionr"). There is also a dedicated website for it mdjeric.github.io/resurrectionr. It is a considerable step forward from this function and includes several options, accepts different types of variables, etc.


Purpose

Main goal is to address four problems:

  1. Recode religion successfully.
  2. Easily (automatically) recode whole family of religion variables in GSS (religion, religion at 16, parents’ religion, and special modules on friends’ religion).
  3. Recode no matter how GSS data is imported to R.
  4. Provide two ways of grouping religion (12 and 7 groups).

At this point, only 1 works well, while 2 and 3 are partial. Beside huge number of categories (cca 200 for other), main problem is that punch codes from GSS codebook do not correspond to factor values once data is imported (some codes and blocks of numbers are just skipped).

Attention!

Function is usable but with certain limitations. At this moment it successfully works only for respondent’s religion and religion at the age of 16 (although the process can be used on other derived variables), and when data is imported from SPSS file to data frame, with trimmed factor names and values, without use of missing values.

If this does not suit your need, ad hoc workaround is to import GSS dataset in your fashion of choice (e.g. with use of NA) and identical dataset as required by this function. You should imediatlly use the function on the second dataframe, and assing values of new variable to the first. Ideally, you would also assing the variables used for recoding, and check them against main dataframe that everything is OK.

Explanation of how it works

There is probably an easier way to do this, but I started using imported GSS data from .sav files with categorical variables as factors using foreign package. However, R’s nominal values of factors do not correspond to SPSS codes for factors (i.e. corresponding punches in GSS codebook) which causes problem in trying to directly replicate SPSS syntax.

Here we can see the sample from 2014 GSS file:

library(foreign)

GSS <- read.spss("_GSS2014.sav",
                 to.data.frame = TRUE,
                 trim.factor.names = TRUE,
                 trim_values = TRUE,
                 use.missings = FALSE
                 )

summary(GSS[, c("relig", "denom", "other")])
##         relig                   denom                      other     
##  PROTESTANT:1125   IAP             :1264   IAP                :2269  
##  CATHOLIC  : 606   NO DENOMINATION : 298   Pentecostal        :  51  
##  NONE      : 522   OTHER           : 254   Mormon             :  32  
##  CHRISTIAN : 134   BAPTIST-DK WHICH: 151   Jehovah's Witnesses:  22  
##  JEWISH    :  40   SOUTHERN BAPTIST: 146   Church of Christ   :  17  
##  OTHER     :  27   UNITED METHODIST: 110   NA                 :  16  
##  (Other)   :  84   (Other)         : 315   (Other)            : 131
str(GSS[, c("relig", "denom", "other")])
## 'data.frame':    2538 obs. of  3 variables:
##  $ relig: Factor w/ 16 levels "IAP","PROTESTANT",..: 3 3 2 3 3 2 3 3 3 3 ...
##  $ denom: Factor w/ 30 levels "IAP","AM BAPTIST ASSO",..: 1 1 16 1 1 26 1 1 1 1 ...
##  $ other: Factor w/ 202 levels "IAP","Hungarian Reformed",..: 1 1 1 1 1 1 1 1 1 1 ...
tail(GSS[, c("relig", "denom", "other")])
##           relig            denom                          other
## 2533 PROTESTANT BAPTIST-DK WHICH                            IAP
## 2534 PROTESTANT UNITED METHODIST                            IAP
## 2535       NONE              IAP                            IAP
## 2536       NONE              IAP                            IAP
## 2537   CATHOLIC              IAP                            IAP
## 2538 PROTESTANT            OTHER Congregationalist, 1st Congreg

As a starting point for a more universal recoding function, this one uses three tables (in religion_punches folder, heads presented here) from which three vectors are formed (relig, denom, and other) where index corresponds to punch number, and label is value.

## $relig.csv
##   punch      label
## 1     0        IAP
## 2     1 PROTESTANT
## 3     2   CATHOLIC
## 4     3     JEWISH
## 5     4       NONE
## 6     5      OTHER
## 
## $denom.csv
##   punch               label
## 1     0                 IAP
## 2    10     AM BAPTIST ASSO
## 3    11   AM BAPT CH IN USA
## 4    12 NAT BAPT CONV OF AM
## 5    13   NAT BAPT CONV USA
## 6    14    SOUTHERN BAPTIST
## 
## $punch.csv
##   punch                              label
## 1     0                                IAP
## 2     1                 Hungarian Reformed
## 3     2         Evangelical Congregational
## 4     3 Ind Bible, Bible, Bible Fellowship
## 5     5                 Church of Prophecy
## 6     6            New Testament Christian

For each new religious identification, separate vectors (for relig, denom, and/or other) are created that contain punch codes and labels, per paper and SPSS syntax. In addition, logical vectors are created for Sectarian Protestants with valid denomination codes for sorting between ‘Sectarian Protestants’, ‘Christian - no group given’, and ‘other religions’.

This approach is somewhat superior to usual practices of recoding, since it can be confirmed through basic set operations that there is no overlap between groups or that some categories are left out.

Next, twelve logical vectors, for respondent’s belonging to each group, are created by checking against appropriate vectors containing labels. Names are assigned, function prints frequency table, and returns vector that can be assigned to a new variable.

This allows that basic structure can be generalized for recoding of all three-variable sets of religious identification that can be imported as either factors with names or numerical values (or combination, as is the case with some groups), easy recoding into different number of groups, and printing of detailed classification that includes names of all religions and denominations.

Code

rec_relig_12 <- function(religion, denomination, other)  {
  ######## RECODING RELIGION AND RELIGION AT 16 ###########
  # Sherkat and Lehman (2017)
  # To work properly, folder 'religion_punches' with .csvs
  # of label names has to be in wokring directory.
  #
  # Function: # rec_relig_12(religion, denomination, other)
  # relig or relig16 variable; denom or denom 16; other or oth16
  # function prints frequencies and returns factor vector with
  # religion recoded.
  # 
  # It works with GSS dataset imported through 'read.spss',
  # from foreign package, in following way:
  # to.data.frame = TRUE, trim.factor.names = TRUE,
  # trim_values = TRUE, use.missings = FALSE
  
  # Import three varaibles into new dataset used for recoding
  DF <- data.frame(relig = religion,
                   denom = denomination,
                   other = other
                   )
  
  # Read values for all variables
  c_relig <- read.csv("religion_punches/relig.csv")
  c_denom <- read.csv("religion_punches/denom.csv")
  c_other <- read.csv("religion_punches/other.csv")
  
  # Create vectors with position corespondign to the punch
  # of label in DF codebook for 3 variables 
  c_r <- c()
  for (i in c_relig$punch)  {
    c_r[i] <- as.character(c_relig[c_relig$punch == i, "label"])
  }
  c_r[99] <- "NA"
  
  c_d <- c()
  for (i in c_denom$punch)  {
    c_d[i] <- as.character(c_denom[c_denom$punch == i, "label"])
  }
  c_d[99] <- "NA"
  
  c_o <- c()
  for (i in c_other$punch)  {
    c_o[i] <- as.character(c_other[c_other$punch == i, "label"])
  }
  c_o[999] <- "NA"
  
  # Liberal Protestants
  lp_d_num <- c(40:49)
  lp_o_num <- c(29, 30, 40, 54, 70, 72 , 81, 82, 95, 98, 119,
                142, 160, 188)
  lp_denom <- c_d[lp_d_num]
  lp_other <- c_o[lp_o_num]
  DF$lp_true <- (DF$denom %in% lp_denom) | (DF$other %in% lp_other)
  DF$rv[DF$lp_true] <- "Liberal Protestant"
  
  # Episcopalians 
  ep_d_num <- c(50)
  ep_denom <- c_d[ep_d_num]
  DF$ep_true <- DF$denom %in% ep_denom
  DF$rv[DF$ep_true] <- "Episcopalian"
  
  # Moderate Protestants
  mp_d_num <- c(10:13, 20:23, 28)
  mp_o_num <- c(1, 8, 15, 19, 25, 32, 42:44, 46, 49:51, 71, 73, 94,
                99, 146, 148, 150, 186)
  mp_denom <- c_d[mp_d_num]
  mp_other <- c_o[mp_o_num]
  DF$mp_true <- (DF$denom %in% mp_denom) | (DF$other %in% mp_other)
  DF$rv[DF$mp_true] <- "Moderate Protestant"
  
  # Lutherans
  lt_d_num <- c(30:38)
  lt_o_num <- c(105)
  lt_denom <- c_d[lt_d_num]
  lt_other <- c_o[lt_o_num]
  DF$lt_true <- (DF$denom %in% lt_denom) | (DF$other %in% lt_other)
  DF$rv[DF$lt_true] <- "Lutheran"
  
  # Baptists
  bp_d_num <- c(14:18)
  bp_o_num <- c(93, 133, 197)
  bp_denom <- c_d[bp_d_num]
  bp_other <- c_o[bp_o_num]
  DF$bp_true <- (DF$denom %in% bp_denom) | (DF$other %in% bp_other)
  DF$rv[DF$bp_true] <- "Baptist"
  
  # Sectarian Protestants
  # these initial variables pull out sectarians codes
  # relig == 11 (christian) or relig == 5 (other),
  # but also have valid denom codes. 
  DF$sp_pent <- (DF$relig == c_r[11]) & (DF$other == c_o[68])
  DF$sp_centchrist <- (DF$relig == c_r[5]) & (DF$other == c_o[31])
  DF$sp_fsg <- (DF$relig == c_r[5]) & (DF$other == c_o[53])
  DF$sp_jw <- (DF$relig == c_r[5]) & (DF$other == c_o[58])
  DF$sp_sda <- (DF$relig == c_r[5]) & (DF$other == c_o[77])
  DF$sp_ofund <- (DF$relig == c_r[5]) & (DF$other == c_o[97])
  
  sp_o_num <- c(2, 3, 5:7, 9, 10, 12:14, 16:18, 20:24, 26, 27, 31,
                33:39, 41, 45, 47, 48, 52, 53, 55:58, 63, 65:69,
                76:79, 83:92, 96, 97, 100:104, 106:113, 115:118,
                120:122, 124, 125, 127:132, 134, 135, 137:141, 144,
                145, 151:156, 158, 159, 166:182, 184, 185, 187,
                189:191, 193, 195, 196, 198, 201, 204)
  sp_other <- c_o[sp_o_num]
  
  DF$sp_true <- ((DF$other %in% sp_other) |
                   DF$sp_pent |
                   DF$sp_centchrist |
                   DF$sp_fsg |
                   DF$sp_jw |
                   DF$sp_sda |
                   DF$sp_ofund
                 )
  DF$rv[DF$sp_true] <- "Sectarian Protestant"
  
  # Christian, no group identified
  DF$cn_christ <- (DF$relig == c_r[11]) & !DF$sp_pent
  cn_r_num <- c(13)
  cn_d_num <- c(70, 98, 99)
  cn_o_num <- c(998, 999)
  cn_relig <- c_r[cn_r_num]
  cn_denom <- c_d[cn_d_num]
  cn_other <- c_o[cn_o_num]
  DF$cn_true <- ((DF$relig %in% cn_relig) |
                   (DF$denom %in% cn_denom) |
                   (DF$other %in% cn_other) |
                   DF$cn_christ
                 )
  DF$rv[DF$cn_true] <- "Christian, no group given"
  
  # Mormons
  mr_o_num <- c(59:62, 64, 157, 162)
  mr_other <- c_o[mr_o_num]
  DF$mr_true <- DF$other %in% mr_other
  DF$rv[DF$mr_true] <- "Mormon"
  
  # Catholics or Orthodox Christians
  co_r_num <- c(2, 10)
  co_o_num <- c(28, 123, 126, 143, 149, 183, 194)
  co_relig <- c_r[co_r_num]
  co_other <- c_o[co_o_num]
  DF$co_true <- (DF$relig %in% co_relig) | (DF$other %in% co_other)
  DF$rv[DF$co_true] <- "Catholic or Orthodox"
  
  # Jews
  jw_r_num <- c(3)
  jw_relig <- c_r[jw_r_num]
  DF$jw_true <- DF$relig %in% jw_relig
  DF$rv[DF$jw_true] <- "Jewish"
  
  # Other religions 
  DF$or_nonsp <- (DF$relig == c_r[5]) & !(DF$sp_pent |
                                            DF$sp_centchrist |
                                            DF$sp_fsg |
                                            DF$sp_jw |
                                            DF$sp_sda |
                                            DF$sp_ofund
                                          )
  or_r_num <- c(6:9, 12)
  or_o_num <- c(11, 74, 75, 80, 114, 136, 161, 163, 164, 192)
  or_relig <- c_r[or_r_num]
  or_other <- c_o[or_o_num]
  DF$or_true <- ((DF$relig %in% or_relig) |
                   (DF$other %in% or_other) |
                   DF$or_nonsp
                 )
  DF$rv[DF$or_true] <- "Other religion"
  
  # No religious identification
  nr_r_num <- c(4)
  nr_relig <- c_r[nr_r_num]
  DF$nr_true <- DF$relig %in% nr_relig
  DF$rv[DF$nr_true] <- "None"
  
  # Missing values  
  # No Answer
  DF$na_relig <- DF$relig == c_r[99]
  DF$na_denom <- DF$denom == c_d[99] 
  DF$na_rd <- DF$na_relig & DF$na_denom
  DF$rv[DF$na_rd] <- "No answer"
  
  # Don't know
  DF$dk_relig <- DF$relig == c_r[98]
  DF$rv[DF$dk_relig] <- "Dont know"
  
  # Treat it as factor
  DF$rv <- as.factor(DF$rv)
  
  # Provide table with proportions
  print(cbind(Freq = table(DF$rv, useNA = "ifany"),
              Relative = round(100 *
                                 prop.table(
                                   table(
                                     DF$rv,
                                     useNA = "ifany"
                                     )
                                   ),
                               2),
              Cumul = round(100 *
                              cumsum(
                                prop.table(
                                  table(
                                    DF$rv,
                                    useNA = "ifany"
                                    )
                                  )
                                ),
                            2)
              )
        )
  
  # Return the vector with recoded religion
  return(DF$rv)
}

Practical example of use

If we simply apply function to the GSS 2014 data,2 we get following:

GSS$religion <- rec_relig_12(GSS$relig, GSS$denom, GSS$other)
##                           Freq Relative  Cumul
## Baptist                    324    12.77  12.77
## Catholic or Orthodox       615    24.23  37.00
## Christian, no group given  307    12.10  49.09
## Dont know                    3     0.12  49.21
## Episcopalian                39     1.54  50.75
## Jewish                      40     1.58  52.32
## Liberal Protestant          77     3.03  55.36
## Lutheran                    92     3.62  58.98
## Moderate Protestant        206     8.12  67.10
## Mormon                      32     1.26  68.36
## No answer                   15     0.59  68.95
## None                       522    20.57  89.52
## Other religion              88     3.47  92.99
## Sectarian Protestant       178     7.01 100.00

And the tail, from the beginning looks like this.

tail(GSS[, c("relig", "denom", "other", "religion")])
##           relig            denom                          other
## 2533 PROTESTANT BAPTIST-DK WHICH                            IAP
## 2534 PROTESTANT UNITED METHODIST                            IAP
## 2535       NONE              IAP                            IAP
## 2536       NONE              IAP                            IAP
## 2537   CATHOLIC              IAP                            IAP
## 2538 PROTESTANT            OTHER Congregationalist, 1st Congreg
##                  religion
## 2533              Baptist
## 2534  Moderate Protestant
## 2535                 None
## 2536                 None
## 2537 Catholic or Orthodox
## 2538   Liberal Protestant

Final remarks

Function still needs some fixing/generalization for handling different coding of variables. I will periodically update it, as I occasionally improve it.

Please feel free to contact me with any questions or comments.

If you find function handy and use it, cite the source paper; in addition acknowledging my contribution is appreciated.


  1. Paper contains SPSS syntax based on which I wrote function - also some of the comments in the code are pasted from there.

  2. Data is not included to github repository since it is larger than 10MB. It can be downloaded freely from NORC website.