R function that recodes GSS religion from three variables (relig, denom, other) into 12 categories, following Darren E. Sherkat and Derek Lehman, “After The Resurrection: The Field of the Sociology of Religion in the United States”.1
UPDATE
The package is not on CRAN yet. It is located at github.com/mdjeric/resurrectionr and can be installed using devtools::install_github("mdjeric/resurrectionr"). There is also a dedicated website for it: mdjeric.github.io/resurrectionr. It is a considerable step forward from this function: it includes several options, accepts different types of variables, etc.
The main goal is to address four problems:
At this point, only 1 works well, while 2 and 3 are partial. Besides the huge number of categories (about 200 for other), the main problem is that punch codes from the GSS codebook do not correspond to factor values once the data is imported (some codes and blocks of numbers are simply skipped).
The function is usable, but with certain limitations. At the moment it works reliably only for the respondent’s religion and religion at age 16 (although the same process can be applied to other derived variables), and only when the data is imported from an SPSS file into a data frame with trimmed factor names and values and without the use of missing values.
If this does not suit your needs, an ad hoc workaround is to import the GSS dataset twice: once in your fashion of choice (e.g. with the use of NA) and once as an identical dataset imported as required by this function. You should immediately apply the function to the second data frame and assign the values of the new variable to the first. Ideally, you would also assign the variables used for recoding and check them against the main data frame to confirm that everything is OK.
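The workaround can be sketched on toy data as follows. The two small data frames and the function toy_recode are hypothetical stand-ins (for the two imports of the same GSS file and for rec_relig_12, respectively); only the pattern of checking row alignment, assigning the new variable, and cross-checking the source variable is the point.

```r
# Hypothetical stand-ins for two imports of the same file:
# 'main' uses real missing values, 'for_recode' keeps them as labels.
main <- data.frame(id = 1:4,
                   relig = c("CATHOLIC", NA, "NONE", "JEWISH"),
                   stringsAsFactors = TRUE)
for_recode <- data.frame(id = 1:4,
                         relig = c("CATHOLIC", "NA", "NONE", "JEWISH"),
                         stringsAsFactors = TRUE)

# Hypothetical stand-in for rec_relig_12(); it only relabels "NA".
toy_recode <- function(relig) {
  out <- as.character(relig)
  out[out == "NA"] <- "No answer"
  factor(out)
}

# Both imports must describe the same respondents in the same order.
stopifnot(identical(main$id, for_recode$id))

# Recode on the second data frame, store the result in the first.
main$religion <- toy_recode(for_recode$relig)

# Also carry over the source variable and cross-check it.
main$relig_check <- for_recode$relig
print(table(main$relig, main$relig_check, useNA = "ifany"))
```

In the cross-tabulation, every case should fall on the diagonal, except that true missing values in the first data frame line up with the "NA" label in the second.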
There is probably an easier way to do this, but I started out using GSS data imported from .sav files with the foreign package, with categorical variables as factors. However, R’s internal integer codes for factors do not correspond to the SPSS codes (i.e. the corresponding punches in the GSS codebook), which causes problems when trying to replicate SPSS syntax directly.
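A small toy example may make the mismatch concrete. The punches and labels are taken from the head of the denom table used in this post; the four answers are made up for illustration.

```r
# Punch codes and labels taken from the head of denom.csv.
punch  <- c(0, 10, 11, 14)
labels <- c("IAP", "AM BAPTIST ASSO", "AM BAPT CH IN USA", "SOUTHERN BAPTIST")

# A toy 'denom' variable as it looks after import: a factor of labels.
answers <- c(14, 0, 10, 14)
denom <- factor(labels[match(answers, punch)], levels = labels)

# The factor's integer codes are positions among the levels ...
factor_codes <- as.integer(denom)
# ... so the punches have to be looked up through the labels.
recovered <- punch[match(as.character(denom), labels)]

print(data.frame(label = denom, factor_code = factor_codes, punch = recovered))
```

Here SOUTHERN BAPTIST has punch 14 in the codebook but factor code 4 in R, which is why SPSS syntax written in terms of punches cannot be applied to the factor codes directly.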
Here is a sample from the 2014 GSS file:
library(foreign)
GSS <- read.spss("_GSS2014.sav",
                 to.data.frame = TRUE,
                 trim.factor.names = TRUE,
                 trim_values = TRUE,
                 use.missings = FALSE
)
summary(GSS[, c("relig", "denom", "other")])
## relig denom other
## PROTESTANT:1125 IAP :1264 IAP :2269
## CATHOLIC : 606 NO DENOMINATION : 298 Pentecostal : 51
## NONE : 522 OTHER : 254 Mormon : 32
## CHRISTIAN : 134 BAPTIST-DK WHICH: 151 Jehovah's Witnesses: 22
## JEWISH : 40 SOUTHERN BAPTIST: 146 Church of Christ : 17
## OTHER : 27 UNITED METHODIST: 110 NA : 16
## (Other) : 84 (Other) : 315 (Other) : 131
str(GSS[, c("relig", "denom", "other")])
## 'data.frame': 2538 obs. of 3 variables:
## $ relig: Factor w/ 16 levels "IAP","PROTESTANT",..: 3 3 2 3 3 2 3 3 3 3 ...
## $ denom: Factor w/ 30 levels "IAP","AM BAPTIST ASSO",..: 1 1 16 1 1 26 1 1 1 1 ...
## $ other: Factor w/ 202 levels "IAP","Hungarian Reformed",..: 1 1 1 1 1 1 1 1 1 1 ...
tail(GSS[, c("relig", "denom", "other")])
## relig denom other
## 2533 PROTESTANT BAPTIST-DK WHICH IAP
## 2534 PROTESTANT UNITED METHODIST IAP
## 2535 NONE IAP IAP
## 2536 NONE IAP IAP
## 2537 CATHOLIC IAP IAP
## 2538 PROTESTANT OTHER Congregationalist, 1st Congreg
As a starting point for a more universal recoding function, this one uses three tables (stored in the religion_punches folder; their heads are shown here) from which three vectors are formed (relig, denom, and other), where the index corresponds to the punch number and the value is the label.
## $relig.csv
## punch label
## 1 0 IAP
## 2 1 PROTESTANT
## 3 2 CATHOLIC
## 4 3 JEWISH
## 5 4 NONE
## 6 5 OTHER
##
## $denom.csv
## punch label
## 1 0 IAP
## 2 10 AM BAPTIST ASSO
## 3 11 AM BAPT CH IN USA
## 4 12 NAT BAPT CONV OF AM
## 5 13 NAT BAPT CONV USA
## 6 14 SOUTHERN BAPTIST
##
## $punch.csv
## punch label
## 1 0 IAP
## 2 1 Hungarian Reformed
## 3 2 Evangelical Congregational
## 4 3 Ind Bible, Bible, Bible Fellowship
## 5 5 Church of Prophecy
## 6 6 New Testament Christian
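As a minimal sketch of how such a lookup vector can be built, the snippet below recreates the head of the table for other inline (instead of reading the .csv file) and fills the vector in one vectorised step. This is an alternative to the loops used in the function, not the function's own code.

```r
# Head of the table shown above for the 'other' variable.
c_other <- data.frame(
  punch = c(0, 1, 2, 3, 5, 6),
  label = c("IAP", "Hungarian Reformed", "Evangelical Congregational",
            "Ind Bible, Bible, Bible Fellowship", "Church of Prophecy",
            "New Testament Christian"),
  stringsAsFactors = FALSE
)

# Punch 0 cannot be a position in an R vector, so it is left out.
valid <- c_other$punch > 0
c_o <- character(0)
c_o[c_other$punch[valid]] <- c_other$label[valid]

c_o[5]  # label stored at punch 5
c_o[4]  # NA: punch 4 is skipped in the table
```

Skipped punches simply remain NA in the vector, so a lookup by punch number never shifts to a neighbouring label.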
For each new religious identification, separate vectors (for relig, denom, and/or other) are created that contain the punch codes and labels, following the paper and its SPSS syntax. In addition, logical vectors are created for Sectarian Protestants with valid denomination codes, to sort between ‘Sectarian Protestants’, ‘Christian - no group given’, and ‘other religions’.
This approach is somewhat superior to the usual recoding practices, since basic set operations can confirm that there is no overlap between groups and that no categories are left out.
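Such a check might look like the sketch below, which uses the denom punch codes that the function assigns to five of the groups. The reference range 10:50 is an assumption made only for this illustration; in practice the reference set would be the punches listed in the codebook.

```r
# denom punch codes per group, as used in the function.
denom_groups <- list(
  liberal      = 40:49,
  episcopalian = 50,
  moderate     = c(10:13, 20:23, 28),
  lutheran     = 30:38,
  baptist      = 14:18
)

# Overlap: every pair of groups should have an empty intersection.
pairs <- combn(names(denom_groups), 2, simplify = FALSE)
overlapping <- Filter(function(p) {
  length(intersect(denom_groups[[p[1]]], denom_groups[[p[2]]])) > 0
}, pairs)
length(overlapping) == 0

# Coverage: codes in a reference set that no group claims.
# The range 10:50 is an assumption made for this illustration only.
unclaimed <- setdiff(10:50, unlist(denom_groups))
unclaimed
```

Unclaimed codes are not necessarily errors: they may not exist in the codebook, or they may be classified through relig or other instead. The point is that the list is easy to produce and inspect.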
Next, twelve logical vectors, one for the respondent’s belonging to each group, are created by checking against the appropriate vectors of labels. Names are assigned, and the function prints a frequency table and returns a vector that can be assigned to a new variable.
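The matching step can be sketched for two of the groups on a toy variable. The lookup vector holds only the head of the denom table, and the three answers are made up; the group codes are the ones used in the function.

```r
# Lookup vector: position = punch, value = label (head of denom.csv).
c_d <- character(0)
c_d[10:14] <- c("AM BAPTIST ASSO", "AM BAPT CH IN USA",
                "NAT BAPT CONV OF AM", "NAT BAPT CONV USA",
                "SOUTHERN BAPTIST")

# Toy 'denom' variable, imported as a factor of labels.
denom <- factor(c("IAP", "SOUTHERN BAPTIST", "AM BAPTIST ASSO"))

# Labels belonging to each group, via the punch codes.
# Punches missing from this short lookup yield NA and match nothing here.
mp_denom <- c_d[c(10:13, 20:23, 28)]
bp_denom <- c_d[14:18]

# One logical vector per group, then assign the group names.
mp_true <- denom %in% mp_denom
bp_true <- denom %in% bp_denom
rv <- rep(NA_character_, length(denom))
rv[mp_true] <- "Moderate Protestant"
rv[bp_true] <- "Baptist"

print(data.frame(denom, rv))
```

Cases that match no group (here the IAP row) stay NA until a later rule, based on relig or other, picks them up.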
This basic structure can be generalized to recode any three-variable set of religious identification, whether imported as factors with names, as numerical values, or as a combination of the two (as is the case with some groups). It also allows easy recoding into a different number of groups and printing of a detailed classification that includes the names of all religions and denominations.
rec_relig_12 <- function(religion, denomination, other) {
######## RECODING RELIGION AND RELIGION AT 16 ###########
# Sherkat and Lehman (2017)
# To work properly, the folder 'religion_punches' with .csv files
# of label names has to be in the working directory.
#
# Function: rec_relig_12(religion, denomination, other)
# relig or relig16 variable; denom or denom16; other or oth16
# The function prints frequencies and returns a factor vector
# with religion recoded.
#
# It works with GSS dataset imported through 'read.spss',
# from foreign package, in following way:
# to.data.frame = TRUE, trim.factor.names = TRUE,
# trim_values = TRUE, use.missings = FALSE
# Copy the three variables into a new data frame used for recoding
DF <- data.frame(relig = religion,
denom = denomination,
other = other
)
# Read values for all variables
c_relig <- read.csv("religion_punches/relig.csv")
c_denom <- read.csv("religion_punches/denom.csv")
c_other <- read.csv("religion_punches/other.csv")
# Create vectors with position corresponding to the punch
# of each label in the GSS codebook for the 3 variables
c_r <- c()
for (i in c_relig$punch) {
c_r[i] <- as.character(c_relig[c_relig$punch == i, "label"])
}
c_r[99] <- "NA"
c_d <- c()
for (i in c_denom$punch) {
c_d[i] <- as.character(c_denom[c_denom$punch == i, "label"])
}
c_d[99] <- "NA"
c_o <- c()
for (i in c_other$punch) {
c_o[i] <- as.character(c_other[c_other$punch == i, "label"])
}
c_o[999] <- "NA"
# Liberal Protestants
lp_d_num <- c(40:49)
lp_o_num <- c(29, 30, 40, 54, 70, 72 , 81, 82, 95, 98, 119,
142, 160, 188)
lp_denom <- c_d[lp_d_num]
lp_other <- c_o[lp_o_num]
DF$lp_true <- (DF$denom %in% lp_denom) | (DF$other %in% lp_other)
DF$rv[DF$lp_true] <- "Liberal Protestant"
# Episcopalians
ep_d_num <- c(50)
ep_denom <- c_d[ep_d_num]
DF$ep_true <- DF$denom %in% ep_denom
DF$rv[DF$ep_true] <- "Episcopalian"
# Moderate Protestants
mp_d_num <- c(10:13, 20:23, 28)
mp_o_num <- c(1, 8, 15, 19, 25, 32, 42:44, 46, 49:51, 71, 73, 94,
99, 146, 148, 150, 186)
mp_denom <- c_d[mp_d_num]
mp_other <- c_o[mp_o_num]
DF$mp_true <- (DF$denom %in% mp_denom) | (DF$other %in% mp_other)
DF$rv[DF$mp_true] <- "Moderate Protestant"
# Lutherans
lt_d_num <- c(30:38)
lt_o_num <- c(105)
lt_denom <- c_d[lt_d_num]
lt_other <- c_o[lt_o_num]
DF$lt_true <- (DF$denom %in% lt_denom) | (DF$other %in% lt_other)
DF$rv[DF$lt_true] <- "Lutheran"
# Baptists
bp_d_num <- c(14:18)
bp_o_num <- c(93, 133, 197)
bp_denom <- c_d[bp_d_num]
bp_other <- c_o[bp_o_num]
DF$bp_true <- (DF$denom %in% bp_denom) | (DF$other %in% bp_other)
DF$rv[DF$bp_true] <- "Baptist"
# Sectarian Protestants
# These initial variables pull out sectarians who are coded
# relig == 11 (christian) or relig == 5 (other),
# but also have valid denom codes.
DF$sp_pent <- (DF$relig == c_r[11]) & (DF$other == c_o[68])
DF$sp_centchrist <- (DF$relig == c_r[5]) & (DF$other == c_o[31])
DF$sp_fsg <- (DF$relig == c_r[5]) & (DF$other == c_o[53])
DF$sp_jw <- (DF$relig == c_r[5]) & (DF$other == c_o[58])
DF$sp_sda <- (DF$relig == c_r[5]) & (DF$other == c_o[77])
DF$sp_ofund <- (DF$relig == c_r[5]) & (DF$other == c_o[97])
sp_o_num <- c(2, 3, 5:7, 9, 10, 12:14, 16:18, 20:24, 26, 27, 31,
33:39, 41, 45, 47, 48, 52, 53, 55:58, 63, 65:69,
76:79, 83:92, 96, 97, 100:104, 106:113, 115:118,
120:122, 124, 125, 127:132, 134, 135, 137:141, 144,
145, 151:156, 158, 159, 166:182, 184, 185, 187,
189:191, 193, 195, 196, 198, 201, 204)
sp_other <- c_o[sp_o_num]
DF$sp_true <- ((DF$other %in% sp_other) |
DF$sp_pent |
DF$sp_centchrist |
DF$sp_fsg |
DF$sp_jw |
DF$sp_sda |
DF$sp_ofund
)
DF$rv[DF$sp_true] <- "Sectarian Protestant"
# Christian, no group identified
DF$cn_christ <- (DF$relig == c_r[11]) & !DF$sp_pent
cn_r_num <- c(13)
cn_d_num <- c(70, 98, 99)
cn_o_num <- c(998, 999)
cn_relig <- c_r[cn_r_num]
cn_denom <- c_d[cn_d_num]
cn_other <- c_o[cn_o_num]
DF$cn_true <- ((DF$relig %in% cn_relig) |
(DF$denom %in% cn_denom) |
(DF$other %in% cn_other) |
DF$cn_christ
)
DF$rv[DF$cn_true] <- "Christian, no group given"
# Mormons
mr_o_num <- c(59:62, 64, 157, 162)
mr_other <- c_o[mr_o_num]
DF$mr_true <- DF$other %in% mr_other
DF$rv[DF$mr_true] <- "Mormon"
# Catholics or Orthodox Christians
co_r_num <- c(2, 10)
co_o_num <- c(28, 123, 126, 143, 149, 183, 194)
co_relig <- c_r[co_r_num]
co_other <- c_o[co_o_num]
DF$co_true <- (DF$relig %in% co_relig) | (DF$other %in% co_other)
DF$rv[DF$co_true] <- "Catholic or Orthodox"
# Jews
jw_r_num <- c(3)
jw_relig <- c_r[jw_r_num]
DF$jw_true <- DF$relig %in% jw_relig
DF$rv[DF$jw_true] <- "Jewish"
# Other religions
DF$or_nonsp <- (DF$relig == c_r[5]) & !(DF$sp_pent |
DF$sp_centchrist |
DF$sp_fsg |
DF$sp_jw |
DF$sp_sda |
DF$sp_ofund
)
or_r_num <- c(6:9, 12)
or_o_num <- c(11, 74, 75, 80, 114, 136, 161, 163, 164, 192)
or_relig <- c_r[or_r_num]
or_other <- c_o[or_o_num]
DF$or_true <- ((DF$relig %in% or_relig) |
(DF$other %in% or_other) |
DF$or_nonsp
)
DF$rv[DF$or_true] <- "Other religion"
# No religious identification
nr_r_num <- c(4)
nr_relig <- c_r[nr_r_num]
DF$nr_true <- DF$relig %in% nr_relig
DF$rv[DF$nr_true] <- "None"
# Missing values
# No Answer
DF$na_relig <- DF$relig == c_r[99]
DF$na_denom <- DF$denom == c_d[99]
DF$na_rd <- DF$na_relig & DF$na_denom
DF$rv[DF$na_rd] <- "No answer"
# Don't know
DF$dk_relig <- DF$relig == c_r[98]
DF$rv[DF$dk_relig] <- "Dont know"
# Treat it as factor
DF$rv <- as.factor(DF$rv)
# Provide table with proportions
rv_tab <- table(DF$rv, useNA = "ifany")
rv_prop <- prop.table(rv_tab)
print(cbind(Freq = rv_tab,
            Relative = round(100 * rv_prop, 2),
            Cumul = round(100 * cumsum(rv_prop), 2)
      )
)
# Return the vector with recoded religion
return(DF$rv)
}
If we simply apply the function to the GSS 2014 data,2 we get the following:
GSS$religion <- rec_relig_12(GSS$relig, GSS$denom, GSS$other)
## Freq Relative Cumul
## Baptist 324 12.77 12.77
## Catholic or Orthodox 615 24.23 37.00
## Christian, no group given 307 12.10 49.09
## Dont know 3 0.12 49.21
## Episcopalian 39 1.54 50.75
## Jewish 40 1.58 52.32
## Liberal Protestant 77 3.03 55.36
## Lutheran 92 3.62 58.98
## Moderate Protestant 206 8.12 67.10
## Mormon 32 1.26 68.36
## No answer 15 0.59 68.95
## None 522 20.57 89.52
## Other religion 88 3.47 92.99
## Sectarian Protestant 178 7.01 100.00
And the tail shown at the beginning now looks like this:
tail(GSS[, c("relig", "denom", "other", "religion")])
## relig denom other
## 2533 PROTESTANT BAPTIST-DK WHICH IAP
## 2534 PROTESTANT UNITED METHODIST IAP
## 2535 NONE IAP IAP
## 2536 NONE IAP IAP
## 2537 CATHOLIC IAP IAP
## 2538 PROTESTANT OTHER Congregationalist, 1st Congreg
## religion
## 2533 Baptist
## 2534 Moderate Protestant
## 2535 None
## 2536 None
## 2537 Catholic or Orthodox
## 2538 Liberal Protestant
The function still needs some fixing and generalization to handle different codings of the variables. I will update it periodically as I improve it.
Please feel free to contact me with any questions or comments.
If you find the function handy and use it, please cite the source paper; an acknowledgment of my contribution is also appreciated.