Does much the same thing as the SPSS MEAN.n function. It calculates the mean of a case if no more than n of its values are NA; otherwise it returns NA. (SPSS MEAN.n counts the other way round: it requires at least n valid values.) Handy for building indexes from several variables when you want the index calculated only if, e.g., at least 75% or 90% of the values are present in each case.
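average_excluding <- function(G, n) {
  ######## AVERAGE EXCLUDING ###########
  # Returns the row-wise mean of the variables in G,
  # or NA for cases with more than n missing values.
  # G is a data frame of the desired variables;
  # n is the maximum number of missing values allowed per case.
  apply(G, 1,
    function(x) {
      if (sum(is.na(x)) > n) NA_real_     # too many missing values
      else mean(x, na.rm = TRUE)          # mean of the values that are present
    }
  )
}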
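For larger data frames, the same rule can probably be written without `apply()`. The following is only a sketch: `average_excluding_vec` and the `toy` data frame are hypothetical names introduced for illustration, and the function assumes that all columns of `G` are numeric.

```r
# Hypothetical vectorised variant of the rule "NA if more than n values are missing".
average_excluding_vec <- function(G, n) {
  G <- as.matrix(G)                      # assumes every column is numeric
  n_missing <- rowSums(is.na(G))         # number of NA values in each case
  means <- rowMeans(G, na.rm = TRUE)     # mean of the non-missing values in each case
  means[n_missing > n] <- NA             # blank out cases with too many missing values
  means
}

# Small made-up data frame: cases with 0, 1 and 2 missing values.
toy <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))
average_excluding_vec(toy, 1)            # the third case exceeds the limit and becomes NA
```

`rowSums()` and `rowMeans()` work on the whole matrix at once, so there is no per-row function call.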
A simple example: six variables, where each case has a different number of missing values.
DF <- data.frame(var_1 = c(10, 20, 30, 40, 50),
                 var_2 = c(11, 21, 31, 41, NA),
                 var_3 = c(12, 22, 32, NA, NA),
                 var_4 = c(13, 23, NA, NA, NA),
                 var_5 = c(14, NA, NA, NA, NA),
                 var_6 = c(NA, NA, NA, NA, NA)
)
DF$av_miss_2 <- average_excluding(DF[, 1:6], 2)
DF$av_miss_5 <- average_excluding(DF[, 1:6], 5)
DF
##   var_1 var_2 var_3 var_4 var_5 var_6 av_miss_2 av_miss_5
## 1    10    11    12    13    14    NA      12.0      12.0
## 2    20    21    22    23    NA    NA      21.5      21.5
## 3    30    31    32    NA    NA    NA        NA      31.0
## 4    40    41    NA    NA    NA    NA        NA      40.5
## 5    50    NA    NA    NA    NA    NA        NA      50.0
And this is the result. av_miss_2 holds the mean for cases with no more than two missing values, and av_miss_5 holds the mean for cases with no more than five; cases over the limit get NA.
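The "at least 75% or 90% of values present" rule can also be expressed directly as a proportion rather than a count of missing values. This is a minimal sketch under assumptions: `average_min_present`, its `min_prop` argument and the `example_df` data frame are hypothetical names, and all columns are assumed to be numeric.

```r
# Hypothetical helper: row mean only when at least `min_prop` of the values are present.
average_min_present <- function(G, min_prop) {
  G <- as.matrix(G)                      # assumes every column is numeric
  prop_present <- rowMeans(!is.na(G))    # share of non-missing values in each case
  means <- rowMeans(G, na.rm = TRUE)     # mean of the non-missing values in each case
  means[prop_present < min_prop] <- NA   # too few values present: no mean
  means
}

# Same layout as the example data: six variables, 1 to 5 missing values per case.
example_df <- data.frame(var_1 = c(10, 20, 30, 40, 50),
                         var_2 = c(11, 21, 31, 41, NA),
                         var_3 = c(12, 22, 32, NA, NA),
                         var_4 = c(13, 23, NA, NA, NA),
                         var_5 = c(14, NA, NA, NA, NA),
                         var_6 = c(NA, NA, NA, NA, NA))

average_min_present(example_df, 0.75)    # only the first case has 5 of 6 values present
```

With six variables, a 75% threshold keeps only cases with at least five values present, so it is stricter than allowing two missing values.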