cSplit_e from splitstackshape package not accounting for NA's?
metricsSO commented
Following up on my SO post here, I would appreciate it if you could fix this bug: cSplit_e() fails when the column being split contains NA values.
data1<-structure(list(reason = c("1", "1", NA, "1", "1", "4 5", "1",
"1", "1", "1", "1", "1 2 3 4", "1 2 5", NA, NA)), .Names = "reason", class = "data.frame", row.names = c(NA,
-15L))
#loading packages
library(data.table)
library(splitstackshape)
cSplit_e(setDT(data1),1," ",mode = "value") # with NA's doesn't work
Error in seq.default(min(vec), max(vec)) : 'from' must be a finite number
data2<-na.omit(setDT(data1),cols="reason") # removing NA's
cSplit_e(data2,1," ",mode = "value") # without NA's works
     reason reason_1 reason_2 reason_3 reason_4 reason_5
 1:       1        1       NA       NA       NA       NA
 2:       1        1       NA       NA       NA       NA
 3:       1        1       NA       NA       NA       NA
 4:       1        1       NA       NA       NA       NA
 5:     4 5       NA       NA       NA        4        5
 6:       1        1       NA       NA       NA       NA
 7:       1        1       NA       NA       NA       NA
 8:       1        1       NA       NA       NA       NA
 9:       1        1       NA       NA       NA       NA
10:       1        1       NA       NA       NA       NA
11: 1 2 3 4        1        2        3        4       NA
12:   1 2 5        1        2       NA       NA        5
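The error message above is consistent with min() and max() returning NA when the split values include an NA, which seq() then rejects. Here is a minimal base-R sketch of that failure mode. It uses a made-up vector, not the package internals, so treat it as an illustration of the likely cause rather than a trace of what cSplit_e() actually does:

```r
# Hypothetical stand-in for the split values; the NA mimics a missing "reason"
vec <- c(1L, NA, 4L, 5L)

min(vec)                  # NA, because na.rm defaults to FALSE
min(vec, na.rm = TRUE)    # 1

# seq() refuses a non-finite 'from'; capture the message instead of stopping
res <- tryCatch(seq(min(vec), max(vec)),
                error = function(e) conditionMessage(e))
res

# With na.rm = TRUE the range is finite and seq() works
rng <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
rng
```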
Ananda Mahto commented
@metricsSO, thanks for filing the issue. A modification of numMat() along the following lines seems to work, but I'll have to do some testing over the weekend before committing to anything.
listOfValues <- lapply(strsplit(data1$reason, " "), as.integer)
len <- length(listOfValues)
vec <- unlist(listOfValues, use.names = FALSE)
## min and max should use na.rm = TRUE
slvl <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
out <- matrix(NA_integer_, nrow = len, ncol = length(slvl), dimnames = list(NULL, slvl))
i.idx <- rep(seq_len(len), vapply(listOfValues, length, integer(1L)))
j.idx <- match(vec, slvl)
## na.omit required on both the cbind and vector
out[na.omit(cbind(i.idx, j.idx))] <- na.omit(vec)
out
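To make the snippet above easier to test, the same steps could be wrapped in a function. This is only a sketch: the name num_mat_na and the all-NA guard are made up here, and this is not the function that ends up in the package. It filters with a logical index instead of calling na.omit() twice, which should be equivalent for this data.

```r
# Hypothetical wrapper around the steps above; not the package's numMat()
num_mat_na <- function(x, sep = " ") {
  vals <- lapply(strsplit(x, sep, fixed = TRUE), as.integer)
  len <- length(vals)
  vec <- unlist(vals, use.names = FALSE)
  # Assumed guard: an all-NA input has no range to build columns from
  if (all(is.na(vec))) return(matrix(NA_integer_, nrow = len, ncol = 0L))
  slvl <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
  out <- matrix(NA_integer_, nrow = len, ncol = length(slvl),
                dimnames = list(NULL, slvl))
  i.idx <- rep(seq_len(len), lengths(vals))
  j.idx <- match(vec, slvl)
  keep <- !is.na(j.idx)   # drop positions that came from NA input
  out[cbind(i.idx[keep], j.idx[keep])] <- vec[keep]
  out
}

# Same values as data1$reason in the report
reason <- c("1", "1", NA, "1", "1", "4 5", "1", "1", "1", "1", "1",
            "1 2 3 4", "1 2 5", NA, NA)
m <- num_mat_na(reason)
dim(m)    # 15 rows, 5 columns
m[3, ]    # all NA: the NA row is kept rather than dropped
m[6, ]    # 4 and 5 filled in, the rest NA
```

Unlike the na.omit() workaround in the report, this keeps all 15 rows, so the result still lines up with the original data.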
Ananda Mahto commented
Should be fixed with the functions here. Will try to rework the versions in the package soon.
Ananda Mahto commented
Closed with 6a0e8b4.