mrdwab / splitstackshape

R functions to split concatenated data, conveniently stack columns of data.frames, and conveniently reshape data.frames.

cSplit_e from splitstackshape package not accounting for NA's?

metricsSO opened this issue:

Following up on my Stack Overflow post, I would appreciate it if you could fix this bug: cSplit_e fails when the column being split contains NA values.

data1 <- structure(list(reason = c("1", "1", NA, "1", "1", "4 5", "1",
"1", "1", "1", "1", "1 2 3 4", "1 2 5", NA, NA)), .Names = "reason",
class = "data.frame", row.names = c(NA, -15L))

# loading packages
library(data.table)
library(splitstackshape)

cSplit_e(setDT(data1), 1, " ", mode = "value") # fails when the column contains NAs

Error in seq.default(min(vec), max(vec)) : 'from' must be a finite number
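The call named in the error, seq(min(vec), max(vec)), suggests the failure comes from taking the range of a vector that still contains NA. A minimal base R sketch of that behaviour, assuming the split values end up in an integer vector like the hypothetical `vec` below (this is not the package's actual code):

```r
# Hypothetical stand-in for the values split out of the "reason" column
vec <- c(1L, NA, 4L, 5L)

# Without na.rm = TRUE, a single NA makes min() and max() return NA
min(vec)  # NA

# seq() cannot start from NA, so it raises an error; capture the message
err <- tryCatch(seq(min(vec), max(vec)),
                error = function(e) conditionMessage(e))
err

# Dropping the NAs first gives a finite range
rng <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
rng  # 1 2 3 4 5
```
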

data2 <- na.omit(setDT(data1), cols = "reason") # removing NAs

cSplit_e(data2, 1, " ", mode = "value") # works without NAs
     reason reason_1 reason_2 reason_3 reason_4 reason_5
 1:       1        1       NA       NA       NA       NA
 2:       1        1       NA       NA       NA       NA
 3:       1        1       NA       NA       NA       NA
 4:       1        1       NA       NA       NA       NA
 5:     4 5       NA       NA       NA        4        5
 6:       1        1       NA       NA       NA       NA
 7:       1        1       NA       NA       NA       NA
 8:       1        1       NA       NA       NA       NA
 9:       1        1       NA       NA       NA       NA
10:       1        1       NA       NA       NA       NA
11: 1 2 3 4        1        2        3        4       NA
12:   1 2 5        1        2       NA       NA        5

@metricsSO, thanks for filing the issue. Modifying numMat() along the following lines seems to work, but I'll have to do some testing over the weekend before committing to anything.

## split each entry on spaces and convert to integer; NA entries stay NA
listOfValues <- lapply(strsplit(data1$reason, " "), as.integer)
len <- length(listOfValues)
vec <- unlist(listOfValues, use.names = FALSE)
## min and max should use na.rm = TRUE
slvl <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))
## one row per entry, one column per value in the observed range
out <- matrix(NA_integer_, nrow = len, ncol = length(slvl), dimnames = list(NULL, slvl))
## row index and column index of every split value
i.idx <- rep(seq_len(len), vapply(listOfValues, length, integer(1L)))
j.idx <- match(vec, slvl)
## na.omit required on both the cbind and vector
out[na.omit(cbind(i.idx, j.idx))] <- na.omit(vec)
out
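For anyone who wants to try the idea on their own data before the package is updated, the snippet above can be wrapped in a small function. This is only a sketch: `num_mat_sketch` and its arguments are hypothetical names, not the package's numMat(), and it assumes integer-like values with at least one non-missing entry. It drops the NA positions with a logical mask instead of na.omit().

```r
# Hypothetical helper for illustration only; not the package's numMat()
num_mat_sketch <- function(x, sep = " ") {
  # Split each entry and convert to integer; NA entries stay a single NA
  vals <- lapply(strsplit(x, sep, fixed = TRUE), as.integer)
  n <- length(vals)
  vec <- unlist(vals, use.names = FALSE)

  # Range of observed values, ignoring NAs
  lvls <- seq(min(vec, na.rm = TRUE), max(vec, na.rm = TRUE))

  # One row per entry, one column per value in the range
  out <- matrix(NA_integer_, nrow = n, ncol = length(lvls),
                dimnames = list(NULL, lvls))

  # Row and column index for every split value
  i <- rep(seq_len(n), lengths(vals))
  j <- match(vec, lvls)

  # Fill only the positions that correspond to non-missing values
  keep <- !is.na(j)
  out[cbind(i[keep], j[keep])] <- vec[keep]
  out
}

m <- num_mat_sketch(c("1", NA, "4 5", "1 2 5"))
m
```

Rows that were NA in the input come back as all-NA rows, so the result keeps the same number of rows as the input and the NA rows do not have to be removed first.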

Should be fixed with the functions here. Will try to rework the versions in the package soon.

Closed with 6a0e8b4.