mrdwab / splitstackshape

R functions to split concatenated data, conveniently stack columns of data.frames, and conveniently reshape data.frames.

Possible bug in dealing with factor columns?

arunsrinivasan opened this issue:

I was attempting to answer this SO question using:

require(splitstackshape)
merged.stack(d, id="PID", var.stubs=c("Cue"), sep="var.stubs")
#     PID .time_1 Cue
#  1:   1       1   1
#  2:   1       1   3
#  3:   1       1   5
#  4:   1       2   2
#  5:   1       2   5
#  6:   1       2   5
#  7:   2       1   1
#  8:   2       1   3
#  9:   2       1   5
#10:   2       2   2
#11:   2       2   5
#12:   2       2   5

Unless I'm mistaken as to what the code is supposed to do, this isn't the right output.

sessionInfo()
# R version 3.1.2 (2014-10-31)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] splitstackshape_1.4.2 data.table_1.9.5     

# loaded via a namespace (and not attached):
# [1] chron_2.3-45 tools_3.1.2 

@arunsrinivasan, Oooh. That's ugly. Thanks for the heads up.

I don't think this is a bug in merged.stack related to factors. Rather, merged.stack won't work correctly if the "id.vars" don't actually work as IDs, that is, if they don't uniquely identify each row of the input.

Here's an approach with getanID to show what I mean:

str(d)
# 'data.frame':  6 obs. of  3 variables:
#  $ PID : Factor w/ 2 levels "1","2": 1 1 1 2 2 2
#  $ Cue1: Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3
#  $ Cue2: Factor w/ 1 level "5": 1 1 1 1 1 1
merged.stack(getanID(d, "PID"), var.stubs = "Cue", sep = "var.stubs")
#     PID .id .time_1 Cue
#  1:   1   1       1   1
#  2:   1   1       2   5
#  3:   1   2       1   2
#  4:   1   2       2   5
#  5:   1   3       1   3
#  6:   1   3       2   5
#  7:   2   1       1   1
#  8:   2   1       2   5
#  9:   2   2       1   2
# 10:   2   2       2   5
# 11:   2   3       1   3
# 12:   2   3       2   5
str(.Last.value)
# Classes ‘data.table’ and 'data.frame':  12 obs. of  4 variables:
#  $ PID    : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 2 2 2 2 ...
#  $ .id    : int  1 1 2 2 3 3 1 1 2 2 ...
#  $ .time_1: chr  "1" "2" "1" "2" ...
#  $ Cue    : Factor w/ 4 levels "1","2","3","5": 1 4 2 4 3 4 1 4 2 4 ...
#  - attr(*, "sorted")= chr  "PID" ".id" ".time_1"
#  - attr(*, ".internal.selfref")=<externalptr> 
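
For what it's worth, the same point can be checked in base R without the package. This is only a rough sketch: `d` is rebuilt by hand from the `str(d)` output above, and the `.id` column is built with `ave()` as a stand-in for the counter that getanID appears to add, not getanID's actual implementation. The names `id_repeats` and `combined_unique` are made up for illustration.

```r
# Rebuild the example data from the str(d) output (an assumption about the original d)
d <- data.frame(
  PID  = factor(c(1, 1, 1, 2, 2, 2)),
  Cue1 = factor(c(1, 2, 3, 1, 2, 3)),
  Cue2 = factor(rep(5, 6))
)

# PID repeats within the data, so on its own it cannot identify a row
id_repeats <- anyDuplicated(d$PID) > 0
id_repeats
# TRUE

# Add a within-PID counter: 1, 2, 3 for each PID
d$.id <- ave(seq_along(d$PID), d$PID, FUN = seq_along)

# Together, PID and .id identify every row exactly once
combined_unique <- anyDuplicated(d[c("PID", ".id")]) == 0
combined_unique
# TRUE
```

With a combination of columns that is unique per row, each stacked value has an unambiguous row to go back to, which seems to be what the call with getanID above relies on.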

Let me look at it a little bit more, though.

(Sorry for the delayed response -- I never got any notification from GitHub when I logged in that there was anything for me to look at!)

(Oh, and I hope you don't mind me hijacking the question--the accepted answer was bugging me!)