atorus-research / Tplyr

Home Page: https://atorus-research.github.io/Tplyr/

Complex Filtering

mattkumar opened this issue

Hi guys,

I'm attempting to use Tplyr to compute a group_count layer that I'm not sure how to specify. To give some background, I've simulated a partial adae table below that has USUBJID, ARM and AETOXGRN. AETOXGRN is a toxicity grading used frequently within Oncology and ranges from 1 to 5.

What I'm interested in counting is each subject's worst (i.e. highest) toxicity grade. I'm interested in distinct counts, so for example, if subject X had two AEs, one graded with AETOXGRN = 1 and another with AETOXGRN = 4, I'd like this subject to be counted in the "4" category only.

I can achieve this in dplyr, and also achieve this in Tplyr with some up-front filtering. However, I'm wondering if I can specify something like this directly in Tplyr.

Here is some code for my exploration.

library(dplyr)
library(Tplyr)

adae <- tibble::tribble(
  ~USUBJID,        ~ARM, ~AETOXGRN,
  1L, "Treatment",        3L,
  1L, "Treatment",        1L,
  1L, "Treatment",        2L,
  1L, "Treatment",        3L,
  1L, "Treatment",        1L,
  2L,   "Placebo",        3L,
  2L,   "Placebo",        3L,
  2L,   "Placebo",        4L,
  2L,   "Placebo",        5L,
  2L,   "Placebo",        4L,
  2L,   "Placebo",        2L,
  3L, "Treatment",        1L,
  4L,   "Placebo",        1L,
  5L, "Treatment",        1L,
  5L, "Treatment",        1L,
  5L, "Treatment",        5L,
  5L, "Treatment",        3L,
  5L, "Treatment",        2L,
  5L, "Treatment",        4L,
  5L, "Treatment",        1L
)
# Using dplyr: keep each subject's worst (highest) grade, then count by arm and grade
adae %>%
  group_by(USUBJID) %>%
  arrange(desc(AETOXGRN)) %>%
  slice(1) %>%
  ungroup() %>%
  count(ARM, AETOXGRN)

# dplyr output
# A tibble: 5 x 3
# ARM       AETOXGRN     n
# <chr>        <int> <int>
# Placebo          1     1
# Placebo          5     1
# Treatment        1     1
# Treatment        3     1
# Treatment        5     1

# Using Tplyr
t <- tplyr_table(adae, ARM) %>%
  add_layer(
    group_count(AETOXGRN, where = AETOXGRN == max(AETOXGRN)) %>%
      set_distinct_by(USUBJID)
  )

t %>% build()

# Tplyr output
# A tibble: 1 x 5
# row_label1 var1_Placebo var1_Treatment ord_layer_index ord_layer_1
# <chr>      <chr>        <chr>                    <int>       <dbl>
# 5          1 (100.0%)   1 (100.0%)                   1           5

I can see that Tplyr only outputs the result for the overall max(AETOXGRN) grade, 5, which looks correct for the filter as written. So it seems my filter is acting at the data set level rather than per USUBJID. Is there a good way to specify a where filter of this nature, or have I maybe missed other options in Tplyr?
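To illustrate what I think is happening, here is a minimal dplyr-only sketch of the difference between an ungrouped and a grouped max() filter. It assumes the where condition behaves like an ungrouped filter(), which matches the output above but is my reading rather than anything confirmed by the Tplyr internals. The adae_small data is a hypothetical, cut-down stand-in for the table above.

```r
library(dplyr)

# Hypothetical, cut-down stand-in for the adae table above
adae_small <- tibble::tribble(
  ~USUBJID, ~ARM,        ~AETOXGRN,
  1L,       "Treatment", 3L,
  1L,       "Treatment", 1L,
  2L,       "Placebo",   5L,
  2L,       "Placebo",   4L,
  3L,       "Treatment", 1L
)

# Ungrouped: max() is taken over the whole data set,
# so only rows with the overall highest grade survive
global_max <- adae_small %>%
  filter(AETOXGRN == max(AETOXGRN))

# Grouped: max() is taken within each subject,
# so every subject keeps the row(s) holding their own worst grade
per_subject_max <- adae_small %>%
  group_by(USUBJID) %>%
  filter(AETOXGRN == max(AETOXGRN)) %>%
  ungroup()
```

Here global_max keeps only subject 2's grade 5 row, while per_subject_max keeps one row per subject. The grouped version is the behaviour I'd like the where filter to have.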

Curious to hear any thoughts!

Thanks!
Matt

@mattkumar thanks for submitting this!

Currently this isn't possible, because we don't have a clean way to pass groups down to the point where the where filter conditions are applied. We didn't plan for that, so it would take a good bit of thought to do it elegantly. I'm almost thinking it would be safer to pre-derive a flag and filter on that flag, which is how ADaM datasets would typically set things up, because grouping and ungrouping here is a bit tricky.
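As a rough sketch of the pre-derived flag approach, something like the following could work. The flag name WORSTFL and the cut-down adae_small data are hypothetical, chosen only for illustration; the Tplyr calls are the same ones used in the original post.

```r
library(dplyr)
library(Tplyr)

# Hypothetical, cut-down stand-in for the adae table in the original post
adae_small <- tibble::tribble(
  ~USUBJID, ~ARM,        ~AETOXGRN,
  1L,       "Treatment", 3L,
  1L,       "Treatment", 1L,
  2L,       "Placebo",   5L,
  2L,       "Placebo",   4L,
  3L,       "Treatment", 1L
)

# Derive the flag up front, where grouping by subject is straightforward.
# WORSTFL is a hypothetical variable name, not an ADaM standard name.
adae_flagged <- adae_small %>%
  group_by(USUBJID) %>%
  mutate(WORSTFL = if_else(AETOXGRN == max(AETOXGRN), "Y", "N")) %>%
  ungroup()

# The where condition now only needs a simple row-level test on the flag
t <- tplyr_table(adae_flagged, ARM) %>%
  add_layer(
    group_count(AETOXGRN, where = WORSTFL == "Y") %>%
      set_distinct_by(USUBJID)
  )

t %>% build()
```

If a subject has several records at their worst grade, all of them get WORSTFL == "Y", so set_distinct_by(USUBJID) is still what keeps that subject from being counted more than once.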

What do you think?