atorus-research / xportr

Tools to build CDISC compliant data sets and check for CDISC compliance.

Home Page: https://atorus-research.github.io/xportr/

Bug: xportr_df_label() doesn't seem to work unless domain argument set

rossfarrugia opened this issue

What happened?

The output below shows how our metacore object is configured. When we try to set the dataset label, we get "character(0)":

> metacore$ds_spec
# A tibble: 1 × 3
  dataset structure label                              
  <chr>   <chr>     <chr>                              
1 AAG     MISC      Adverse Event Grouping Defs Dataset
> 
> aag_test <- aag_prefinal %>%
+   xportr_df_label(metacore) 
>  
> attr(aag_test, "label")
[1] "character(0)"

However, setting the domain argument explicitly works:

> metacore$ds_spec
# A tibble: 1 × 3
  dataset structure label                              
  <chr>   <chr>     <chr>                              
1 AAG     MISC      Adverse Event Grouping Defs Dataset
> 
> aag_test <- aag_prefinal %>%
+   xportr_df_label(metacore, domain="AAG") 
>  
> attr(aag_test, "label")
[1] "Adverse Event Grouping Defs Dataset"

In case it helps, I asked Stefan to take a look and his explanation was: xportr_df_label() tries to do some magic by determining the domain name from the name of the first argument. This may fail in a pipe.
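To illustrate the general mechanism Stefan describes, here is a minimal base R sketch. `guess_domain()` is a hypothetical helper written for this example, not xportr's code, and xportr's actual implementation may handle pipes differently. It only shows that a name captured from the calling expression is available for a bare symbol but not for a wrapped expression.

```r
# Hypothetical helper (NOT xportr's implementation): captures the expression
# supplied as `.df` and returns it as text, the way a function might try to
# guess a domain from the name of its first argument.
guess_domain <- function(.df) {
  paste(deparse(substitute(.df)), collapse = "")
}

aag <- data.frame(SRCVAR = "AEDECOD", stringsAsFactors = FALSE)

# Direct call: the argument is a bare symbol, so the name can be recovered.
direct <- guess_domain(aag)

# Wrapped call: the argument is an expression, so the captured text is the
# whole call rather than a dataset name.
wrapped <- guess_domain(transform(aag, STUDYID = "XYZ1234"))

print(direct)
print(wrapped)
```

Because the guess depends on how the function is called, passing `domain` explicitly avoids the ambiguity.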

Session Information

xportr_0.3.0

Reproducible Example

See above

@elimillera @bms63 that's actually a bad example I gave above, as I recognise the documentation for this argument says "If none is passed, then name of the dataset passed as .df will be used."

So here's another example instead for the same metacore object:

> aag <- aag %>%
+   xportr_label(metacore) %>%
+   xportr_df_label(metacore)
> 
> attr(aag, "label")
[1] "character(0)"
> attr(aag$SRCVAR, "label")
[1] "Variable on which Grouping is Based"

The odd thing, as you can see, is that xportr_label() does still work. So now it has us questioning whether we should in fact always be using the domain argument, and should use xportr_label(metacore, domain = "AAG") as well.

Personally I think xportr_write() should handle this label part and we should deprecate xportr_df_label(). @elimillera what do you think about moving this inside of xportr_write()?

If you move setting the dataset label to xportr_write(), it should return the dataset with the dataset label set. Then the output dataset can be used to write the dataset (including the dataset label) in other formats. (At Roche we write xpt and parquet files.)

Agree with Stefan! Having the current dataframe written out with the label is useful in case you want to use this in other formats besides the XPT file. For me xportr_df_label() is clearer to a user as it has a clear name and it follows similar logic to xportr_label().

I agree with Ross that keeping xportr_df_label() is clearer.

Hey @rossfarrugia, I got around to debugging this. The example below fails because of casing.

> metacore$ds_spec
# A tibble: 1 × 3
  dataset structure label                              
  <chr>   <chr>     <chr>                              
1 AAG     MISC      Adverse Event Grouping Defs Dataset
> aag <- aag %>%
+   xportr_label(metacore) %>%
+   xportr_df_label(metacore)
> 
> attr(aag, "label")
[1] "character(0)"

The logic tries to match the name of the .df (aag) to the dataset value (AAG), and the comparison is case-sensitive. I could see us doing a case-insensitive match if that would improve the workflow.
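A rough sketch of what a case-insensitive lookup could look like, in base R. `lookup_label()` and its `ignore_case` parameter are hypothetical names for illustration only, and `ds_spec` is a plain data frame standing in for metacore$ds_spec. This is not how xportr is implemented.

```r
# Stand-in for the dataset-level spec shown above (assumed structure).
ds_spec <- data.frame(
  dataset = "AAG",
  label = "Adverse Event Grouping Defs Dataset",
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of xportr: returns the label(s) whose
# dataset name matches `domain`, optionally ignoring case.
lookup_label <- function(ds_spec, domain, ignore_case = FALSE) {
  if (ignore_case) {
    hit <- toupper(ds_spec$dataset) == toupper(domain)
  } else {
    hit <- ds_spec$dataset == domain
  }
  ds_spec$label[hit]
}

# Case-sensitive lookup with the lower-case object name finds nothing.
exact <- lookup_label(ds_spec, "aag")

# Case-insensitive lookup recovers the label.
relaxed <- lookup_label(ds_spec, "aag", ignore_case = TRUE)

print(length(exact))
print(relaxed)
```

The empty result from the case-sensitive lookup is consistent with the "character(0)" label reported above, although the exact path inside xportr is not shown in this thread.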

Hi @elimillera, I would remove the logic that determines the domain from the name of the input dataset. I think most users do not expect that renaming the input dataset changes the result, so it is error-prone. Furthermore, it does not work in some cases:

> adsl <- data.frame(
+     USUBJID = c(1001, 1002, 1003),
+     SITEID = c(001, 002, 003),
+     AGE = c(63, 35, 27),
+     SEX = c("M", "F", "M")
+ )
> 
> metadata <- data.frame(
+     dataset = c("adsl", "adae"),
+     label = c("Subject-Level Analysis", "Adverse Events Analysis")
+ )
> 
> attr(adsl |> mutate(STUDYID = "XYZ1234") |> xportr_df_label(metadata), "label")
Error in `filter()`:
! Problem while computing `..1 = dataset == domain`.
✖ Input `..1` must be of size 2 or 1, not size 0.
Run `rlang::last_error()` to see where the error occurred.
> adsl <- data.frame(
+     USUBJID = c(1001, 1002, 1003),
+     SITEID = c(001, 002, 003),
+     AGE = c(63, 35, 27),
+     SEX = c("M", "F", "M")
+ )
> 
> metadata <- data.frame(
+     dataset = c("adsl", "adae"),
+     label = c("Subject-Level Analysis", "Adverse Events Analysis")
+ )
> 
> attr(xportr_df_label(mutate(adsl, STUDYID = "XYZ1234"), metadata), "label")
Error in `filter()`:
! Problem while computing `..1 = dataset == domain`.
✖ Input `..1` must be of size 2 or 1, not size 0.
Run `rlang::last_error()` to see where the error occurred.

If you want to keep the logic, I would suggest issuing an error if the domain cannot be determined or is not included in the metadata.
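The suggested check could be sketched as follows in base R. `get_df_label()` is a hypothetical helper invented for this example, and its error messages are illustrative. It is not xportr's API.

```r
metadata <- data.frame(
  dataset = c("adsl", "adae"),
  label = c("Subject-Level Analysis", "Adverse Events Analysis"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of xportr: fails loudly when the domain is
# unusable or absent from the metadata instead of returning an empty label.
get_df_label <- function(metadata, domain) {
  if (!is.character(domain) || length(domain) != 1 ||
      is.na(domain) || !nzchar(domain)) {
    stop("`domain` must be a single non-empty string.", call. = FALSE)
  }
  label <- metadata$label[metadata$dataset == domain]
  if (length(label) != 1) {
    stop(
      sprintf("Expected exactly one label for domain '%s', found %d.",
              domain, length(label)),
      call. = FALSE
    )
  }
  label
}

# A domain present in the metadata returns its label.
ok <- get_df_label(metadata, "adsl")

# A domain absent from the metadata raises an error; capture its message.
bad <- tryCatch(
  get_df_label(metadata, "ADSL"),
  error = function(e) conditionMessage(e)
)

print(ok)
print(bad)
```

Failing early gives the user a message that names the domain, rather than an opaque `filter()` error or a silently wrong label.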

What do you think, @elimillera? I agree with Stefan's thoughts, but at the very least a case-insensitive match plus clarification in the documentation would still be an improvement.

@rossfarrugia @bundfussr, I agree with the assessment. I don't like the functionality either, both because of its complexity and because it makes the package a bit more confusing. My concern at this point is that it's a breaking change, so we'll add code to deprecate those features in a future release. I'm thinking we can get a new xportr release out that addresses the outstanding issues in a couple of months.

Thanks @elimillera, all sounds good. As long as the plan isn't to completely deprecate the function, we're happy, as we can keep using it on our side in the templates we're rolling out to all our users. Glad to hear it'll be enhanced over time.