atorus-research / Tplyr

Home Page: https://atorus-research.github.io/Tplyr/

Inf, -Inf in tplyr table if Min, Max only contain NA values

johanneswerner opened this issue

One of my treatment groups only has missing values. Why does the summary table show Inf, -Inf for it, and is there a way to turn this into something more readable?

As an example, I used the CO2 dataset and added a new row with NA for conc and uptake.

library(tidyverse)
library(Tplyr)

data(CO2)
df <- CO2
df <- df %>%
  add_row(
    Plant = "Mc4",
    Type = "Colorado",
    Treatment = "nonchilled",
    conc = NA,
    uptake = NA
  )

tplyr_table(df, Plant) %>% 
  add_layer(
    group_desc(conc, by = "Treatment", where = Type == "Colorado") %>% 
      set_format_strings(
        "n"        = f_str("xx", n),
        "Mean (SD)"= f_str("xx.x (xx.xx)", mean, sd),
        "Median"   = f_str("xx.x", median),
        "Q1, Q3"   = f_str("xx, xx", q1, q3),
        "Min, Max" = f_str("xx, xx", min, max),
        "Missing"  = f_str("xx", missing)
      )
  ) %>% 
  build()

And here is the Tplyr output:

# A tibble: 6 × 18
  row_label1 row_label2 var1_Mc1 var1_Mc2 var1_Mc3 var1_Mc4    var1_Mn1 var1_Mn2 var1_Mn3 var1_Qc1 var1_Qc2 var1_Qc3
  <chr>      <chr>      <chr>    <chr>    <chr>    <chr>       <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1 Treatment  n          ""       ""       ""       " 1"        ""       ""       ""       ""       ""       ""      
2 Treatment  Mean (SD)  ""       ""       ""       ""          ""       ""       ""       ""       ""       ""      
3 Treatment  Median     ""       ""       ""       ""          ""       ""       ""       ""       ""       ""      
4 Treatment  Q1, Q3     ""       ""       ""       ""          ""       ""       ""       ""       ""       ""      
5 Treatment  Min, Max   ""       ""       ""       "Inf, -Inf" ""       ""       ""       ""       ""       ""      
6 Treatment  Missing    ""       ""       ""       " 1"        ""       ""       ""       ""       ""       ""      
# ℹ 6 more variables: var1_Qn1 <chr>, var1_Qn2 <chr>, var1_Qn3 <chr>, ord_layer_index <int>, ord_layer_1 <int>,
#   ord_layer_2 <int>

This is a duplicate of #21

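This is a side effect of using na.rm=TRUE on the backend. Once the NA values are dropped there is nothing left to evaluate, so min() returns Inf (and max() likewise returns -Inf):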

> min(c(NA), na.rm=TRUE)
[1] Inf
Warning message:
In min(c(NA), na.rm = TRUE) :
  no non-missing arguments to min; returning Inf
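
The same behaviour can be reproduced outside Tplyr for both statistics. Below is a minimal base R sketch; safe_min() and safe_max() are hypothetical helpers written for illustration (they are not part of Tplyr) that return NA instead of Inf / -Inf when a group has no non-missing values:

```r
# A group in which every value is missing
x <- c(NA_real_, NA_real_)

# With na.rm = TRUE all values are dropped, so min()/max() operate on an
# empty vector and fall back to Inf / -Inf (emitting a warning).
suppressWarnings(min(x, na.rm = TRUE))
suppressWarnings(max(x, na.rm = TRUE))

# Hypothetical helpers (not Tplyr functions): return NA when nothing is
# left after dropping missing values, otherwise behave like min()/max().
safe_min <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

safe_min(x)
safe_max(c(3, NA, 1))
```

Whether helpers like these can be plugged into a Tplyr layer depends on the Tplyr version in use, so treat them only as an illustration of the underlying R behaviour.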

Would you prefer that this creates an empty string? Or how would you like it represented? Currently your best bet is to post-process the strings. But there are two places we could address it in Tplyr's process:

  • Add f_str() handling of Inf similar to the NA handling in the empty parameter, or just convert Inf to NA to handle identically.
  • Update apply_conditional_formats() to recognize Inf as a numeric value and handle the replacements similarly.
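
Until one of those options exists, the post-processing route mentioned above could look roughly like the following base R sketch. The data frame `out` is a hand-written stand-in for the result of build() (only a few columns are reproduced), and blank_inf() is a hypothetical helper, not a Tplyr function:

```r
# Stand-in for the data frame returned by build(); values copied from the
# output shown in the issue, with only three columns kept for brevity.
out <- data.frame(
  row_label2 = c("n", "Min, Max", "Missing"),
  var1_Mc3   = c("", "", ""),
  var1_Mc4   = c(" 1", "Inf, -Inf", " 1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: blank out any formatted cell that contains "Inf"
# (this also catches "-Inf"); all other cells are returned unchanged.
blank_inf <- function(x) ifelse(grepl("Inf", x, fixed = TRUE), "", x)

# Apply the helper to the result columns only (names starting with "var").
res_cols <- grep("^var", names(out), value = TRUE)
out[res_cols] <- lapply(out[res_cols], blank_inf)

out
```

With dplyr loaded, the last step could equally be written as mutate(across(starts_with("var"), blank_inf)). Replacing with "" is just one choice; a placeholder such as "NA, NA" would work the same way.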