tidyverse / forcats

๐Ÿˆ๐Ÿˆ๐Ÿˆ๐Ÿˆ: tools for working with categorical variables (factors)

Home Page:https://forcats.tidyverse.org/

Make `fct_reorder` work out-of-the-box with factor `.x`

orgadish opened this issue

I often use forcats to arrange one factor in one way, and then want to use that ordering to arrange the levels of another factor. However, fct_reorder doesn't work out-of-the-box when .x is a factor, because the default is .fun = median. That default is appropriate for numeric values, which is one common use case, but not for factors, which are very common in my work.

One way of addressing this is to allow for different default .fun depending on .x's type. If desired, there could be two separate functions fct_reorder and fct_reorderf which expect numeric or character/factor data, respectively (and thus have different defaults).
(Below I've used first as the default .fun for character/factors but I don't think that's right -- perhaps ~first(sort(.x)) or equivalent?)

Another way is to convert all non-numeric .x into a numeric value (as.numeric(as.factor(.x))). But using median "under the hood" in that context seems like it could cause more confusion in some cases.
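To make the possible confusion concrete, here is a minimal sketch with made-up data (f, x and codes are hypothetical names, not taken from starwars). It assumes only that fct_reorder uses its default .fun = median on the integer codes of x:

```r
library(forcats)

# Hypothetical toy data: f is the factor to reorder, x is the factor we order by.
f <- factor(c("a", "a", "b", "b", "c", "c"))
x <- factor(c("lo", "hi", "hi", "hi", "lo", "mid"), levels = c("lo", "mid", "hi"))

# Integer codes of x: lo = 1, mid = 2, hi = 3
codes <- as.numeric(x)

# Per-group medians of the codes: a = 2, b = 3, c = 1.5
group_medians <- tapply(codes, f, median)

# With the default .fun = median, levels are sorted by those medians
reordered <- fct_reorder(f, codes)
levels(reordered)
```

Here group a is ranked by a median code of 2 ("mid"), a level that never occurs in a, and group c by 1.5, which is not a level at all. That is the kind of surprise an implicit conversion could hide.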

library(tidyverse)

starwars %>% 
  mutate(
    across(species, fct_relevel, "Human"),
    across(hair_color, fct_reorder, species)
  )
#> Error: Problem with `mutate()` input `..2`.
#> ℹ `..2 = across(hair_color, fct_reorder, species)`.
#> x need numeric data

fct_reorder_that_also_accepts_factors <- function(.f, .x, .fun = NULL, ...) {
  if(is.null(.fun)) {
    if(is.numeric(.x)) .fun <- median
    else .fun <- first
  }
  
  fct_reorder(.f, .x, .fun, ...)
}

starwars %>% 
  mutate(
    across(species, fct_relevel, "Human"),
    across(hair_color, fct_reorder_that_also_accepts_factors, species)
  ) %>% 
  pull(hair_color) %>% 
  levels
#>  [1] "auburn"        "auburn, grey"  "auburn, white" "black"        
#>  [5] "blond"         "brown"         "brown, grey"   "grey"         
#>  [9] "none"          "blonde"        "white"         "unknown"

Created on 2021-11-29 by the reprex package (v2.0.1)

What summary functions make sense to use on factors?

The only ones that come to mind are n_distinct() and the mode (the most common value, not mode()).
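As a rough sketch of the mode idea (modal_code is a hypothetical helper, not part of forcats, and the data are made up), one could pass the integer codes of the factor and summarise each group by its most frequent code:

```r
library(forcats)

# Hypothetical helper: integer code of the most frequent level in a group.
# Ties resolve to the lowest code because which.max() returns the first maximum.
modal_code <- function(codes) which.max(tabulate(codes))

f <- factor(c("a", "a", "a", "b", "b", "b"))
x <- factor(c("hi", "hi", "lo", "lo", "lo", "hi"), levels = c("lo", "hi"))

# Modal codes: a = 2 ("hi"), b = 1 ("lo"), so b sorts before a
by_mode <- fct_reorder(f, as.integer(x), .fun = modal_code)
levels(by_mode)
```

Whether breaking ties by level order is the right behaviour is an open design choice; this only shows that a single-value summary of a factor is possible without going through median.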

Looking back at the various places I've used my own fct_reorderf, which is similar to the implementation above, I find two use cases. In both cases, it seems likely that other functions are better suited to the job and that using fct_reorder with factors can lead to more confusion (see case 2 below).

  1. To perform string transformations (e.g. to prepare for plot labels) and re-attach the factors. In these cases, I should probably just use fct_relabel to do those transformations in the first place.
library(tidyverse)

fct_starwars <- starwars %>% 
  drop_na(sex) %>% 
  mutate(across(sex, factor, levels=c("female", "male", "hermaphroditic", "none")))

fct_starwars %>% 
  mutate(sex_label = if_else(sex == "none", "asexual", str_c(sex, " sex")) %>% fct_reorder(sex)) %>% 
  pull(sex_label) %>% 
  levels()
#> Error in `mutate()`:
#> ! Problem while computing `sex_label = ... %>% fct_reorder(sex)`.
#> Caused by error in `median.default()`:
#> ! need numeric data


fct_starwars %>% 
  mutate(sex_label = fct_relabel(sex, ~if_else(.x == "none", "asexual", str_c(.x, " sex")))) %>% 
  pull(sex_label) %>% 
  levels()
#> [1] "female sex"         "male sex"           "hermaphroditic sex"
#> [4] "asexual"

Created on 2022-03-07 by the reprex package (v2.0.1)

  2. Similar to the above, but using tidyr::extract or tidyr::separate to create new columns, after which I want to re-attach the levels of the combined column. Is there a different way to do this? Perhaps with arrange and fct_inorder. Note that they give different results, which goes against my initial reason for this feature request and would actually support keeping fct_reorder as is, not working with factors.
library(tidyverse)

# In my actual cases the combined column is generated elsewhere with levels attached.
fct_starwars_hair_eye <- starwars %>% 
  drop_na(hair_color, eye_color) %>%
  arrange(hair_color, eye_color) %>% 
  mutate(hair_eye = str_c(hair_color, "; ", eye_color, "-eyed") %>% fct_inorder())

fct_starwars_hair_eye %>% 
  separate(hair_eye, c("hair_color", "eye_color"), sep="; ", remove=F) %>% 
  mutate(across(c(hair_color, eye_color), fct_reorder, hair_eye)) %>% 
  pull(hair_color) %>% 
  levels()
#> Error in `mutate()`:
#> ! Problem while computing `..1 = across(c(hair_color, eye_color),
#>   fct_reorder, hair_eye)`.
#> Caused by error in `across()`:
#> ! Problem while computing column `hair_color`.
#> Caused by error in `median.default()`:
#> ! need numeric data

fct_starwars_hair_eye %>% 
  separate(hair_eye, c("hair_color", "eye_color"), sep="; ", remove=F) %>% 
  mutate(across(c(hair_color, eye_color), fct_reorder, as.numeric(hair_eye))) %>% 
  pull(hair_color) %>% 
  levels()
#>  [1] "auburn"        "auburn, grey"  "auburn, white" "black"        
#>  [5] "blond"         "blonde"        "brown"         "brown, grey"  
#>  [9] "grey"          "none"          "unknown"       "white"

fct_starwars_hair_eye %>% 
  separate(hair_eye, c("hair_color", "eye_color"), sep="; ", remove=F) %>% 
  arrange(hair_eye) %>% 
  mutate(across(c(hair_color, eye_color), fct_inorder)) %>% 
  pull(hair_color) %>% 
  levels()
#>  [1] "auburn"        "auburn, grey"  "auburn, white" "black"        
#>  [5] "blond"         "blonde"        "brown"         "brown, grey"  
#>  [9] "grey"          "none"          "unknown"       "white"

Created on 2022-03-07 by the reprex package (v2.0.1)