tidyverse / purrr

A functional programming toolkit for R

Home Page: https://purrr.tidyverse.org/

map_dfr

ggrothendieck opened this issue

map_dfr has been deprecated, and the help file suggests using map with list_rbind or list_cbind instead, which means we now need two commands where one used to suffice. map_dfr was probably the most frequently used command in purrr, at least for me. The reason cited for deprecating it is the inconsistency of its name, but if that is the only problem we could keep map_dfr under another name. Also, map(...) %>% list_cbind and map(...) %>% list_rbind do not even work in the example below, whereas map_dfr does. Maybe I am doing this wrong, but this seems to be what the help file suggests.

# https://stackoverflow.com/questions/76094233/remove-all-non-number-characters-from-every-cell-in-a-df/76094508#76094508
library(purrr)

DF <- structure(list(`Column A` = "dfd:13", `Column B` = "hhh:34", 
    `Column C` = "dsd:15", `Column D` = "ffd:67", `Column E` = "hdf:89", 
    `Column F` = "hhj:43"), class = "data.frame", row.names = c(NA, 
-1L))

DF %>%
  map(trimws, whitespace = "\\D") %>%
  list_cbind
## Error in `list_cbind()`:
## ! Each element of `x` must be either a data frame or `NULL`.
## ℹ Elements 1, 2, 3, 4, 5, and 6 are not.
## Run `rlang::last_trace()` to see where the error occurred.

DF %>%
  map(trimws, whitespace = "\\D") %>%
  list_rbind
## Error in `list_rbind()`:
## ! Each element of `x` must be either a data frame or `NULL`.
## ℹ Elements 1, 2, 3, 4, 5, and 6 are not.
## Run `rlang::last_trace()` to see where the error occurred.

# desired result
DF %>%
  map_dfr(trimws, whitespace = "\\D")
## # A tibble: 1 × 6
##   `Column A` `Column B` `Column C` `Column D` `Column E` `Column F`
##   <chr>      <chr>      <chr>      <chr>      <chr>      <chr>     
## 1 13         34         15         67         89         43   

IMO this example actually mildly supports our decision to deprecate map_dfr(), because in this case it combines by column, not by row! I 100% agree that map_dfr() is very convenient, but this very convenience can lead to hard-to-diagnose problems when you hit edge cases, because it tries so hard to make a data frame.

I'd solve your problem like this:

library(purrr)

DF <- structure(list(`Column A` = "dfd:13", `Column B` = "hhh:34", 
    `Column C` = "dsd:15", `Column D` = "ffd:67", `Column E` = "hdf:89", 
    `Column F` = "hhj:43"), class = "data.frame", row.names = c(NA, 
-1L))

DF %>%
  map(trimws, whitespace = "\\D") %>%
  tibble::as_tibble()
#> # A tibble: 1 × 6
#>   `Column A` `Column B` `Column C` `Column D` `Column E` `Column F`
#>   <chr>      <chr>      <chr>      <chr>      <chr>      <chr>     
#> 1 13         34         15         67         89         43

Created on 2023-04-25 with reprex v2.0.2

The fact remains that map(...) %>% list_cbind does not work, and neither does map(...) %>% list_rbind, yet that is what the documentation suggests. It is also less convenient than what we had before.

Right, because this is one of the edge cases that IMO should never have worked (i.e. it's a bug).
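For contrast, here is a minimal sketch of the case list_rbind() and list_cbind() are meant for, where every element returned by map() is already a data frame. The helpers make_row() and make_col() are hypothetical names invented for this illustration, not part of purrr.

```r
library(purrr)

# Hypothetical helper: returns a one-row data frame for each input value.
make_row <- function(i) {
  data.frame(id = i, square = i^2)
}

# map() yields a list of data frames; list_rbind() stacks them by row.
rows <- map(1:3, make_row) %>% list_rbind()

# Hypothetical helper: returns a one-column data frame named after its input.
make_col <- function(nm) {
  stats::setNames(data.frame(1:2), nm)
}

# map() yields a list of data frames; list_cbind() joins them side by side.
cols <- map(c("a", "b"), make_col) %>% list_cbind()
```

In the example from this issue, map() returns a list of character vectors rather than data frames, which is why both list_rbind() and list_cbind() reject it.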

Two things:

One: It's important to note that map_dfc(), map_dfr(), and the like are not being deprecated, but superseded. The term/lifecycle stage of superseded has been tricky to communicate, but Hadley describes it in the purrr release video here. In case you don't feel like watching the 30 seconds or so, he says superseded:

“…means they are not going away. They're not deprecated. They're not on a path to being removed. But we don't recommend them anymore. So in some sense these functions are like the safest functions to use because we're going to just put them away and we're never going to touch them. But we just don't recommend using them.”

We're going to pull this and the other bits about lifecycle stages out of the purrr video and make them into their own shorter video to hopefully help better communicate this to users (along with our various other lifecycle docs, e.g. the superseded description here).

Two: Do we think this is a sufficiently common pattern that it would be worth adding to the docs? If not fit for the actual function reference page, I could pull this out along with some other examples and turn it into some other form of documentation (e.g. field guide, article, blog post, etc.)

Yeah, I think this would be useful to add to the examples.
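As a rough sketch of what such an example might look like, the pattern "apply a function to every column and keep the data frame" can also be written without map_dfr(). The two alternatives below were not proposed in this thread; they are offered only as assumptions about what could fit the docs. The two-column DF is a shortened stand-in for the data in the original report.

```r
library(purrr)

# Shortened stand-in for the data frame from the original report.
DF <- data.frame(`Column A` = "dfd:13", `Column B` = "hhh:34",
                 check.names = FALSE)

# Option 1: purrr::modify() applies the function to each column and
# returns an object of the same type as its input (here, a data frame).
cleaned_modify <- modify(DF, trimws, whitespace = "\\D")

# Option 2: base R. Assigning into DF2[] keeps the data frame structure
# while replacing every column with the result of lapply().
DF2 <- DF
DF2[] <- lapply(DF2, trimws, whitespace = "\\D")
```

Both options keep the original column names and return a plain data frame rather than a tibble, so tibble::as_tibble() is still needed if a tibble is wanted.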