tidyverse / dbplyr

Database (DBI) backend for dplyr

Home Page:https://dbplyr.tidyverse.org

Repository from Github https://github.comtidyverse/dbplyrRepository from Github https://github.comtidyverse/dbplyr

BigQuery pipe syntax

MichaelChirico opened this issue · comments

BigQuery now supports a highly tidy-ish piping syntax:

https://cloud.google.com/bigquery/docs/reference/standard-sql/pipe-syntax

FROM Produce
|> WHERE
    item != 'bananas'
    AND category IN ('fruit', 'nut')
|> AGGREGATE COUNT(*) AS num_items, SUM(sales) AS total_sales
   GROUP BY item
|> ORDER BY item DESC;

Obviously for simple cases this makes the {dbplyr}-SQL translation pretty trivial:

conn |>
  tbl("Produce") |>
  filter(item != "bananas", category %in% c("fruit", "nut")) |>
  summarize(num_items = n(), total_sales = sum(sales, na.rm=TRUE), .by="item") |>
  arrange(desc(item))

What is the right way to go about supporting this translation?

Would this be something only worthwhile implementing in {bigrquery}? Does it make sense to approach it as piecemeal changing how individual verbs are translated, or would it make more sense to implement it in {dbplyr} itself as a different "mode" of translation -- like we have the option to render using sql_options(cte=TRUE/FALSE) to use CTEs or subqueries, we could have a 3rd option to render with pipes?

I'm happy to help with the implementation; starting here to consult on what you think the best approach is.

If this is something that {bigrquery} should be wholly responsible for, that's also fine.