prestodb / RPresto

DBI-based adapter for Presto for the statistical programming language R.

Support subscripting for maps

copernican opened this issue

Per tidyverse/dbplyr@7e77a26, dbplyr 1.4.0 will support remote evaluation of [[. This opens up the possibility of using [[ to index into map columns, as described in #41. It seems this could be accomplished by adding something like

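`[[` = function(x, i) {
  # Capture the index unevaluated so it is not looked up locally
  i <- enexpr(i)
  # Map keys are expected to be string literals
  if (!is.character(i)) {
    stop("Can only index with strings", call. = FALSE)
  }
  # Emit Presto subscript syntax: <column>[<escaped key>]; `con` is assumed
  # to be the connection passed to sql_translate_env.PrestoConnection()
  build_sql(x, "[", escape(i, con = con), "]")
}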

to sql_translate_env.PrestoConnection(). This would allow syntax like

library(DBI)    # dbConnect()
library(dplyr)  # tbl(), mutate(), %>%
conn <- dbConnect(RPresto::Presto())
x <- tbl(conn, "x")
x %>% mutate(y = map_col[["key"]])

to create a new column y from the values of map_col corresponding to key. My preliminary testing indicates that this will work. I am happy to create a PR, if helpful.
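To make the intended output concrete, here is a minimal base-R sketch of the SQL fragment the translation above would be expected to produce for that mutate() call. The helper presto_subscript() is hypothetical (it is not part of RPresto or dbplyr), and its quoting only approximates what build_sql() and escape() do: identifiers in double quotes, string literals in single quotes.

```r
# Hypothetical helper, not part of RPresto or dbplyr: builds the subscript
# expression that the proposed `[[` translation is expected to render.
presto_subscript <- function(column, key) {
  if (!is.character(key) || length(key) != 1L) {
    stop("Can only index with strings", call. = FALSE)
  }
  # Identifiers are double-quoted; embedded double quotes are doubled
  quoted_col <- paste0('"', gsub('"', '""', column, fixed = TRUE), '"')
  # String literals are single-quoted; embedded single quotes are doubled
  quoted_key <- paste0("'", gsub("'", "''", key, fixed = TRUE), "'")
  paste0(quoted_col, "[", quoted_key, "]")
}

writeLines(presto_subscript("map_col", "key"))
#> "map_col"['key']
```

The real translation would go through dbplyr's own escaping, so the exact quoting may differ from this approximation.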

That's great. We'd be happy to review a PR. We should be able to support arrays in the same manner too?

My only concern would be making this backwards compatible with earlier versions of dbplyr. As long as it does not create issues in that respect, it would be a good addition.
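If an explicit guard ever turned out to be necessary, one option would be a version comparison. The sketch below uses only base R; supports_remote_subscript() is a hypothetical name, and the 1.4.0 threshold is taken from the dbplyr commit referenced in the issue.

```r
# Hypothetical helper: TRUE when the given dbplyr version string is at
# least 1.4.0, the release expected to evaluate `[[` remotely.
supports_remote_subscript <- function(dbplyr_version) {
  package_version(dbplyr_version) >= package_version("1.4.0")
}

supports_remote_subscript("1.3.0")  # FALSE
supports_remote_subscript("1.4.0")  # TRUE
```

In a live session the argument could come from as.character(utils::packageVersion("dbplyr")).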

  1. Yes, this should also support arrays.

  2. Earlier versions of dbplyr will attempt to evaluate [[ locally, resulting in an error like this:

    x %>% mutate(y = map_col[["key"]])
    #> Error in eval_bare(call, env) : object 'map_col' not found

    So, if a user has an older version of dbplyr and a version of RPresto with the proposed change, they should get an error like the above (confirmed with dbplyr 1.3.0). This is the same error they'd get with an older dbplyr and RPresto 1.3.2, i.e., without the proposed change, so this seems reasonable. Does this address your concern about backward compatibility?

  3. Is there a preference for [[ versus [ or even $? The dbplyr commit referenced above supports remote evaluation of all three, so conceivably all could be translated the same way.
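On point 1: Presto subscripts arrays with 1-based integer positions rather than string keys, so the string-only check in the proposed translation would presumably need to be relaxed to cover arrays. A base-R sketch of such a check follows; subscript_literal() is a hypothetical helper, not code from RPresto or dbplyr.

```r
# Hypothetical helper: renders an R index as a SQL literal suitable for a
# Presto subscript. Strings serve as map keys, whole numbers >= 1 as
# array positions.
subscript_literal <- function(index) {
  if (length(index) != 1L || is.na(index)) {
    stop("Index must be a single non-missing value", call. = FALSE)
  }
  if (is.character(index)) {
    # Single-quote the key and double any embedded single quotes
    paste0("'", gsub("'", "''", index, fixed = TRUE), "'")
  } else if (is.numeric(index) && index == trunc(index) && index >= 1) {
    format(index, scientific = FALSE)
  } else {
    stop("Can only index with strings or positive whole numbers", call. = FALSE)
  }
}

writeLines(subscript_literal("key"))  #> 'key'
writeLines(subscript_literal(2))      #> 2
```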

Re 2, yes, I'd say that's good enough.

Re 3, I'd say [[ is a good start. $ has different semantics, such as partial matching, so I'd prefer not to use it at all. I'm less certain about [; let's keep that as a potential future improvement.
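For reference, the partial-matching difference can be seen on an ordinary local list in base R (the list and its element name are made up for illustration):

```r
# A local list with a single element named "keyword"
lookup <- list(keyword = 1)

lookup$key       # `$` partially matches "keyword": returns 1
lookup[["key"]]  # `[[` matches exactly by default: returns NULL
```

A map lookup that silently matched a different key would be surprising, which is the reason for leaving $ out.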

Great. I've got the code written and will wait for dbplyr 1.4.0 to hit CRAN (could be this week) before re-running the test suite and submitting the PR.

Addressed by #110.