tidyverse / dbplyr

Database (DBI) backend for dplyr

Home page: https://dbplyr.tidyverse.org

Repository: https://github.com/tidyverse/dbplyr

How do I get support for dotted names (i.e., names containing a period)?

MichaelChirico opened this issue.

I'm trying to write (well, rewrite) my custom dbplyr backend, and I finally have most of the pieces in place.

However, I'm stuck at getting joins working:

library(DBI)
library(dplyr)
library(dbplyr)
library(RMyBackend)

conn <- dbConnect(MyBackend(), ...)

dbWriteTable(conn, "mtcars", mtcars, ...)

mtcars_remote <- tbl(conn, "mtcars")

mtcars_subset <- mtcars_remote |>
  filter(cyl == 8) |>
  select(mpg:drat)

# Build the lazy join; no SQL is generated or run at this point
jn <- mtcars_remote |>
  left_join(mtcars_subset, by = "cyl")

# Printing the lazy table renders its SQL and executes it, which fails:
jn
INVALID_ARGUMENT: MyBackend::SQL_ANALYSIS_ERROR: Syntax error: Expected ")" but got "." [at 4:28]
    `mtcars`.`mpg` AS `mpg`.`x`,
                           ^ [util.ErrorSpacePayload='MyBackend::SQL_ANALYSIS_ERROR']
Error in `collect()`:
! Failed to collect lazy table.
Caused by error in `doTryCatch()`:
! INVALID_ARGUMENT: MyBackend::SQL_ANALYSIS_ERROR: Syntax error: Expected ")" but got "." [at 4:28]
    `mtcars`.`mpg` AS `mpg`.`x`,
                           ^ [util.ErrorSpacePayload='MyBackend::SQL_ANALYSIS_ERROR']
Run `rlang::last_trace()` to see where the error occurred.

Actually, the problem is simpler. Names containing . are not quoted correctly:

mtcars_subset |>
  mutate(dotted.name = drat)

Everything looks fine except that the AS alias is quoted incorrectly:

mtcars_subset |>
  mutate(dotted.name = drat) |>
  show_query()
# <SQL>
# SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `drat` AS `dotted`.`name`
# FROM `mtcars`
# WHERE (`cyl` = 8.0)

However, I can't tell where in the query-generation stack I should look to fix this.

At root, `dotted`.`name` is the output of dbQuoteIdentifier(conn, "dotted.name"). That output is correct for identifiers in namespaced table names: if this were SELECT * FROM fully.qualified.table, it would be fine to quote the name robustly as SELECT * FROM `fully`.`qualified`.`table`. So part of my confusion is how to instruct {dbplyr} to quote table names differently from column names.
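For comparison, a minimal sketch using dbplyr's simulated MySQL connection (`simulate_mysql()` with `lazy_frame()`), which renders SQL offline without a database. Its quoting treats each string as one backticked identifier, so the same `mutate()` should produce a single `dotted.name` alias. The column values are made up, and the exact SELECT list may differ between dbplyr versions, so only the alias matters here.

```r
library(dplyr)
library(dbplyr)

# Offline lazy table on dbplyr's simulated MySQL connection; nothing is executed
lf <- lazy_frame(mpg = 21, cyl = 8, drat = 3.9, con = simulate_mysql())

sql <- lf |>
  filter(cyl == 8) |>
  mutate(dotted.name = drat) |>
  sql_render()

# Inspect the alias: it should appear as `dotted.name`, not `dotted`.`name`
print(sql)
```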

Finally, while writing this I came across sql_join_suffix(), which will help with the join case (it should be mentioned in the custom backend vignette). I don't understand, though, how the default suffixes can contain "." (".x" and ".y") on generic DBI connections.
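A sketch of overriding the join suffix for a backend, assuming the generic is `sql_join_suffix(con, suffix, ...)` and is called with `suffix = NULL` when the user doesn't pass one (as in recent dbplyr releases). `MyBackendConnection` is a hypothetical stand-in for the real connection class. The default ".x"/".y" suffixes only work where the whole name `mpg.x` is quoted as one identifier; switching to underscores sidesteps dotted column names entirely, which helps if a dialect rejects them even when quoted.

```r
library(DBI)
library(dbplyr)
library(methods)

# Hypothetical connection class standing in for the real backend's class
setClass("MyBackendConnection", contains = "DBIConnection")

# S3 method for the backend: if the user supplied no suffix, use underscores
# instead of dplyr's default c(".x", ".y"); otherwise respect their choice
sql_join_suffix.MyBackendConnection <- function(con, suffix = NULL, ...) {
  if (is.null(suffix)) c("_x", "_y") else suffix
}

con <- new("MyBackendConnection")

default_suffix <- sql_join_suffix(con)
user_suffix <- sql_join_suffix(con, c(".a", ".b"))
```

Defined at top level this is found by S3 dispatch; inside a package you would register it instead (for example with roxygen's `@exportS3Method dbplyr::sql_join_suffix`).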

Joins aside, the question of how to support dot-named columns remains. Even quoting manually fails:

remote |>
  mutate("`dotted.name`" = drat) |>
  show_query()
# <SQL>
# SELECT `mpg`, `cyl`, `disp`, `hp`, `drat`, `drat` AS `\`dotted`.`name\``
# FROM `chiricom`.`mtcars`
# WHERE (`cyl` = 8.0)

Hmm, maybe this really is just a problem with our dbQuoteIdentifier() method, which manually replaces every "." with "`.`" and so splits the name at each period.

I guess the intent is that, at other call sites, each component of a fully-qualified table name is quoted individually, rather than the full name being split and quoted this way.
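A minimal sketch of that split of responsibilities, assuming a backtick-quoting dialect that escapes an embedded backtick by doubling it (as MySQL does; adjust for your dialect). `MyBackendConnection` is a hypothetical stand-in for the real class. A plain string is quoted as one identifier, and qualified names are passed as `DBI::Id()`, whose default DBI method quotes each component with the character method and joins them with ".".

```r
library(DBI)
library(methods)

# Hypothetical connection class standing in for the real backend's class
setClass("MyBackendConnection", contains = "DBIConnection")

# A plain string is ONE identifier: wrap it in backticks, never split on "."
# (assumes embedded backticks are escaped by doubling them)
setMethod("dbQuoteIdentifier", c("MyBackendConnection", "character"),
  function(conn, x, ...) {
    SQL(paste0("`", gsub("`", "``", x, fixed = TRUE), "`"), names = names(x))
  }
)

# Already-quoted SQL passes through unchanged; this also avoids ambiguous
# dispatch with DBI's own (DBIConnection, SQL) method
setMethod("dbQuoteIdentifier", c("MyBackendConnection", "SQL"),
  function(conn, x, ...) x
)

con <- new("MyBackendConnection")

# Column alias: a single quoted identifier
alias_sql <- dbQuoteIdentifier(con, "dotted.name")

# Qualified table: each component quoted separately, then joined with "."
table_sql <- dbQuoteIdentifier(con, Id(schema = "chiricom", table = "mtcars"))

escaped_sql <- dbQuoteIdentifier(con, "a`b")
```

With a method like this, the `mutate(dotted.name = drat)` alias should render as a single identifier, and schema-qualified tables would be referenced through `in_schema("chiricom", "mtcars")` or a `DBI::Id()` rather than a dotted string.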

Still, {DBItest} didn't catch this. Should it have?