tidyverse / dbplyr

Database (DBI) backend for dplyr

Home Page: https://dbplyr.tidyverse.org

Repository from GitHub: https://github.com/tidyverse/dbplyr

MSSQL - pivot_wider - BIT - ERROR

FlorianSchwendinger opened this issue

Using pivot_wider on a tbl connected to an MSSQL database gives an error, because SQL Server does not allow MAX on a BIT column.

Versions

  • R version 4.4.1
  • dbplyr_2.5.0
  • dplyr_1.1.4
  • DBI_1.2.3
  • tidyverse_2.0.0
  • Microsoft SQL Server 2014 (SP3-CU4-GDR) (KB5029185) - 12.0.6449.1 (X64)

Example

library("DBI")
library("dbplyr")
library("tidyverse")

uid <- c(1L, 1L, 2L, 3L, 3L, 3L)
idx <- c(1L, 2L, 1L, 1L, 2L, 3L)
num <- rnorm(length(idx))
bit <- rnorm(length(idx)) > 0

df <- tibble(uid = uid, idx = idx, num = num, bit = bit)
df

So far everything works as expected. Next we connect to the server, write the data to a table, and create a tbl that points to it.

db <- dbConnect(odbc::odbc(),
                driver = "{SQL Server}",
                server = "",
                database = "",
                trusted_connection = "yes",
                encoding = "latin1")

writeLines(unlist(dbGetQuery(db, "SELECT @@VERSION")))
#R> Microsoft SQL Server 2014 (SP3-CU4-GDR) (KB5029185) - 12.0.6449.1 (X64) 
#R>         Jul 27 2023 21:55:46

dbWriteTable(conn = db, name = "temp_test_wide", value = df, overwrite = TRUE)
dbf <- tbl(db, "temp_test_wide")

Now executing pivot_wider gives an error on the bit column.

values_from <- setdiff(colnames(dbf), c("uid", "idx"))
pivot_wider(dbf, id_cols = "uid", names_from = "idx", values_from=all_of(values_from))
#R> Error in `collect()`:
#R> ! Failed to collect lazy table.
#R> Caused by error in `<fn>`:
#R> ! ODBC failed with error 42000 from [Microsoft][ODBC SQL Server Driver][SQL Server].
#R> ✖ Operand data type bit is invalid for max operator.
#R> • Statement(s) could not be prepared.
#R> • <SQL> 'SELECT TOP 11
#R> •  "uid",
#R> •  MAX(IIF("idx" = 1, "num", NULL)) AS "num_1",
#R> •  MAX(IIF("idx" = 2, "num", NULL)) AS "num_2",
#R> •  MAX(IIF("idx" = 3, "num", NULL)) AS "num_3",
#R> •  MAX(IIF("idx" = 1, "bit", NULL)) AS "bit_1",
#R> •  MAX(IIF("idx" = 2, "bit", NULL)) AS "bit_2",
#R> •  MAX(IIF("idx" = 3, "bit", NULL)) AS "bit_3"
#R> • FROM "temp_test_wide"
#R> • GROUP BY "uid"'
#R> ℹ From nanodbc/nanodbc.cpp:1783.
#R> Run `rlang::last_trace()` to see where the error occurred.

Calling show_query reveals the source of the problem. The generated SQL looks like

SELECT
  "uid",
  MAX(IIF("idx" = 1, "num", NULL)) AS "num_1",
  MAX(IIF("idx" = 2, "num", NULL)) AS "num_2",
  MAX(IIF("idx" = 3, "num", NULL)) AS "num_3",
  MAX(IIF("idx" = 1, "bit", NULL)) AS "bit_1",
  MAX(IIF("idx" = 2, "bit", NULL)) AS "bit_2",
  MAX(IIF("idx" = 3, "bit", NULL)) AS "bit_3"
FROM "temp_test_wide"
GROUP BY "uid"

but should be

SELECT
  "uid",
  MAX(IIF("idx" = 1, "num", NULL)) AS "num_1",
  MAX(IIF("idx" = 2, "num", NULL)) AS "num_2",
  MAX(IIF("idx" = 3, "num", NULL)) AS "num_3",
  CAST(MAX(CAST(IIF("idx" = 1, "bit", NULL) as INT)) AS BIT) AS "bit_1",
  CAST(MAX(CAST(IIF("idx" = 2, "bit", NULL) as INT)) AS BIT) AS "bit_2",
  CAST(MAX(CAST(IIF("idx" = 3, "bit", NULL) as INT)) AS BIT) AS "bit_3"
FROM "temp_test_wide"
GROUP BY "uid"
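The round trip through INT gives the intended result because the maximum of 0/1 values is 1 exactly when at least one value is 1, and MAX ignores NULLs. A minimal local sketch in base R (no database involved; the vector is made up for illustration) applies the same round trip to a logical vector:

```r
# Made-up values standing in for the BIT entries of one group.
bit <- c(FALSE, TRUE, NA, FALSE)

# Local analogue of CAST(MAX(CAST(x AS INT)) AS BIT), ignoring missing values.
via_int <- as.logical(max(as.integer(bit), na.rm = TRUE))

# For a group with at least one non-missing value this matches any().
stopifnot(identical(via_int, any(bit, na.rm = TRUE)))
via_int
```

This only mirrors the logic locally. A group whose values are all missing behaves differently in R (max() returns -Inf with a warning) than in SQL, where MAX returns NULL.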

Looking at your code in backend-mssql.R, I see you have a mssql_bit_int_bit function for MAX.
However, it seems to have been missed for pivot_wider.

Changing the argument values_fn to

values_fn <- list(
    num = ~ max(.x, na.rm = TRUE),
    bit = ~ as.logical(max(as.integer(.x), na.rm = TRUE))
)

resolves the problem.
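The workaround above could be generalised so the list does not have to be written by hand for every table. The following is only a sketch: bit_safe_values_fn is a hypothetical helper, not a dbplyr function, and it assumes a local data frame with the same column types as the remote table is available for inspecting which columns are logical.

```r
# Hypothetical helper (not part of dbplyr): build a `values_fn` list that uses
# the integer round trip for logical (BIT) columns and plain max() otherwise.
bit_safe_values_fn <- function(local_df, cols) {
  fns <- lapply(cols, function(col) {
    if (is.logical(local_df[[col]])) {
      ~ as.logical(max(as.integer(.x), na.rm = TRUE))
    } else {
      ~ max(.x, na.rm = TRUE)
    }
  })
  setNames(fns, cols)
}

# Small local stand-in with the same column types as the table in the example.
local_df <- data.frame(
  uid = c(1L, 1L, 2L),
  idx = c(1L, 2L, 1L),
  num = c(0.5, -1.2, 0.3),
  bit = c(TRUE, FALSE, TRUE)
)

# One formula per value column, named after the column it applies to.
values_fn <- bit_safe_values_fn(local_df, c("num", "bit"))
names(values_fn)
```

The resulting list would then be passed as values_fn = values_fn in the pivot_wider call shown earlier.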

Hope this helps.