sparklyr / sparklyr

R interface for Apache Spark

Home Page: https://spark.rstudio.com/

Strange error with sparklyr 1.8.3 and dbplyr 2.3.4

joscani opened this issue

Hi, I get a strange error with sparklyr 1.8.3 and dbplyr 2.3.4 on a Spark cluster:


> top_variables <-  sc %>% 
+   tbl("name_schema.table_name")
> top_variables_one_day <- top_variables %>% 
+   filter(complete_date == "2023-09-15")  %>% 
+   compute()
Error in `db_save_query.DBIConnection()`:
! Can't save query to "dbplyr_002".

But with dbplyr 2.3.3 and sparklyr 1.8.3 it works:

> top_variables <-  sc %>% 
+    tbl("name_schema.table_name")
> top_variables_one_day <- top_variables %>% 
+   filter(complete_date == "2023-09-15")  %>% 
+   compute()

I need to make a reproducible example, but I think the problem is in dbplyr.

It works if I use tbl_cache:

 top_variables_one_day %>% sdf_register("tabla")
 sc %>% tbl_cache("tabla")
 

But I would like to use only %>% compute().
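Until compute() works again, the workaround above could be wrapped so that calling code only changes in one place. This is a minimal sketch, not part of sparklyr or dbplyr: the helper name `compute_or_cache` and the default table name are hypothetical, and it catches any error from compute(), not just the one reported here.

```r
# Hypothetical helper (an assumption, not a sparklyr or dbplyr function).
# Tries compute() first; on any error, falls back to the
# sdf_register() + tbl_cache() workaround described in this issue.
compute_or_cache <- function(x, name = "tabla") {
  tryCatch(
    dplyr::compute(x),
    error = function(e) {
      # Recover the Spark connection from the remote table.
      sc <- sparklyr::spark_connection(x)
      # Register the lazy query under `name`, then cache that table.
      sparklyr::sdf_register(x, name)
      sparklyr::tbl_cache(sc, name)
      # Return a reference to the cached table.
      dplyr::tbl(sc, name)
    }
  )
}
```

With this, `top_variables %>% filter(complete_date == "2023-09-15") %>% compute_or_cache()` would behave like compute() where compute() works and use the cached table otherwise.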

In dbplyr 2.3.3 compute() works; in dbplyr 2.3.4 it doesn't, with the same version of sparklyr (1.8.3) and the same version of DBI (1.1.3).
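To tell quickly whether a session is running the combination reported here, a small version check along these lines may help. The helper name `is_reported_failing_combo` is hypothetical, and it only encodes the exact versions mentioned in this issue; other versions may or may not be affected.

```r
# Hypothetical helper: TRUE only for the exact combination reported in
# this issue (dbplyr 2.3.4 together with sparklyr 1.8.3).
is_reported_failing_combo <- function(dbplyr_version, sparklyr_version) {
  as.package_version(dbplyr_version) == "2.3.4" &&
    as.package_version(sparklyr_version) == "1.8.3"
}

# To check the installed packages (both must be installed):
# is_reported_failing_combo(packageVersion("dbplyr"), packageVersion("sparklyr"))
```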

I see there is an issue at tidyverse/dbplyr#1372.
Thanks

Hi @joscani, that's solved in the dev version of sparklyr; feel free to test it out. I will be sending an update to CRAN soon.
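One common way to try a development version is to install it from GitHub. The sketch below assumes the remotes package is available and that the dev version lives on the default branch of the sparklyr/sparklyr repository; restart the R session afterwards so the new version is loaded.

```r
# Assumption: the remotes package is used for the GitHub install.
# install.packages("remotes")  # uncomment if remotes is not installed
remotes::install_github("sparklyr/sparklyr")

# After restarting R, confirm which version is in use.
packageVersion("sparklyr")
```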

Hi @edgararuiz, I'll try the dev version next week.
Thanks for your fast answer, as always.