sparklyr / sparklyr

R interface for Apache Spark

Home Page: https://spark.rstudio.com/

sdf_len() failing in Databricks Notebook, DBR 14.3 LTS

RafiKurlansik opened this issue

In a Databricks notebook on Databricks Runtime (DBR) 14.3 LTS, with the latest versions of sparklyr and vctrs installed, sdf_len() fails with the error shown below.

install.packages('vctrs') # vctrs 0.6.5
install.packages('sparklyr') # sparklyr 1.8.4

library(sparklyr)
sc <- spark_connect(method = "databricks") 
sdf_len(sc, 10) 
Error in db_query_fields.DBIConnection(con, ...) : Can't query fields.
Caused by error in `value[[3L]]()`:
! Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 12) (10.0.25.28 executor 1): org.apache.spark.SparkException: Failed to fetch spark://10.0.27.246:46771/files/packages.d9682e7c-e06f-11ee-a3c8-00163e48929a.tar during dependency update
Error in `db_query_fields.DBIConnection()`:
! Can't query fields.
Caused by error in `value[[3L]]()`:
! Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 12) (10.0.25.28 executor 1): org.apache.spark.SparkException: Failed to fetch spark://10.0.27.246:46771/files/packages.d9682e7c-e06f-11ee-a3c8-00163e48929a.tar during dependency update
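One way to narrow this down is to check whether other operations on the same connection fail with the same "Failed to fetch ... during dependency update" message, or whether only sdf_len() does. The following is a minimal sketch in base R; `collect_messages()`, `is_dependency_fetch_failure()`, and `diagnose()` are hypothetical helper names introduced here for illustration and are not part of sparklyr. The sketch assumes the relevant text appears either in the top-level error message or in a chained `parent` condition.

```r
# Hypothetical helpers (not part of sparklyr) for classifying the error above.

# Gather the messages of an error and any chained parent conditions.
collect_messages <- function(e) {
  msgs <- character()
  while (!is.null(e)) {
    msgs <- c(msgs, conditionMessage(e))
    e <- e[["parent"]]  # NULL for plain base R conditions
  }
  paste(msgs, collapse = "\n")
}

# TRUE if the text matches the executor-side file fetch failure from the report.
is_dependency_fetch_failure <- function(msg) {
  grepl("Failed to fetch spark://", msg, fixed = TRUE) &&
    grepl("during dependency update", msg, fixed = TRUE)
}

# Run a zero-argument function and label the outcome.
diagnose <- function(expr_fn) {
  tryCatch(
    {
      expr_fn()
      "ok"
    },
    error = function(e) {
      if (is_dependency_fetch_failure(collect_messages(e))) {
        "dependency_fetch_failure"
      } else {
        "other_error"
      }
    }
  )
}

# Usage in a notebook with an open connection `sc` (not run here):
# diagnose(function() sparklyr::sdf_len(sc, 10))
# diagnose(function() DBI::dbGetQuery(sc, "SELECT id FROM range(1, 11)"))
```

If both calls return "dependency_fetch_failure", the problem is probably not specific to sdf_len() and more likely concerns how executors fetch files from the driver on that cluster.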

Session info:

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sparklyr_1.8.4

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       httr_1.4.6        cli_3.6.1         rlang_1.1.1      
 [5] DBI_1.1.3         purrr_1.0.1       generics_0.1.3    jsonlite_1.8.7   
 [9] SparkR_3.5.0      glue_1.6.2        openssl_2.0.6     dbplyr_2.3.3     
[13] askpass_1.1       fansi_1.0.4       tibble_3.2.1      yaml_2.3.7       
[17] lifecycle_1.0.3   config_0.3.1      Rserve_1.8-11     compiler_4.3.1   
[21] dplyr_1.1.2       pkgconfig_2.0.3   tidyr_1.3.0       rstudioapi_0.15.0
[25] R6_2.5.1          tidyselect_1.2.0  utf8_1.2.3        parallel_4.3.1   
[29] pillar_1.9.0      magrittr_2.0.3    withr_2.5.0       uuid_1.1-0       
[33] tools_4.3.1        

The same issue occurs in two other scenarios:

  1. Running this code with sparklyr 1.8.1 and vctrs 0.6.3, which are the versions that ship with DBR 14.3.
  2. Running this code with sparklyr 1.8.4 and vctrs 0.6.5 in DBR 13.3.

Session info for scenario 1:

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sparklyr_1.8.1

loaded via a namespace (and not attached):
 [1] jsonlite_1.8.7    dplyr_1.1.2       compiler_4.3.1    tidyselect_1.2.0 
 [5] parallel_4.3.1    assertthat_0.2.1  tidyr_1.3.0       uuid_1.1-0       
 [9] yaml_2.3.7        fastmap_1.1.1     Rserve_1.8-11     R6_2.5.1         
[13] generics_0.1.3    htmlwidgets_1.6.2 tibble_3.2.1      SparkR_3.5.0     
[17] rprojroot_2.0.3   DBI_1.1.3         pillar_1.9.0      rlang_1.1.1      
[21] utf8_1.2.3        config_0.3.1      r2d3_0.2.6        forge_0.2.0      
[25] cli_3.6.1         withr_2.5.0       magrittr_2.0.3    digest_0.6.33    
[29] rstudioapi_0.15.0 dbplyr_2.3.3      base64enc_0.1-3   lifecycle_1.0.3  
[33] vctrs_0.6.3       glue_1.6.2        fansi_1.0.4       purrr_1.0.1      
[37] httr_1.4.6        tools_4.3.1       pkgconfig_2.0.3   ellipsis_0.3.2   
[41] htmltools_0.5.5  
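When comparing scenarios like the ones above, it can be easier to print only the versions of the packages in question than to diff two full session listings. This is a small sketch in base R; `installed_version()` is a hypothetical helper name, and the package list is an assumption chosen from the packages that appear in the listings above.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed in the current library paths.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages assumed to be relevant to this report.
pkgs <- c("sparklyr", "vctrs", "dbplyr", "DBI")

# Named character vector: one version string (or NA) per package.
versions <- vapply(pkgs, installed_version, character(1))
print(versions)
```

Running this on each cluster gives one short line per environment that can be pasted into the issue.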

Tested on a 14.3 cluster, and it worked for me. Are you on a shared or individual cluster?

[Screenshot, 2024-03-20 2:39:59 PM]

Automatically closed because there has not been a response for 30 days. When you're ready to work on this further, please comment here and the issue will automatically reopen.