sparklyr / sparklyr

R interface for Apache Spark

Home Page: https://spark.rstudio.com/

Posit Workbench keeps crashing when connecting to Spark in local mode

tweakyTweeter opened this issue

Posit Workbench keeps crashing when I try to connect to Spark in local mode via the sparklyr package. The expected behavior is a successful connection to a Spark instance. When I run spark_connect with the method = "test" option instead, I get an error from the as_tibble function, as shown below. I tried downgrading various packages (sparklyr, tibble, dplyr, etc.), but nothing seems to work. I would really appreciate any suggestions for diagnosing this issue, as I'm drawing a blank and couldn't find anything on Stack Overflow.

library(sparklyr)
#> 
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#> 
#>     filter
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.6 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/GMT
#>  date     2023-08-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.3)
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 3.6.3)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 3.6.3)
#>  callr         3.7.3   2022-11-02 [1] CRAN (R 3.6.3)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 3.6.3)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 3.6.3)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 3.6.3)
#>  dbplyr        2.2.1   2022-06-27 [1] CRAN (R 3.6.3)
#>  devtools      2.4.5   2022-10-11 [1] CRAN (R 3.6.3)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 3.6.3)
#>  dplyr         1.1.2   2023-04-20 [1] CRAN (R 3.6.3)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 3.6.3)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 3.6.3)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 3.6.3)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 3.6.3)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 3.6.3)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 3.6.3)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 3.6.3)
#>  htmltools     0.5.6   2023-08-10 [1] CRAN (R 3.6.3)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 3.6.3)
#>  httpuv        1.6.11  2023-05-11 [1] CRAN (R 3.6.3)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 3.6.3)
#>  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 3.6.3)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 3.6.3)
#>  later         1.3.1   2023-05-02 [1] CRAN (R 3.6.3)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 3.6.3)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 3.6.3)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 3.6.3)
#>  mime          0.12    2021-09-28 [1] CRAN (R 3.6.3)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 3.6.3)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 3.6.3)
#>  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 3.6.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.3)
#>  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 3.6.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.3)
#>  processx      3.8.2   2023-06-30 [1] CRAN (R 3.6.3)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 3.6.3)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 3.6.3)
#>  ps            1.7.5   2023-04-18 [1] CRAN (R 3.6.3)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 3.6.3)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 3.6.3)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 3.6.3)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 3.6.3)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 3.6.3)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 3.6.3)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 3.6.3)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 3.6.3)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 3.6.3)
#>  rmarkdown     2.24    2023-08-14 [1] CRAN (R 3.6.3)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 3.6.3)
#>  shiny         1.7.5   2023-08-12 [1] CRAN (R 3.6.3)
#>  sparklyr    * 1.8.2   2023-07-01 [1] CRAN (R 3.6.3)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 3.6.3)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 3.6.3)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 3.6.3)
#>  tidyr         1.2.1   2022-09-08 [1] CRAN (R 3.6.3)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 3.6.3)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 3.6.3)
#>  usethis       2.2.2   2023-07-06 [1] CRAN (R 3.6.3)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 3.6.3)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 3.6.3)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 3.6.3)
#>  xfun          0.40    2023-08-09 [1] CRAN (R 3.6.3)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 3.6.3)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 3.6.3)
#> 
#>  [1] /usr/local/lib/remote_cran_repo/r_shared_libraries/R3.6
#>  [2] /usr/local/lib/h2o/h2o-3.14.0.6
#>  [3] /usr/local/lib/h2o/h2o-3.16.0.2
#>  [4] /usr/local/lib/h2o/h2o-3.20.0.2
#>  [5] /usr/local/lib/R/3.6.3/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
sc <- sparklyr::spark_connect(master = "local", method = "test")
rlang::last_trace(drop = FALSE) 
#> <error/tibble_error_column_scalar_type>
#> Error in `as_tibble()`:
#> ! All columns in a tibble must be vectors.
#> ✖ Column `list(master = "local[40]", config = list(spark.env.SPARK_LOCAL_IP.local = "127.0.0.1", sparklyr.connect.csv.embedded = "^1.*", spark.sql.legacy.utcTimestampFunc.enabled = TRUE,
#>   sparklyr.connect.cores.local = 40, spark.sql.shuffle.partitions.local = 40, sparklyr.shell.name = "sparklyr", \`sparklyr.shell.driver-memory\` = "2g"), state = <environment>)` is a
#>   `spark_connection/test_connection/DBIConnection` object.
#> ---
#> Backtrace:
#>      ▆
#>   1. └─.rs.connectionListObjects("Spark", "local - ")
#>   2.   └─connection$listObjects(...)
#>   3.     └─sparklyr:::connection_list_tables(scon, includeType = TRUE)
#>   4.       ├─base::sort(dbListTables(sc))
#>   5.       ├─DBI::dbListTables(sc)
#>   6.       └─sparklyr (local) dbListTables(sc)
#>   7.         └─sparklyr (local) .local(conn, ...)
#>   8.           └─sparklyr:::df_from_sql(conn, query)
#>   9.             └─sparklyr:::df_from_sdf(sc, sdf)
#>  10.               └─sparklyr::sdf_collect(sdf)
#>  11.                 └─sparklyr:::sdf_collect_static(object, impl, ...)
#>  12.                   └─sparklyr:::sdf_collect_data_frame(sdf, collected)
#>  13.                     ├─tibble::as_tibble(fixed, stringsAsFactors = FALSE, optional = TRUE)
#>  14.                     └─tibble:::as_tibble.list(fixed, stringsAsFactors = FALSE, optional = TRUE)
#>  15.                       └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
#>  16.                         └─tibble:::check_valid_cols(x, call = call)
#>  17.                           └─tibble:::abort_column_scalar_type(...)
#>  18.                             └─tibble:::tibble_abort(...)
#>  19.                               └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
#> 

Hi, what is the reason for using method = "test" in your use case? Wouldn't simply using spark_connect("local") be sufficient?

If I use spark_connect("local"), RStudio instantly crashes and no error logs are generated for me to debug the issue, so I was trying method = "test" instead.

Ok, what kind of error message is Workbench displaying?

It just crashes the session without any error messages. Let me try the sparklyr.log.console option and check whether I can get any error messages.

Even with the options(sparklyr.log.console = TRUE) command, the R session instantly crashes.
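
Since the backtrace above starts in .rs.connectionListObjects, which is part of the IDE's Connections pane rather than sparklyr itself, one way to narrow this down might be to attempt the same connection from a plain terminal with Rscript, outside Workbench. The following is only a minimal sketch under that assumption: check_local_spark and its connect argument are hypothetical names introduced here for illustration, and it assumes sparklyr, DBI, and a local Spark installation are available in that environment.

```r
# Hypothetical helper: try a local Spark connection and report the outcome
# as a list instead of aborting, so it can be run with Rscript from a terminal.
check_local_spark <- function(connect = function() sparklyr::spark_connect(master = "local")) {
  sc <- NULL
  # Disconnect on exit, but only if a connection was actually opened.
  on.exit(if (!is.null(sc)) sparklyr::spark_disconnect(sc), add = TRUE)

  tryCatch(
    {
      sc <- connect()
      # Same DBI call that appears in the backtrace above.
      tables <- DBI::dbListTables(sc)
      list(ok = TRUE,
           message = paste("connected;", length(tables), "table(s) visible"))
    },
    # Any R-level error (missing package, Spark not found, ...) is captured here.
    error = function(e) list(ok = FALSE, message = conditionMessage(e))
  )
}

result <- check_local_spark()
print(result$message)
```

If this succeeds from the terminal while the same connection still crashes inside Workbench, that would point toward the IDE integration rather than Spark or sparklyr. A hard crash of the R process itself would not be caught by tryCatch, so a crash here as well would suggest the problem sits below the R level.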