spark_apply() error, "Can't query fields"
brianstamper opened this issue · comments
Attempting to use spark_apply() but getting an error. This is on a Cloudera Machine Learning environment, and the same error appears with either Spark 2.4.7 or 3.2.1. On another machine with a local Spark install I do not see this issue, but the error here gives little indication of what the problem actually is.
For a small demo I'll use the first spark_apply() example from sparklyr - Distributing R Computations:
library(tidyverse)
library(sparklyr)
conf <- spark_config()
sc <- spark_connect(master = 'yarn-client', config = conf)
sdf_len(sc, 5, repartition = 1) %>%
  spark_apply(function(e) I(e))
The error I get looks like the following. It makes me wonder whether this is actually a dbplyr issue instead.
Error in `db_query_fields.DBIConnection()`:
! Can't query fields.
Caused by error in `value[[3L]]()`:
! Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 588.0 failed 4 times, most recent failure: Lost task 0.3 in stage 588.0 (TID 57382) (10.42.5.51 executor 112): org.apache.spark.SparkException: Process List(tar, -xf, packages.51962.tar) exited with code 2
Backtrace:
1. sdf_len(sc, 5, repartition = 1) %>% ...
4. sparklyr::sdf_len(sc, 5, repartition = 1)
5. sparklyr::sdf_seq(sc, 1, length, repartition = repartition, type = type)
7. sparklyr:::sdf_register.spark_jobj(sdf)
9. sparklyr:::tbl.spark_connection(sc, name)
10. sparklyr:::spark_tbl_sql(src = src, from)
11. dbplyr::tbl_sql(...)
13. dbplyr:::dbplyr_query_fields(src$con, from)
14. dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
16. dbplyr:::db_query_fields.DBIConnection(con, ...)
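The root cause in the trace appears to be the executor-side step `Process List(tar, -xf, packages.51962.tar) exited with code 2`, i.e. the `tar` binary on the worker nodes failing to extract the package bundle sparklyr ships. A rough, hypothetical sanity check (paths and the idea of running this in the executor image are my assumptions, not anything sparklyr documents) is to confirm which `tar` implementation the workers have and that it can round-trip an archive at all:

```shell
# Report which tar implementation is installed (GNU tar vs busybox/bsdtar);
# for GNU tar, exit code 2 signals a fatal error such as an unreadable archive.
tar --version | head -n 1

# Create a tiny test archive and confirm this tar can extract it again.
mkdir -p /tmp/tar_check/in /tmp/tar_check/out
echo "ok" > /tmp/tar_check/in/probe.txt
tar -cf /tmp/tar_check/probe.tar -C /tmp/tar_check/in probe.txt
tar -xf /tmp/tar_check/probe.tar -C /tmp/tar_check/out
cat /tmp/tar_check/out/probe.txt   # prints "ok" if extraction worked
```

If the basic round-trip works on the workers but the sparklyr bundle still fails, the mismatch is more likely between the archive format produced on the driver and what the executors' `tar` accepts.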
Cross-posting from https://community.rstudio.com/t/using-spark-apply-throws-a-sparkexception-process-list-tar-xf-packages-75547-tar-exited-with-code-2/180555