databricks / koalas

Koalas: pandas API on Apache Spark

Koalas on JDK 11 raises `java.lang.UnsupportedOperationException`

ashwin153 opened this issue · comments

  • PyArrow + PySpark on JDK 11 raises `java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available`.
  • According to https://stackoverflow.com/a/62625252, this can be resolved by setting `spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"` and `spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"` in the Spark configuration (see the sketch after this list). I reproduced the problem and verified this solution on Spark / PySpark 3.0.2, Koalas 1.5.0, and openjdk 11.0.10.
  • Because I would imagine this to be a relatively common configuration (Java 11 is the default JDK on Ubuntu 20.04 LTS, and Spark 3 is the latest major version), I propose adding this configuration to `default_session`. If there is a way to detect the JDK version from Python, this configuration could be applied conditionally, based on `LooseVersion(pyarrow.__version__)`, `LooseVersion(pyspark.__version__)`, and the JDK version.
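
For anyone hitting this in the meantime, a minimal workaround sketch (assuming PySpark 3.x on JDK 11; the flag just needs to reach both JVMs before the first session is created):

```python
from pyspark.sql import SparkSession

# Workaround sketch: pass the netty reflection flag to both the driver and
# executor JVMs *before* the session exists, so Arrow's off-heap buffer
# allocation works on JDK 9+.
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .config("spark.executor.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .getOrCreate()
)

import databricks.koalas as ks  # Koalas reuses the active session
```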

Sure, that makes sense.

Do you know how to get the JDK version? If so, I'd be happy to put up a PR for this change.
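
If it helps, one possible approach (a sketch only; the helper name is made up, and this is not verified against the codebase) is to shell out to `java -version` before the JVM is launched and parse the major version:

```python
import re
import subprocess

def detect_java_major_version():
    """Best-effort parse of `java -version` output (hypothetical helper).

    Returns the major version as an int (e.g. 8 or 11), or None if `java`
    is not on PATH or the output is in an unexpected format.
    """
    try:
        # `java -version` prints its banner to stderr, not stdout.
        out = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        ).stderr
    except OSError:
        return None
    match = re.search(r'version "(\d+)(?:\.(\d+))?', out)
    if not match:
        return None
    major = int(match.group(1))
    # Pre-9 JDKs report versions as 1.x (e.g. "1.8.0_282").
    if major == 1 and match.group(2):
        major = int(match.group(2))
    return major
```

One caveat with this approach: it checks whatever `java` is on PATH, which may not be the JVM Spark actually launches if `JAVA_HOME` points elsewhere.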