databricks / koalas

Koalas: pandas API on Apache Spark


to_datetime() error

franperezlopez opened this issue · comments

I'm using the method koalas.to_datetime() to cast a string column to datetime. This is the code:

import databricks.koalas as ks

df = ks.DataFrame({'timestamp': ['2020-04-06', '2020-04-06']})
df.timestamp = ks.to_datetime(df.timestamp)
df.to_pandas()

Executing the second line produces this warning:

/home/fran/anaconda3/envs/----/lib/python3.7/site-packages/pyspark/sql/pandas/functions.py:386: UserWarning: In Python 3.6+ and Spark 3.0+, it is preferred to specify type hints for pandas UDF instead of specifying pandas UDF type which will be deprecated in the future releases. See SPARK-28264 for more details.
  "in the future releases. See SPARK-28264 for more details.", UserWarning)

Executing the third line (to_pandas()) throws an exception. Is this a bug, or should I change the way I'm using to_datetime()?

Py4JJavaError: An error occurred while calling o153.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, 172.28.1.237, executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

The error:

sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available

is likely caused by an incompatibility between your JVM version (Java 9+) and Arrow. You will have to add -Dio.netty.tryReflectionSetAccessible=true to the driver and executor JVM options. See also https://spark.apache.org/docs/3.0.0/index.html#downloading
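
For example, one way to pass the flag when you create the SparkSession yourself (a minimal sketch; the flag has to be set before the driver JVM starts, so setting it in conf/spark-defaults.conf or on the spark-submit command line works as well):

import databricks.koalas as ks
from pyspark.sql import SparkSession

# On Java 9+ Arrow's Netty allocator needs this flag to access direct buffers,
# so it must reach both the driver and the executor JVMs.
spark = (
    SparkSession.builder
    .config("spark.driver.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
    .config("spark.executor.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
    .getOrCreate()
)

df = ks.DataFrame({'timestamp': ['2020-04-06', '2020-04-06']})
df.timestamp = ks.to_datetime(df.timestamp)
print(df.to_pandas())  # the Arrow-based conversion should now succeed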

Any updates here? I am seeing this issue as well.