databricks / koalas

Koalas: pandas API on Apache Spark

AttributeError: type object 'InternalFrame' has no attribute 'restore_index'

RainFung opened this issue · comments

commented
 UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on failures in the middle of computation.
  An error occurred while calling o59044.getResult.
: org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
	at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:874)
	at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:870)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 19.0 failed 4 times, most recent failure: Lost task 2.3 in stage 19.0 (TID 14390, 11.0.109.187, executor 149): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/serializers.py", line 290, in dump_stream
    for series in iterator:
  File "<string>", line 1, in <lambda>
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 101, in <lambda>
    return lambda *a: (verify_result_length(*a), arrow_return_type)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 92, in verify_result_length
    result = f(*a)
  File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/util.py", line 99, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/databricks/koalas/accessors.py", line 919, in <lambda>
  File "/usr/local/lib/python3.6/site-packages/databricks/koalas/groupby.py", line 1375, in rename_output
AttributeError: type object 'InternalFrame' has no attribute 'restore_index'

@RainFung would you mind sharing the code you ran and the Koalas version?

I found the origin of the issue.

When bumping Koalas to 1.8.0, one of the worker nodes was still running Koalas 1.5.0. The restore_index method on InternalFrame (https://github.com/databricks/koalas/blob/master/databricks/koalas/internal.py) was not introduced until version 1.8.0, so the outdated worker failed as soon as the UDF tried to call it.
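For anyone hitting the same error, a minimal diagnostic sketch along these lines (assuming a live SparkSession bound to `spark`; the helper name is just illustrative) prints the Koalas version the driver sees and the version each executor host sees, so a 1.5.0 vs. 1.8.0 mismatch shows up immediately:

```python
import databricks.koalas as ks


def executor_koalas_version(_):
    # Runs inside the executor's Python process, so it reports the version
    # installed on that worker node.
    import socket
    import databricks.koalas as ks
    yield (socket.gethostname(), ks.__version__)


print("driver:", ks.__version__)

sc = spark.sparkContext
num_tasks = sc.defaultParallelism * 2  # oversubscribe so every executor likely runs a task
versions = (
    sc.parallelize(range(num_tasks), num_tasks)
      .mapPartitions(executor_koalas_version)
      .distinct()
      .collect()
)
for host, version in sorted(versions):
    print(host, version)
```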

Making sure that every node runs the same Koalas version should fix it.
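As a further guard, a hedged sketch like the one below (again assuming `spark` is a live SparkSession; not part of Koalas itself) can be run at job startup so a mismatched node fails fast instead of surfacing as an AttributeError deep inside a groupby:

```python
import databricks.koalas as ks


def koalas_version(_):
    # Report the Koalas version installed on the executor running this task.
    import databricks.koalas as ks
    yield ks.__version__


sc = spark.sparkContext
executor_versions = set(
    sc.parallelize(range(sc.defaultParallelism), sc.defaultParallelism)
      .mapPartitions(koalas_version)
      .collect()
)

if executor_versions != {ks.__version__}:
    raise RuntimeError(
        f"Koalas version mismatch: driver={ks.__version__}, executors={executor_versions}"
    )
```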