AttributeError: type object 'InternalFrame' has no attribute 'restore_index'
RainFung opened this issue
rain commented
UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on failures in the middle of computation.
An error occurred while calling o59044.getResult.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:874)
at org.apache.spark.api.python.PythonServer.getResult(PythonRDD.scala:870)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 19.0 failed 4 times, most recent failure: Lost task 2.3 in stage 19.0 (TID 14390, 11.0.109.187, executor 149): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 377, in main
process()
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 372, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/serializers.py", line 290, in dump_stream
for series in iterator:
File "<string>", line 1, in <lambda>
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 101, in <lambda>
return lambda *a: (verify_result_length(*a), arrow_return_type)
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/worker.py", line 92, in verify_result_length
result = f(*a)
File "/data4/yarnenv/local/usercache/tdw_rainyrfeng/appcache/application_1619753973429_9236975/container_e04_1619753973429_9236975_01_000480/pyspark.zip/pyspark/util.py", line 99, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/databricks/koalas/accessors.py", line 919, in <lambda>
File "/usr/local/lib/python3.6/site-packages/databricks/koalas/groupby.py", line 1375, in rename_output
AttributeError: type object 'InternalFrame' has no attribute 'restore_index'
Hyukjin Kwon commented
@RainFung would you mind sharing the code you ran and the Koalas version?
Sbargaoui commented
I found the origin of the issue.
When we bumped Koalas to 1.8.0, one of the worker nodes was still running Koalas 1.5.0. restore_index was not added as a method of InternalFrame
(https://github.com/databricks/koalas/blob/master/databricks/koalas/internal.py) until version 1.8.0, so code that relies on it fails with this AttributeError on the outdated worker.
Making sure that every node runs the same Koalas version should fix it.
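To find the outdated node(s), one option is to run a trivial Spark job that reports the installed Koalas version from each executor and compare it with the driver's. The sketch below is a hypothetical helper set, not part of Koalas: `local_package_version`, `executor_versions`, and `mismatched_hosts` are illustrative names, `sc` is assumed to be a live SparkContext (for example `spark.sparkContext`), and the default of 200 tasks is an arbitrary assumption. Spark does not guarantee that every executor receives a task, so choose `num_tasks` well above your executor count.

```python
import importlib
import socket
from typing import Iterable, List, Optional, Tuple


def local_package_version(module_name: str = "databricks.koalas") -> Optional[str]:
    """Return module_name.__version__ on this machine, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def executor_versions(sc, module_name: str = "databricks.koalas",
                      num_tasks: int = 200) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: report distinct (hostname, version) pairs seen by executors.

    Runs one tiny task per partition; each task imports the module locally,
    so the result reflects what is installed on that executor's host.
    """
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    return (rdd.map(lambda _: (socket.gethostname(), local_package_version(module_name)))
               .distinct()
               .collect())


def mismatched_hosts(expected: Optional[str],
                     reports: Iterable[Tuple[str, Optional[str]]]) -> List[Tuple[str, Optional[str]]]:
    """Return the distinct (host, version) pairs whose version differs from `expected`."""
    bad = {(host, ver) for host, ver in reports if ver != expected}
    # Sort by host, then version (None sorts as the empty string).
    return sorted(bad, key=lambda pair: (pair[0], pair[1] or ""))
```

On the driver, `mismatched_hosts(local_package_version(), executor_versions(spark.sparkContext))` would list any host whose Koalas differs from the driver's (or is missing); those are the nodes to upgrade or pin to the same release.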