Writing TFRecords breaks in Spark 3.2.0
M-Anwar opened this issue
When using the library (com.linkedin.sparktfrecord:spark-tfrecord_2.12:0.3.0) on Spark 3.2.0 to write TFRecords, the job throws an exception:
Caused by: java.lang.AbstractMethodError: Receiver class com.linkedin.spark.datasources.tfrecord.TFRecordOutputWriter does not define or inherit an implementation of the resolved method 'abstract java.lang.String path()' of abstract class org.apache.spark.sql.execution.datasources.OutputWriter.
I think this is caused by the change in SPARK-26164, which modifies the OutputWriter class to include a path(): String method (source).
The current TFRecordOutputWriter class doesn't implement this method, hence the error (source).
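For anyone hitting the same error, here is a minimal sketch of the kind of change that should resolve it, assuming the only missing piece is the abstract path(): String member named in the stack trace. OutputWriterLike is a hypothetical stand-in that mimics the shape of Spark's OutputWriter so the snippet compiles without Spark on the classpath (the real class writes InternalRow values, not strings), and SketchTFRecordWriter is an illustrative name, not the library's actual class. I have not checked whether the eventual fix takes this exact form.

```scala
// Hypothetical stand-in for the shape of Spark 3.2's OutputWriter.
// This is NOT the real Spark class; it only mirrors its abstract members.
abstract class OutputWriterLike[Row] {
  def write(row: Row): Unit
  def close(): Unit
  def path(): String // the abstract member reported missing in the stack trace
}

// Hypothetical writer: it keeps the output path it was constructed with
// and exposes it through path(), so the abstract member is implemented.
class SketchTFRecordWriter(filePath: String) extends OutputWriterLike[String] {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[String]

  // Collect rows in memory; a real writer would serialize them to filePath.
  override def write(row: String): Unit = { buffer += row }

  // Drop buffered rows; a real writer would flush and close its stream here.
  override def close(): Unit = buffer.clear()

  // The method the writer was missing.
  override def path(): String = filePath

  // Number of rows written and not yet cleared by close().
  def pending: Int = buffer.size
}

object PathSketch {
  def main(args: Array[String]): Unit = {
    val writer = new SketchTFRecordWriter("/tmp/part-00000.tfrecord")
    writer.write("record")
    println(writer.path())
    writer.close()
  }
}
```

Note that this surfaces as a runtime AbstractMethodError rather than a compile error because the published jar was presumably compiled against an older Spark in which OutputWriter had no abstract path() member, so a build against Spark 3.2.0 is needed to see the problem at compile time.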
Thanks for reporting the issue.
It looks like this is a breaking change on the Spark side.
Do you have a solution already? Your contribution would be highly appreciated, as we don't have much bandwidth for this project at the moment.
fix: #37
The solution looks good to me; I verified that it works on Spark 3.2.0. Thanks @tangyl for the PR!