java.lang.UnsupportedOperationException: buildReader is not supported for TFRECORD
ZixinChen0520 opened this issue
Hi @junshi15
I hit an UnsupportedOperationException with Scala 2.12 and Spark 3.
The exception occurs when I try to show the DataFrame or convert it to an RDD. It seems that the method `buildReader` is not implemented.
My dependency:
<dependency>
    <groupId>com.linkedin.sparktfrecord</groupId>
    <artifactId>spark-tfrecord_2.12</artifactId>
    <version>0.2.3</version>
</dependency>
The way I load my TFRecord data:
sparkSession
.read
.format("tfrecord")
.options(config.options)
.option("recordType", "Example")
.load(myPath)
here is the exception:
java.lang.UnsupportedOperationException: buildReader is not supported for TFRECORD
at org.apache.spark.sql.execution.datasources.FileFormat.buildReader(FileFormat.scala:116)
at org.apache.spark.sql.execution.datasources.FileFormat.buildReaderWithPartitionValues(FileFormat.scala:137)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:478)
at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:468)
at org.apache.spark.sql.execution.FileSourceScanExec.doExecute(DataSourceScanExec.scala:553)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:387)
at org.apache.spark.sql.Dataset.$anonfun$collectToPython$1(Dataset.scala:3449)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3617)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:106)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:166)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:835)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3615)
at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3446)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I am not sure whether I am using the load method incorrectly.
Thanks!
Do you see the same problem in Spark 2.3 or 2.4?
I tried the following:
- Launch spark-shell (Spark 3.0.0)
bin/spark-shell --packages com.linkedin.sparktfrecord:spark-tfrecord_2.12:0.2.3
- Test the code here: https://github.com/linkedin/spark-tfrecord#use-partitionby
It worked for me.
The difference is that I did not use `.options(config.options)`. As a test, could you remove those options and try again?
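A minimal sketch of that test, assuming a running SparkSession named `spark` and that `myPath` points at your TFRecord files (both names are placeholders from the snippet above):

```scala
// Minimal load without the extra .options(config.options) call,
// to rule out a conflicting option as the cause.
// `spark` and `myPath` are placeholders for your session and data path.
val df = spark.read
  .format("tfrecord")
  .option("recordType", "Example")
  .load(myPath)

// Trigger an action; this is where buildReader was previously thrown.
df.show(5)
```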
BTW, buildReader is supported here.
I am wondering whether your program actually loaded spark-tfrecord correctly.
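One way to check which data source actually claims the "tfrecord" short name is to list every registered DataSourceRegister on the classpath, e.g. from spark-shell. This is only a diagnostic sketch; the expectation that the winning class lives in the `com.linkedin.spark.datasources.tfrecord` package is my reading of the library's layout, so treat it as an assumption:

```scala
import java.util.ServiceLoader
import org.apache.spark.sql.sources.DataSourceRegister
import scala.collection.JavaConverters._

// Print every DataSourceRegister visible to Spark and the short name it claims.
// If a class outside the com.linkedin.spark.datasources.tfrecord package also
// answers to "tfrecord", that other connector (not spark-tfrecord) may be the
// one handling the read, which would explain the missing buildReader.
val loader = Thread.currentThread().getContextClassLoader
ServiceLoader.load(classOf[DataSourceRegister], loader).asScala
  .foreach(r => println(s"${r.shortName()} -> ${r.getClass.getName}"))
```

If two entries print the same short name, removing the unintended jar from the application's dependencies should resolve the ambiguity.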
I compared your buildReader implementation with the one in our internal copy of spark-tfrecord, and it looks like some variables in our copy were changed slightly. I think the problem can be solved by removing those differences.
Thank you so much for your help!
Assume the problem has been resolved. Feel free to reopen it if not.