spark 3.1.1 Caused by: java.lang.ClassNotFoundException: tfrecords.DefaultSource
mullerhai opened this issue · comments
spark 3.1.1
Using Scala version 2.12.10 (Eclipse OpenJ9 VM, Java 11.0.10)
spark-tfrecord 0.3.4
libraryDependencies += "com.linkedin.sparktfrecord" %% "spark-tfrecord" % "0.3.4"
Launch command:
spark-shell --jars /data/spark/jars/spark-tfrecord_2.12-0.3.4.jar
import org.apache.spark.sql.SaveMode
val caseFinalModelFeaturePath ="hdfs:///auth/data/model/salecase_warehouse/case_model_feature_snappy.parquet"
val finalInputDf = spark.read.parquet(caseFinalModelFeaturePath)
val caseFinalTFRecordPath ="file:///data/model/salecase_warehouse/case_model_tfrecord"
finalInputDf.coalesce(10).write.format("tfrecords").option("recordType", "Example")
.option("codec", "org.apache.hadoop.io.compress.GzipCodec")
.mode(SaveMode.Overwrite)
.save(caseFinalTFRecordPath)
I get the following error:
java.lang.ClassNotFoundException: Failed to find data source: tfrecords. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:689)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:743)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
... 58 elided
Caused by: java.lang.ClassNotFoundException: tfrecords.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:72)
at java.base/java.lang.ClassLoader.loadClassHelper(ClassLoader.java:1185)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:1100)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:1083)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:663)
at org.apache.spark.sql.execution.datasources.DataSource$$$Lambda$7483/0x0000000000000000.apply(Unknown Source)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:663)
at org.apache.spark.sql.execution.datasources.DataSource$$$Lambda$4336/0x0000000000000000.apply(Unknown Source)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:663)
... 62 more
I also get the same error with Spark 3.3.
It looks like you did not set the jar properly. Make sure this file is valid: /data/spark/jars/spark-tfrecord_2.12-0.3.4.jar
Or you can try pulling from maven central:
spark-shell --packages com.linkedin.sparktfrecord:spark-tfrecord_2.12:0.4.0
You need maven central repo access for this one to work.
I found that we just need to change the format name used to read and write TFRecords:
old version: .write.format("tfrecords")
new version: .write.format("tfrecord")
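For reference, a minimal sketch of the corrected calls, reusing the DataFrame and path from the example above (this assumes a spark-shell session with the spark-tfrecord jar already on the classpath):

```scala
import org.apache.spark.sql.SaveMode

// Same write as before, but with the renamed data source: "tfrecord" (singular)
finalInputDf.coalesce(10).write.format("tfrecord")
  .option("recordType", "Example")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .mode(SaveMode.Overwrite)
  .save(caseFinalTFRecordPath)

// Reading back uses the same format name
val readBack = spark.read.format("tfrecord")
  .option("recordType", "Example")
  .load(caseFinalTFRecordPath)
```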
glad you figured it out.
My pleasure.