linkedin / spark-tfrecord

Read and write Tensorflow TFRecord data from Apache Spark.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ArrayType(ArrayType(DoubleType,true),true) Not Supported When Writing Dataframe to TFRecords

xemcerk opened this issue · comments

Hi, I try to write my dataframe to tfrecords but encounter the error, log is as below

Caused by: java.lang.RuntimeException: Cannot convert field to unsupported data type ArrayType(ArrayType(DoubleType,true),true) at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$.org$tensorflow$spark$datasources$tfrecords$serde$DefaultTfRecordRowEncoder$$encodeFeature(DefaultTfRecordRowEncoder.scala:144) at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$$anonfun$encodeExample$1.apply(DefaultTfRecordRowEncoder.scala:64) at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$$anonfun$encodeExample$1.apply(DefaultTfRecordRowEncoder.scala:61) at scala.collection.immutable.List.foreach(List.scala:392) at org.tensorflow.spark.datasources.tfrecords.serde.DefaultTfRecordRowEncoder$.encodeExample(DefaultTfRecordRowEncoder.scala:61) at org.tensorflow.spark.datasources.tfrecords.DefaultSource$$anonfun$2.apply(DefaultSource.scala:59) at org.tensorflow.spark.datasources.tfrecords.DefaultSource$$anonfun$2.apply(DefaultSource.scala:56) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129) at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139) ... 10 more

I presume such feature is already supported though it's not specifically addressed in README

I have never tested ArrayType(ArrayType(DoubleType). I think FloatType should work.
You can try to cast to Float first.
Make sure you use SequenceExamples.

Thank you for your timely reply, I will try then. Now my work around is to save it as string, and parse it with tf.strings functions,seems not a big overhead so far.