linkedin / spark-tfrecord

Read and write TensorFlow TFRecord data from Apache Spark.

Protocol message tag had invalid wire type.

ak2911 opened this issue

I get an error while reading a TFRecord file (link to tfrecord file).

Please advise whether I need to create a schema manually or configure any setting for each TFRecord file.


df = spark.read.format("tfrecord").option("recordType", "Example").load('tfrecordFile')

ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
com.linkedin.spark.shaded.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.

The error comes from Google protobuf. Were you able to load the file with other tools, such as the native TensorFlow dataset API?

Thanks JunShi for your reply.
Yes, I am able to load and access it using TensorFlow's own API.
I only get the error when using the spark-tfrecord API.

Is there a probable reason for this error, or do I need to specify a parameter before loading a new TFRecord file?

How was your file generated?

I googled the error; most likely you have a corrupted file.
For example:
https://stackoverflow.com/questions/6138721/protobuf-errorprotocol-message-tag-had-invalid-wire-type

I am puzzled that the TensorFlow API can handle it.
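
For context on what the message means: every protobuf field starts with a varint tag whose low 3 bits are the wire type, and only wire types 0 through 5 are defined. The parser raises this error when it meets 6 or 7, which usually means the bytes are not the message type it expected. Below is a minimal pure-Python sketch for inspecting the first tag of a record payload. The helper names `read_varint` and `first_tag` are made up for illustration and are not part of spark-tfrecord or protobuf.

```python
def read_varint(buf: bytes, pos: int = 0):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint too long")


def first_tag(payload: bytes):
    """Return (field_number, wire_type) of the first field in a serialized message."""
    tag, _ = read_varint(payload)
    return tag >> 3, tag & 0x7


# Wire types defined by protobuf; 6 and 7 are undefined and trigger the error.
VALID_WIRE_TYPES = {0, 1, 2, 3, 4, 5}

# A tf.train.Example with features set typically starts with
# field 1 (features), wire type 2 (length-delimited), i.e. byte 0x0a.
print(first_tag(b"\x0a\x00"))  # (1, 2)
print(first_tag(b"\x0f"))      # (1, 7): wire type 7 is invalid
```

If the first tag of a record is not `(1, 2)`, the record may not be a serialized `Example` at all (for instance raw bytes or a different message type), which could explain why one reader accepts the file and another rejects it. That is a guess, not a confirmed diagnosis for this file.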

Thanks JunShi, but it works properly with the tf.data API. You can check for yourself: link to tfrecord file.

I have also checked the Stack Overflow link above. The error seems to be a generic one. I would appreciate any suggestion from the spark-tfrecord dev team.

Can you provide a smaller file, say less than 1 MB? The file you provided is about 1 GB.
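
One possible way to cut the file down, sketched here under the assumption that the file is an uncompressed TFRecord file: each record is framed as an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. The hypothetical helper `copy_first_records` copies the first few records byte for byte, so the CRCs stay valid without being recomputed. It does not verify them.

```python
import struct


def copy_first_records(src_path, dst_path, max_records=100):
    """Copy up to max_records TFRecord records; return the number copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while copied < max_records:
            header = src.read(12)  # 8-byte length + 4-byte CRC of the length
            if len(header) < 12:
                break  # end of file, or a truncated header
            (length,) = struct.unpack("<Q", header[:8])
            body = src.read(length + 4)  # payload + 4-byte CRC of the payload
            if len(body) < length + 4:
                break  # truncated record: stop without writing a partial one
            dst.write(header)
            dst.write(body)
            copied += 1
    return copied
```

For example, `copy_first_records("big.tfrecord", "small.tfrecord", max_records=50)` (both file names are placeholders) would write at most 50 records to the new file, which can then be loaded with the same `spark.read.format("tfrecord")` call to see whether the error still occurs.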