tensorflow / io

Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO

[Question] Reading parquet files with numpy arrays in columns as a tfio.IODataset results in error

pratikgujjar opened this issue

I have been trying to read a parquet file whose column contains NumPy arrays as a tfio dataset, using the IODataset.from_parquet() API. This results in an error:

W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at strided_slice_op.cc:105 : INVALID_ARGUMENT: slice index 0 of dimension 0 out of bounds.
*** tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:CPU:0}} slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: IOFromParquet/ParquetIODataset/strided_slice/

Here's a minimal working example to reproduce this error:

import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_io as tfio

# Create parquet file
x = [np.asarray([i, i+1]) for i in range(5)]
df = pd.DataFrame({ "pred": x })
df.to_parquet("dfx.parquet")

# Create dataset
column_spec = {"pred": tf.TensorSpec(tf.TensorShape([]), tf.int64)}
ds = tfio.IODataset.from_parquet("dfx.parquet", columns=column_spec)
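
Note that the spec above declares "pred" as a scalar (shape []), while each cell actually holds a length-2 array. Whether that mismatch is what triggers the error is not confirmed here. As a minimal sketch of one possible workaround, the array column can be expanded into one scalar column per element before the file is written. The column names pred_0 and pred_1 and the variable names are hypothetical, and the sketch uses only pandas and NumPy; it does not call tfio:

```python
import numpy as np
import pandas as pd

# Same data as in the example above: each cell holds a length-2 array.
x = [np.asarray([i, i + 1]) for i in range(5)]
df = pd.DataFrame({"pred": x})

# Stack the per-row arrays into a (5, 2) matrix and cast to int64 so the
# dtype matches the tf.int64 used in the column spec.
stacked = np.stack(df["pred"].to_numpy()).astype(np.int64)

# Give each array element its own scalar column.
# The names pred_0 / pred_1 are hypothetical, chosen for illustration.
flat = pd.DataFrame(
    stacked,
    columns=[f"pred_{j}" for j in range(stacked.shape[1])],
)

print(flat.shape)
print(list(flat.columns))
```

The flattened frame could then be written with flat.to_parquet(...) and read with a spec containing one scalar tf.int64 entry per column. That step is an untested assumption about how from_parquet() treats scalar columns, not a confirmed fix.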