[Question] Reading parquet files with numpy arrays in columns as a tfio.IODataset results in error
pratikgujjar opened this issue
Pratik Gujjar commented
I have been trying to read a Parquet file that contains NumPy arrays in one of its columns as a tfio dataset, using the tfio.IODataset.from_parquet() API. This results in an error:
W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at strided_slice_op.cc:105 : INVALID_ARGUMENT: slice index 0 of dimension 0 out of bounds.
*** tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:CPU:0}} slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: IOFromParquet/ParquetIODataset/strided_slice/
Here's a minimal working example that reproduces the error:
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_io as tfio
# Create parquet file
x = [np.asarray([i, i+1]) for i in range(5)]
df = pd.DataFrame({ "pred": x })
df.to_parquet("dfx.parquet")
# Create dataset
column_spec = {"pred": tf.TensorSpec(tf.TensorShape([]), tf.int64)}
ds = tfio.IODataset.from_parquet("dfx.parquet", columns=column_spec)
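A possible workaround, offered only as a sketch: the column spec above declares "pred" as a scalar int64, while each cell of "pred" actually holds a length-2 array, which Parquet stores as a nested list column. Assuming from_parquet() only handles flat scalar columns (an assumption, not something confirmed in this thread), the array column could be expanded into one scalar column per element before writing. The column names pred_0 and pred_1 and the file name dfx_flat.parquet are hypothetical, picked for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the report: one column whose cells are length-2 arrays.
x = [np.asarray([i, i + 1]) for i in range(5)]
df = pd.DataFrame({"pred": x})

# Stack the per-row arrays into a (5, 2) matrix and give each element
# its own scalar column. The names pred_0 / pred_1 are hypothetical.
flat = pd.DataFrame(
    np.stack(df["pred"].to_numpy()),
    columns=[f"pred_{k}" for k in range(2)],
)

# Writing the flat frame needs a Parquet engine such as pyarrow:
# flat.to_parquet("dfx_flat.parquet")
#
# Assumption: each flat column could then use the scalar spec from the report:
# column_spec = {
#     "pred_0": tf.TensorSpec(tf.TensorShape([]), tf.int64),
#     "pred_1": tf.TensorSpec(tf.TensorShape([]), tf.int64),
# }
```

If this works, the two scalar components would need to be recombined into a single tensor per row downstream, for example with tf.stack inside a dataset map() call.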