basveeling / pcam

The PatchCamelyon (PCam) deep learning classification benchmark.

HDF5 in tensorflow 2.4

polaschwoebel opened this issue

Hi all, thanks for the cool dataset!
I am trying to use it in TensorFlow and have run into the following problem:
Newer versions of Keras (such as the one shipped with TF 2.4) no longer seem to include HDF5Matrix, and when running older code I get a warning that refers me to the new HDF5 functionality in TensorFlow I/O:
https://www.tensorflow.org/io/api_docs/python/tfio/v0/IODataset#from_hdf5

However, trying to use
train_dataset = tfio.v0.IODataset.from_hdf5(xpath, '/camelyonpatch_level_2_split_train_x', tf.int64)
I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: IOFromHDF5/HDF5IODataset/strided_slice/

Any ideas why this is? Could it be the file itself that doesn't come in the right shape?
Thanks!
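
A possible explanation, offered as a guess rather than a confirmed diagnosis: the h5py snippet in the reply reads the data under the keys 'x' and 'y', so the dataset name passed to from_hdf5 may need to be the key stored inside the file rather than the file name. The sketch below lists the top-level datasets of an HDF5 file together with their shapes and dtypes. The helper names summarize_datasets and summarize_file are hypothetical and not part of h5py or TensorFlow I/O.

```python
def summarize_datasets(group):
    """Return {name: (shape, dtype)} for every top-level dataset in an open HDF5 group."""
    summary = {}
    for name, item in group.items():
        # Datasets expose .shape and .dtype; sub-groups do not, so they are skipped.
        if hasattr(item, "shape") and hasattr(item, "dtype"):
            summary[name] = (tuple(item.shape), str(item.dtype))
    return summary


def summarize_file(path):
    """Open an HDF5 file read-only and summarize its top-level datasets."""
    import h5py  # imported here so summarize_datasets stays dependency-free

    with h5py.File(path, "r") as f:
        return summarize_datasets(f)
```

Calling summarize_file("camelyonpatch_level_2_split_train_x.h5") should show which key and dtype the file actually uses. If they differ from the name and tf.int64 spec passed to from_hdf5, that mismatch would be the first thing to check.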

Hi @polaschwoebel ,
I faced the same problem and was able to load the data using Python's h5py library. The code goes like this:

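import h5py
import numpy as np

x_filename = "camelyonpatch_level_2_split_train_x.h5"
y_filename = "camelyonpatch_level_2_split_train_y.h5"
# Open both HDF5 files read-only; the data lives under the key 'x' (images) or 'y' (labels).
h5X = h5py.File(x_filename, 'r')
h5y = h5py.File(y_filename, 'r')
# Copy the datasets into in-memory NumPy arrays and reshape the labels to (N, 1).
X = np.array(h5X.get('x'))
y = np.array(h5y.get('y')).reshape([-1, 1])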

I found this code at: https://github.com/alexmagsam/metastasis-detection/blob/master/data.py
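
Note that np.array(...) copies the whole dataset into memory. If that is too much, a rough alternative is to slice the open h5py datasets batch by batch, since h5py only reads the requested slice from disk. The following is a minimal sketch under that assumption. The names iter_batches and pcam_batches are hypothetical helpers, and the keys 'x' and 'y' are taken from the snippet above.

```python
def iter_batches(x, y, batch_size):
    """Yield (x_batch, y_batch) slices from two equally long, sliceable sequences."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of samples")
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # With h5py datasets, each slice is read from disk only when requested.
        yield x[start:stop], y[start:stop]


def pcam_batches(x_path, y_path, batch_size=32):
    """Stream batches from the PCam HDF5 files without loading them fully."""
    import h5py  # imported here so iter_batches stays dependency-free

    with h5py.File(x_path, "r") as fx, h5py.File(y_path, "r") as fy:
        yield from iter_batches(fx["x"], fy["y"], batch_size)
```

A generator like pcam_batches could then be handed to tf.data.Dataset.from_generator or used in a plain training loop. The last batch may be smaller than batch_size.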