tensorflow / similarity

TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.

[FEATURE REQUEST] use of dataset in tfsim.callbacks.EvalCallback

Lunatik00 opened this issue

Hi, I have a dataset that is relatively large compared to the RAM I have available. I currently have access to machines that can handle it, so this is not a problem for me. But since the RAM usage is high, I checked whether tfsim.callbacks.EvalCallback can take a dataset as input (a tf.data.Dataset, the same way one can be passed to model.fit()), and it cannot. Supporting this would help people with fewer compute resources use the callback with their datasets. (I read the dataset using tf.keras.utils.image_dataset_from_directory(); it can be batched or unbatched.)

We do provide a TFRecord sampler for handling datasets that are too large to fit in memory. There are some quirks to setting up the TFRecords: the sampler requires that each TFRecord file contain contiguous blocks of classes, where the size of each block is a multiple of example_per_class.
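To make the block-layout requirement concrete, here is a rough sketch in plain Python of how examples might be ordered before they are written to a file. It assumes that "block" means all kept examples of one class placed together, and that leftover examples beyond a multiple of example_per_class can simply be dropped. The helper name order_into_class_blocks is hypothetical and not part of TensorFlow Similarity, and the actual TFRecord writing is left out.

```python
from collections import defaultdict


def order_into_class_blocks(labels, example_per_class):
    """Return example indices ordered so that each class forms one
    contiguous block whose length is a multiple of example_per_class.

    Hypothetical helper for illustration; leftover examples that do not
    fill a complete group are dropped (an assumption, not library behavior).
    """
    # Group the example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    ordered = []
    for label in sorted(by_class):
        idxs = by_class[label]
        # Trim the block down to the largest multiple of example_per_class.
        usable = len(idxs) - len(idxs) % example_per_class
        ordered.extend(idxs[:usable])
    return ordered


labels = [0, 1, 0, 2, 1, 0, 2, 1, 0, 2]
order = order_into_class_blocks(labels, example_per_class=2)
print(order)
print([labels[i] for i in order])
```

Writing the examples to a file in this order gives one block per class; a class with an odd number of examples loses its last one in this sketch.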

Regarding the EvalCallback: it was meant to hold a smaller subset of the data in memory, because the index has to be rebuilt every time the callback is called. Since that is quite expensive, the expectation is that the eval set is small.
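One possible workaround, given that expectation, is to pull a small, capped subset out of the streamed data and keep only that in memory. The sketch below is plain Python and works on any iterable of (example, label) pairs; an unbatched tf.data.Dataset could supply such pairs through as_numpy_iterator(). The helper name take_eval_subset and its parameters are hypothetical, and how the result is then handed to the callback is not shown here.

```python
from collections import Counter


def take_eval_subset(pairs, max_per_class, max_total):
    """Collect a small in-memory subset from a stream of (example, label)
    pairs, keeping at most max_per_class examples per class and at most
    max_total examples overall. Hypothetical helper for illustration.
    """
    counts = Counter()
    examples, labels = [], []
    for x, y in pairs:
        # Stop reading the stream as soon as the subset is full.
        if len(examples) >= max_total:
            break
        y = int(y)
        if counts[y] < max_per_class:
            counts[y] += 1
            examples.append(x)
            labels.append(y)
    return examples, labels


# Stand-in stream: a generator, so nothing is materialized up front.
stream = ((f"img_{i}", i % 3) for i in range(1000))
xs, ys = take_eval_subset(stream, max_per_class=2, max_total=5)
print(xs, ys)
```

Because the loop stops once the cap is reached, only the beginning of the stream is read, so shuffling the dataset first would give a more representative subset.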