tensorflow / similarity

TensorFlow Similarity is a Python package focused on making similarity learning quick and easy.

[REQUEST] TFDatasetMultiShotMemorySampler for custom datasets

Lunatik00 opened this issue · comments

Hi, I am testing different dataflows for training. I have compared feeding data through the sampler against using a dataset (built with tf.keras.utils.image_dataset_from_directory), and I found that the maximum batch size that fits on the same GPU is very different: around 20 with one and over 30 with the other. The dataset allows the largest batches, but it does not divide the data well per batch. So I want to use a dataset as the input for the memory sampler, but the current function only downloads the built-in (non-custom) datasets. I will try modifying it to make this work, but I don't think my code will be generic, and I haven't used overloaded functions before, so I am leaving this as a request that should be simple to implement.
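
For reference, the dataset-based dataflow described above looks roughly like this (the directory path, image size, and batch size are just placeholders):

```python
import tensorflow as tf

# Plain tf.data pipeline: large batches fit on the GPU, but batches are drawn
# without the per-class balancing a similarity sampler would provide.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "path/to/images",        # placeholder: directory of class subfolders
    image_size=(224, 224),   # placeholder image size
    batch_size=32,           # placeholder batch size
)
```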

Hi Lunatik00, apologies for the slow response. We currently support loading custom data using the MultiShotMemorySampler. The data is loaded into memory and properly sampled over the classes to ensure that the batches are created correctly. However, some datasets can be too large to hold in memory, e.g., larger image datasets.
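
As a rough sketch (the array shapes and batch settings here are only placeholders), custom data that already fits in memory can be passed straight to the sampler:

```python
import numpy as np
from tensorflow_similarity.samplers import MultiShotMemorySampler

# Placeholder in-memory data: x holds the examples, y the integer class labels.
x = np.random.rand(1000, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=1000)

# The sampler keeps the data in memory and draws balanced batches of
# classes_per_batch * examples_per_class_per_batch examples.
sampler = MultiShotMemorySampler(
    x,
    y,
    classes_per_batch=10,
    examples_per_class_per_batch=4,
)
```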

Fortunately, we recently merged a PR that adds support for loading examples from disk, see here. You'll need to pass the paths to your examples as the x input; the load function then takes each path and loads the example from disk when constructing the batches.
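
A sketch of that pattern is below. Note that the `load_example_fn` argument name and the callback signature are assumptions based on the description above; please check the linked PR for the exact interface.

```python
import numpy as np
import tensorflow as tf
from tensorflow_similarity.samplers import MultiShotMemorySampler

# Placeholder data: x holds file paths instead of decoded images.
paths = np.array(["data/cat/img_000.jpg", "data/dog/img_001.jpg"])
labels = np.array([0, 1])

def load_from_disk(path):
    # Assumed callback signature: receives a path, returns the decoded example.
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, (224, 224))

# `load_example_fn` is an assumed argument name; the examples stay on disk and
# are only loaded when the sampler assembles a batch.
sampler = MultiShotMemorySampler(
    paths,
    labels,
    classes_per_batch=2,
    examples_per_class_per_batch=1,
    load_example_fn=load_from_disk,
)
```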

Hopefully this helps, but let me know if you run into issues.