Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

Documentation: writing custom samplers compatible with multi GPU training

fteufel opened this issue · comments

fteufel commented

📚 Documentation

Hi,

I'm trying to run distributed training with a custom sampler for the first time. The idea is rather simple (a fixed budget for each class) and it works fine on a single GPU. When moving to multi-GPU, I unsurprisingly get an error message telling me that I should subclass BatchSampler:

TypeError:  Lightning can't inject a (distributed) sampler into your batch sampler, because it doesn't subclass PyTorch's `BatchSampler`. To mitigate this, either follow the API of `BatchSampler` or set `Trainer(use_distributed_sampler=False)`. If you choose the latter, you will be responsible for handling the distributed sampling within your batch sampler.
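To make it concrete, here is a rough sketch of how I read the second option the error suggests: set `Trainer(use_distributed_sampler=False)` and make the sampler shard its own indices per rank. The class name `FixedBudgetSampler` and its `labels` / `budget_per_class` arguments are just placeholders for my actual sampler, so please treat this as a sketch rather than working code.

```python
import torch
import torch.distributed as dist
from torch.utils.data import Sampler


class FixedBudgetSampler(Sampler):
    """Placeholder fixed-budget-per-class sampler that shards its own indices by rank.

    `labels` is assumed to be a sequence of per-sample integer class labels.
    """

    def __init__(self, labels, budget_per_class, seed=0):
        self.labels = torch.as_tensor(labels)
        self.budget_per_class = budget_per_class
        self.seed = seed
        self.epoch = 0
        # Discover the distributed context; fall back to a single process.
        if dist.is_available() and dist.is_initialized():
            self.rank, self.world_size = dist.get_rank(), dist.get_world_size()
        else:
            self.rank, self.world_size = 0, 1

    def set_epoch(self, epoch):
        # Reseeding hook; as far as I can tell Lightning calls this each epoch
        # when the sampler defines it, like it does for DistributedSampler.
        self.epoch = epoch

    def _draw_indices(self):
        # Same seed on every rank, so all ranks draw the same global index list.
        g = torch.Generator().manual_seed(self.seed + self.epoch)
        indices = []
        for label in self.labels.unique().tolist():
            class_idx = (self.labels == label).nonzero(as_tuple=True)[0]
            pick = torch.randperm(len(class_idx), generator=g)[: self.budget_per_class]
            indices.extend(class_idx[pick].tolist())
        return indices

    def __iter__(self):
        indices = self._draw_indices()
        # Drop the remainder so every rank yields the same number of indices (keeps DDP in sync).
        usable = len(indices) - len(indices) % self.world_size
        return iter(indices[self.rank : usable : self.world_size])

    def __len__(self):
        return len(self._draw_indices()) // self.world_size


# Intended wiring (hypothetical names):
# loader = DataLoader(dataset, sampler=FixedBudgetSampler(labels, budget_per_class=100), batch_size=32)
# trainer = Trainer(devices=2, strategy="ddp", use_distributed_sampler=False)
```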

It is my understanding that torch's BatchSampler takes a single-sample Sampler and draws from it repeatedly to fill up each batch. Are there any guidelines for how samplers should be built to be compatible with the sampler injection? I can't seem to find anything in the docs.
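For completeness, this is roughly how I interpret the first option: keep `BatchSampler`'s constructor signature (`sampler`, `batch_size`, `drop_last`) so that the injection can swap a DistributedSampler in for `sampler`, and put the per-class budget logic in `__iter__`. The `labels` / `budget_per_class` arguments are again placeholders, and I'm not sure whether such extra constructor arguments survive Lightning's re-instantiation of the batch sampler, which is exactly the kind of thing I'd hope the docs could clarify.

```python
from collections import defaultdict

from torch.utils.data import BatchSampler


class FixedBudgetBatchSampler(BatchSampler):
    """Placeholder batch sampler that follows BatchSampler's API so the
    (distributed) sampler injection can replace `self.sampler`."""

    def __init__(self, sampler, batch_size, drop_last, labels=None, budget_per_class=100):
        super().__init__(sampler, batch_size, drop_last)
        self.labels = labels  # sequence mapping dataset index -> hashable class label
        self.budget_per_class = budget_per_class

    def __iter__(self):
        seen = defaultdict(int)
        batch = []
        # `self.sampler` yields single indices; under DDP it would presumably be a
        # DistributedSampler, so only this rank's share of the dataset arrives here.
        for idx in self.sampler:
            label = self.labels[idx]
            if seen[label] >= self.budget_per_class:
                continue  # this class has used up its budget for the epoch
            seen[label] += 1
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch and not self.drop_last:
            yield batch


# Intended wiring (hypothetical names):
# loader = DataLoader(dataset, batch_sampler=FixedBudgetBatchSampler(
#     RandomSampler(dataset), batch_size=32, drop_last=False, labels=labels))
```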

cc @Borda