index drawn by AliasMethod is not on the same gpu as the model
dongyaoli10x opened this issue
Not sure if I missed something, but it seems to me that if you train on multiple GPUs with the current implementation, `AliasMethod` puts the index on the default GPU. `memory_l` and `memory_ab` are on the correct GPU because they are registered with `register_buffer`. As a result, `torch.index_select(self.memory_l, 0, idx.view(-1)).detach()` fails with an `arguments are located on different GPUs` error.
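A possible workaround, as a minimal sketch under assumptions: keep the sampling weights in a buffer too, so the drawn index lands on the same device as the memory bank. `MemoryBankSketch` and its `probs` buffer are hypothetical names, and `torch.multinomial` is swapped in for the repository's `AliasMethod` sampler, so this is not the actual implementation.

```python
import torch
import torch.nn as nn


class MemoryBankSketch(nn.Module):
    """Hypothetical stand-in for the memory bank; not the repository's class."""

    def __init__(self, n_data: int, feat_dim: int):
        super().__init__()
        # Buffers move with the module on .to()/.cuda() and are copied to each
        # replica by DataParallel, so they always sit on the replica's device.
        self.register_buffer("memory_l", torch.randn(n_data, feat_dim))
        # Assumption: uniform sampling weights, stored as a buffer as well.
        self.register_buffer("probs", torch.ones(n_data))

    def forward(self, batch_size: int, k: int) -> torch.Tensor:
        # torch.multinomial returns indices on the same device as its input,
        # so idx is created where the buffers live.
        idx = torch.multinomial(self.probs, batch_size * k, replacement=True)
        # If the sampler returns an index elsewhere (as AliasMethod reportedly
        # does), an explicit move like this one avoids the device mismatch.
        idx = idx.to(self.memory_l.device)
        weight = torch.index_select(self.memory_l, 0, idx.view(-1)).detach()
        return weight.view(batch_size, k, -1)


bank = MemoryBankSketch(n_data=10, feat_dim=4)
negatives = bank(batch_size=2, k=3)
```

The same `idx.to(self.memory_l.device)` line could in principle be applied to the output of the existing sampler without replacing it.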
OK, now I figured out why. In the current implementation, only the encoder is wrapped in `DataParallel`; the contrast module is not. So the loss computation happens on only one GPU, which renders the `register_buffer` of the memory bank useless. If you put the contrast module into `DataParallel`, `AliasMethod` still won't end up on the correct GPU. Probably the right way to go is DDP, like you implemented in PyContrast.
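For reference, a rough sketch of what the DDP route could look like, with one process per device so the encoder, the memory bank, and the sampled index all live on that process's device. Everything here is an assumption for illustration: `run_worker`, the tensor sizes, the file-based rendezvous, and `torch.multinomial` standing in for `AliasMethod`. It is not how PyContrast does it.

```python
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_worker(rank: int, world_size: int, init_file: str) -> torch.Tensor:
    """Hypothetical per-process entry point: one process owns one device."""
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(
        backend="nccl" if use_cuda else "gloo",
        init_method=f"file://{init_file}",
        rank=rank,
        world_size=world_size,
    )
    try:
        device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
        if use_cuda:
            torch.cuda.set_device(device)

        # Only the encoder has trainable parameters, so only it is wrapped.
        encoder = DDP(
            nn.Linear(8, 4).to(device),
            device_ids=[rank] if use_cuda else None,
        )
        # The memory bank and sampling weights are created on this
        # process's device, so nothing falls back to a default GPU.
        memory = torch.randn(16, 4, device=device)
        probs = torch.ones(16, device=device)

        feat = encoder(torch.randn(2, 8, device=device))        # (batch, dim)
        idx = torch.multinomial(probs, 2 * 3, replacement=True)  # on `device`
        negatives = torch.index_select(memory, 0, idx).view(2, 3, 4)
        # Similarity of each sample to its K drawn entries: (batch, K).
        return torch.bmm(negatives, feat.unsqueeze(2)).squeeze(2)
    finally:
        dist.destroy_process_group()


# Single-process run for illustration; a temporary file is the rendezvous point.
init_file = tempfile.NamedTemporaryFile(delete=False).name
scores = run_worker(rank=0, world_size=1, init_file=init_file)
```

For real multi-GPU training, each rank would be started separately, for example with `torch.multiprocessing.spawn` or `torchrun`, rather than calling `run_worker` directly as above.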