ZhikangNiu / encodec-pytorch

Multi-GPU training gives worse results than single-GPU training, and multi-GPU training seems to overfit quickly.

Maybe each GPU ends up with different codebook weights?
Could you try this code?

Line 157 in c6b6de9

    
           # distrib.broadcast_tensors(self.buffers()) # FIXME: this is not working for some reason
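If the codebook really is drifting apart across ranks, one thing to try is broadcasting the module's buffers from rank 0 with plain `torch.distributed`. This is only a minimal sketch, not the repo's `distrib.broadcast_tensors`: `ToyCodebook` and `broadcast_buffers` are hypothetical names made up for illustration, and it assumes the codebook is stored in registered buffers (as `self.buffers()` in the line above suggests) and that the process group is already initialized by the training script.

```python
import torch
import torch.distributed as dist
from torch import nn


class ToyCodebook(nn.Module):
    """Hypothetical stand-in for a VQ codebook kept in a buffer, not a parameter."""

    def __init__(self, codebook_size: int = 8, dim: int = 4):
        super().__init__()
        # Buffers receive no gradients, so gradient averaging never touches them.
        self.register_buffer("embed", torch.randn(codebook_size, dim))


def broadcast_buffers(module: nn.Module, src: int = 0) -> int:
    """Overwrite every buffer on every rank with the copy held by rank `src`.

    Returns the number of buffers broadcast (0 when not running distributed).
    """
    # No-op in single-process runs, so the same code path works on one GPU.
    if not (dist.is_available() and dist.is_initialized()):
        return 0
    if dist.get_world_size() == 1:
        return 0

    synced = 0
    with torch.no_grad():
        for buf in module.buffers():
            # In-place collective: all ranks must call this in the same order.
            dist.broadcast(buf, src=src)
            synced += 1
    return synced
```

Calling something like `broadcast_buffers(quantizer)` right after the codebook is initialized, and again after each codebook update, would keep the ranks identical. Whether that cures the overfitting is untested here. Note that with the NCCL backend the buffers have to live on the GPU before the call.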

training convergence