Why is the performance on a single GPU better than on multiple GPUs?
zeyu-liu opened this issue · comments
As the results listed in the table show, single-GPU training is almost always better than multi-GPU training (with the same re-rank/data settings). So, is this caused by BN? (A smaller per-GPU batch size may result in more variance in the batch statistics.)
I think BN is one reason. Another is PyTorch itself: its multi-GPU support is still not perfect, and some of its engineering optimizations are not good enough.
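The BN point can be illustrated with a small simulation (plain Python, no PyTorch; the batch sizes 64 and 16 are illustrative, not from this repo). When `nn.DataParallel` splits a batch of 64 across 4 GPUs, each replica's BN layer computes statistics over only 16 samples, so its mean/variance estimates are noisier:

```python
import random
import statistics

random.seed(0)

def batch_mean_variance(batch_size, n_trials=2000):
    # Sample many batches from N(0, 1) and return the variance of the
    # per-batch mean estimate -- a proxy for the noise in BN statistics.
    means = []
    for _ in range(n_trials):
        batch = [random.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.pvariance(means)

# Single GPU: BN sees the full batch of 64.
single_gpu = batch_mean_variance(64)
# DataParallel over 4 GPUs: each replica's BN sees only 16 samples.
per_replica = batch_mean_variance(16)

print(f"noise in BN mean, batch=64: {single_gpu:.4f}")
print(f"noise in BN mean, batch=16: {per_replica:.4f}")
# Theoretically the variance of the batch mean is 1/batch_size, so the
# per-replica estimate is about 4x noisier.
```

If this is indeed the cause, one standard mitigation is to synchronize BN statistics across replicas, e.g. with `torch.nn.SyncBatchNorm.convert_sync_batchnorm` under `DistributedDataParallel`, so BN again sees the effective global batch size.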