jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Long training time and overfitting when reproducing your example

warm-ice0x00 opened this issue

Hello, I am trying to reproduce your results. I configured the environment as instructed in the README and ran your pre-training program on the pre-training dataset you provided, which contains 3,000 samples. However, even after reducing the number of training steps to one hundredth of your example, the estimated total training time exceeds 50 hours on 8 NVIDIA A100s. At the same time, because the number of training steps is reduced, the training loss curve trends downward but fluctuates heavily, and the validation loss is much larger than the training loss, which suggests overfitting. How should I address these problems to obtain the expected results? Thanks.
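
Since per-step jitter can mask the underlying trend, one quick sanity check is to smooth the logged loss values before judging whether training is actually diverging. Below is a minimal sketch, not part of the DNABERT codebase: it assumes a hypothetical `losses.csv` file with `step,loss` rows (e.g., exported from the training logs) and applies a trailing moving average to see whether the loss is genuinely decreasing despite the noise.

```python
# Minimal sketch for separating jitter from the real loss trend.
# Assumption: "losses.csv" holds one "step,loss" pair per row; this file
# and its format are illustrative, not produced by DNABERT itself.
import csv

def moving_average(values, window=50):
    """Smooth a sequence with a simple trailing moving average."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed

with open("losses.csv") as f:
    rows = list(csv.reader(f))
steps = [int(r[0]) for r in rows]
losses = [float(r[1]) for r in rows]

smooth = moving_average(losses, window=50)
# A clear drop in the smoothed curve, despite per-step jitter,
# suggests the optimization itself is healthy.
print(f"smoothed loss: {smooth[0]:.4f} -> {smooth[-1]:.4f}")
```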

I am running into the same problem. Could we discuss this paper in more detail?