jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083

Long training time and overfitting when reproducing your example

warm-ice0x00 opened this issue

Hello, I am trying to reproduce your results. I configured the environment as instructed in the README and ran your pre-training program on the pre-training dataset you provided, which contains 3,000 samples. However, even after reducing the number of training steps to one hundredth of your example, the estimated total training time exceeds 50 hours on 8 NVIDIA A100s. At the same time, because the number of training steps is reduced, the training loss curve trends downward but fluctuates heavily, and the validation loss is much larger than the training loss, which suggests overfitting. How should I address these problems to obtain the expected results? Thanks.
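
Since per-step jitter can mask the underlying trend, one quick sanity check is to smooth the logged loss values before judging whether training is actually diverging. Below is a minimal sketch, not part of the DNABERT codebase: it assumes a hypothetical `losses.csv` file with `step,loss` rows (e.g., exported from the training logs) and applies a trailing moving average to see whether the loss is genuinely decreasing despite the noise.

```python
# Minimal sketch for separating jitter from the real loss trend.
# Assumption: "losses.csv" holds one "step,loss" pair per row; this file
# and its format are illustrative, not produced by DNABERT itself.
import csv

def moving_average(values, window=50):
    """Smooth a sequence with a simple trailing moving average."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed

with open("losses.csv") as f:
    rows = list(csv.reader(f))
steps = [int(r[0]) for r in rows]
losses = [float(r[1]) for r in rows]

smooth = moving_average(losses, window=50)
# A clear drop in the smoothed curve, despite per-step jitter,
# suggests the optimization itself is healthy.
print(f"smoothed loss: {smooth[0]:.4f} -> {smooth[-1]:.4f}")
```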

I am running into the same problem. Could we discuss this paper in more detail?