Does DAPT lead to forgetting of the original LM domain or overfitting to the target domain?
dr-GitHub-account opened this issue
DAPT (further pretraining) was run on each domain for 12.5K steps using unlabeled data from the target domain only. I am wondering whether leaving out unlabeled data from the original LM domain leads to detrimental forgetting of that domain, or to overfitting on the target domain.
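The alternative the question raises, mixing some original-domain text back into the DAPT data, could look roughly like the sketch below. This is a hypothetical illustration, not the setup used for DAPT: `mixed_batches` and `replay_fraction` are made-up names, and in practice the two pools would hold tokenized documents rather than strings. With `replay_fraction=0.0` it reduces to the target-only setting described above.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batches(
    target: Sequence[T],
    original: Sequence[T],
    num_steps: int,
    batch_size: int,
    replay_fraction: float = 0.0,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield `num_steps` batches for continued pretraining (hypothetical helper).

    Each example is drawn from the original-domain pool with probability
    `replay_fraction`, otherwise from the target-domain pool.
    `replay_fraction=0.0` means target-domain data only.
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    for _ in range(num_steps):
        batch = []
        for _ in range(batch_size):
            # rng.random() is in [0, 1), so 0.0 never picks `original`
            # and 1.0 always does.
            pool = original if rng.random() < replay_fraction else target
            batch.append(rng.choice(pool))
        yield batch


# Toy usage: 25% of examples replayed from the original domain.
target_docs = ["t1", "t2", "t3"]
original_docs = ["o1", "o2"]
for batch in mixed_batches(target_docs, original_docs, num_steps=2,
                           batch_size=4, replay_fraction=0.25):
    print(batch)
```

Whether any replay fraction actually reduces forgetting or overfitting here would have to be checked empirically, for example by tracking held-out masked-LM loss on both domains during training.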