SmartLi8 / stella

text embedding


Loss is 0 on some of the datasets

wang-ironman opened this issue · comments

I am encountering a similar issue. I suspected that Elastic Weight Consolidation (EWC) was the cause, but even after disabling EWC in the compute_loss function the problem persists. Specifically, it appears at the second training step, where the model's output consists entirely of NaN values.
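For anyone hitting the same thing, here is a minimal debugging sketch (my own, not from this repo) that checks the forward-pass output and the loss for non-finite values; the names check_finite, outputs, and loss are placeholders for whatever your training step produces.

import torch

def check_finite(step, outputs, loss):
    # `outputs` is assumed to be the tensor returned by the model's forward pass,
    # `loss` the scalar returned by compute_loss; both names are placeholders.
    if torch.isnan(outputs).any() or torch.isinf(outputs).any():
        print(f"step {step}: non-finite values in model outputs")
    if not torch.isfinite(loss):
        print(f"step {step}: loss is {loss.item()}")

Calling this right after the forward pass at each step makes it clear whether, as described above, the outputs are already NaN at the second step or whether the NaN first shows up in the loss.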

Did you manage to solve this? @binhna @wang-ironman

@sandan000 The issue arises when loading the pre-trained model in float16. It’s also important to be cautious when setting the ewc_ratio, as a high value can result in a large loss. I adjusted it to 0.01 for my dataset, which resolved the problem.

model = MODEL_NAME_INFO[model_name][0].from_pretrained(
        model_dir,
        trust_remote_code=True,
        # torch_dtype=torch.float16  ### I had to comment this out; loading in float16 produced the NaN outputs
    )
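On the ewc_ratio point, below is a hedged sketch of how a ratio like this typically scales the EWC penalty on top of the task loss; the names compute_loss_with_ewc, fisher, and old_params are assumptions for illustration, not this repo's actual compute_loss implementation.

import torch

def compute_loss_with_ewc(task_loss, model, fisher, old_params, ewc_ratio=0.01):
    # Standard EWC penalty: ewc_ratio * sum_i F_i * (theta_i - theta_i_star)^2.
    # `fisher` and `old_params` are assumed to be dicts keyed by parameter name.
    penalty = torch.zeros((), device=task_loss.device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]) ** 2).sum()
    # A large ewc_ratio lets the penalty dwarf the task loss (and can overflow in
    # float16); a small value like 0.01 keeps it on a comparable scale.
    return task_loss + ewc_ratio * penalty

With this form of the penalty, a ratio near 1.0 can easily dominate the task loss, which is consistent with dropping it to 0.01 helping here.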

I've tried this before, but it didn't work @binhna