zjunlp / EasyEdit

[Knowledge Editing] [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

Home Page: https://zjunlp.github.io/project/KnowEdit

Could there be a bug in the FT implementation?

drd13 opened this issue · comments

I've found what I think might be a bug in the implementation of the fine-tuning baseline. If this is indeed the case, this bug would yield incorrect results when the unlearning target is longer than one token.

Using the VSCode debugger, I found that the code in ft_main.py doesn't carry out backpropagation properly. The current version of the code passes the prompts without the targets to the model by calling model(**inputs), and then gathers the logits of every token in the target from the last token's logits. This maximises the probability of each target token immediately succeeding the prompt. The correct behaviour should maximise the probability of the first target token as a continuation of the prompt, then the probability of the second target token given the prompt plus the first target token, and so on.
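To make the difference concrete, here is a minimal sketch of the two objectives. This is not the EasyEdit or ROME code; GPT-2, the prompt, and all variable names are placeholders chosen purely for illustration.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model/tokenizer for illustration only.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt, target = "The capital of France is", " the city of Paris"
prompt_ids = tok(prompt, return_tensors="pt").input_ids   # shape [1, P]
target_ids = tok(target, return_tensors="pt").input_ids   # shape [1, T]

# (a) The behaviour described above: only the prompt is fed to the model, and the
#     log-probs of *all* target tokens are read off the distribution at the final
#     prompt position, i.e. every target token is scored as if it immediately
#     followed the prompt.
last_logits = model(prompt_ids).logits[:, -1, :]           # [1, vocab]
log_probs = torch.log_softmax(last_logits, dim=-1)
buggy_nll = -log_probs.gather(1, target_ids).mean()

# (b) The standard autoregressive objective: prompt and target are concatenated,
#     and each target token is scored conditioned on the prompt plus all earlier
#     target tokens.
full_ids = torch.cat([prompt_ids, target_ids], dim=1)      # [1, P+T]
logits = model(full_ids).logits
# Logits at position i predict the token at position i + 1.
shift_logits = logits[:, prompt_ids.size(1) - 1 : -1, :]   # [1, T, vocab]
correct_nll = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    target_ids.reshape(-1),
)

print(f"buggy objective: {buggy_nll.item():.4f}  autoregressive: {correct_nll.item():.4f}")
```

For a one-token target the two losses coincide, which is presumably why the issue went unnoticed; they diverge as soon as the target spans multiple tokens.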

I think this issue might also exist in the ROME repository, where the original code came from; I've opened an issue there but haven't received a response. Thanks for any assistance you may offer.

Thanks for your advice; we will modify the training paradigm of FT-L as soon as possible.

I have updated the optimization target of FT; you can refer to the latest version of the code.

Hello @pengzju. I'll mark the issue as closed (but I haven't double-checked the code). If you ever rerun the experiments in the survey paper with the fix, I would be interested in knowing how it changes the relative performance of fine-tuning.

Thank you very much for your rapid response and bug fix.

Thank you for your suggestion.
In my actual testing, even with the optimization objective changed, FT-L still cannot balance Reliability and Locality: high reliability comes at the cost of severely damaging the model's weights, while high locality cannot guarantee a high editing success rate. This is still consistent with the results in our paper.
😊

Thank you for your suggestion. We have provided two implementations (objective_optimization in FT-L):

    1. prompt_last: the method from ROME's original paper (https://arxiv.org/abs/2202.05262), which computes the NLL loss using only the last token of the input.
    2. target_new: the standard autoregressive method, using the cross-entropy loss function.

You can choose the appropriate optimization goal based on your experimental setting; a hedged usage sketch is included below. You are welcome to try it. 😊
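As a rough usage sketch, selecting between the two objectives might look like the following. This assumes EasyEdit's BaseEditor / FTHyperParams interface; the hparams path, model, prompts, and the way objective_optimization is set are illustrative assumptions rather than verified API details.

```python
from easyeditor import BaseEditor, FTHyperParams

# Assumed: an FT hparams YAML that exposes an `objective_optimization` field.
hparams = FTHyperParams.from_hparams("./hparams/FT/gpt2-xl.yaml")
hparams.objective_optimization = "target_new"  # or "prompt_last" for the ROME-style loss

editor = BaseEditor.from_hparams(hparams)
metrics, edited_model, _ = editor.edit(
    prompts=["The capital of France is"],   # placeholder edit request
    target_new=["the city of Paris"],
)
print(metrics)
```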

Thank you very much for raising the issue. We actually encountered this problem in early experiments last year, but to maintain consistency with the prior work ROME, we did not address it at the time. As @pengzju mentioned, our current approach splits FT into two strategies:

  • prompt_last: the method from ROME's original paper (https://arxiv.org/abs/2202.05262), which computes the NLL loss using only the last token of the input.

  • target_new: the standard autoregressive method, using the cross-entropy loss function. To differentiate it and make the results comparable, we refer to this variant as FT-M, which achieves much better performance than FT-L.

We plan to update the survey paper on arXiv with new experimental results soon, and we have already noted this issue in the README. Going forward, we hope everyone can use the FT-M technique as a strong knowledge editing baseline.

Best,

EasyEdit Team