huggingface / nanotron

Minimalistic large language model 3D-parallelism training


[Question] Modification for Performing Fine-Tuning

allanj opened this issue · comments

I'm trying to perform fine-tuning instead of language-model pretraining, with the following requirement:

  1. Given the input ids $x$, compute the loss only on the output sequence $y$. Unlike LM training, I don't want any loss on the input tokens.

I think I have to modify dataloader.py:

  1. The packing done by group_texts doesn't seem applicable here, since I'm fine-tuning rather than pretraining (though this may be a minor concern).
  2. How do I disable the loss on the input tokens? I think I need a different collator than DataCollatorForCLM, something like the sketch below.

I'm not sure whether these are all the modifications needed, or whether there is a better approach that doesn't require changing the source code.
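
Something along these lines is what I have in mind. It's only a rough sketch: it assumes the model computes loss only where label_mask is True (as with DataCollatorForCLM's output), it ignores the pipeline-parallel TensorPointer handling the real collator does, and output_ids is a made-up field my fine-tuning dataset would provide for the answer tokens.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np
import torch


@dataclass
class DataCollatorForSFT:
    """Rough sketch: pad prompt+answer pairs and mask the loss on the prompt."""

    sequence_length: int
    pad_token_id: int = 0  # assumption: 0 is a valid padding id

    def __call__(self, examples: List[Dict[str, List[int]]]) -> Dict[str, torch.Tensor]:
        batch_size = len(examples)
        # One extra position so inputs and labels can be shifted by one token.
        input_ids = np.full((batch_size, self.sequence_length + 1), self.pad_token_id, dtype=np.int64)
        attention = np.zeros((batch_size, self.sequence_length + 1), dtype=bool)  # real (non-pad) tokens
        is_target = np.zeros((batch_size, self.sequence_length + 1), dtype=bool)  # loss targets

        for i, ex in enumerate(examples):
            # "input_ids" is the prompt x, "output_ids" the answer y (hypothetical dataset fields).
            tokens = (ex["input_ids"] + ex["output_ids"])[: self.sequence_length + 1]
            prompt_len = min(len(ex["input_ids"]), len(tokens))
            input_ids[i, : len(tokens)] = tokens
            attention[i, : len(tokens)] = True
            # Only the answer tokens are loss targets; prompt and padding stay False.
            is_target[i, prompt_len : len(tokens)] = True

        return {
            "input_ids": torch.from_numpy(input_ids[:, :-1]),
            "input_mask": torch.from_numpy(attention[:, :-1]),
            "label_ids": torch.from_numpy(input_ids[:, 1:]),
            "label_mask": torch.from_numpy(is_target[:, 1:]),
        }
```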

I need this feature too, and I believe we also need to shard pretrained checkpoints into the nanotron format.

That seems like a different problem. Am I understanding correctly that you're talking about the model checkpoint format?

Should I just modify the input mask?
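
If I understand the masking correctly, it would be the label_mask (the mask on the targets) rather than the input/attention mask that needs changing, since the model still has to attend to the prompt tokens. A toy sketch of how such a mask would drop the loss on the input tokens, assuming the loss is a masked mean of per-token cross-entropy (masked_clm_loss is just an illustrative name):

```python
import torch
import torch.nn.functional as F


def masked_clm_loss(logits: torch.Tensor, label_ids: torch.Tensor, label_mask: torch.Tensor) -> torch.Tensor:
    """logits: [batch, seq, vocab]; label_ids, label_mask: [batch, seq]."""
    per_token = F.cross_entropy(
        logits.flatten(0, 1),  # [batch * seq, vocab]
        label_ids.flatten(),   # [batch * seq]
        reduction="none",
    ).view(label_ids.shape)
    # Positions where label_mask is False (prompt tokens, padding) contribute nothing.
    return (per_token * label_mask).sum() / label_mask.sum().clamp(min=1)
```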