[Question] Modification for Performing Fine-Tuning
allanj opened this issue
Allan Jie commented
If I'm trying to perform fine-tuning instead of language model training, I have the following requirement:
- Given the input ids $x$, train on the loss of the output sequence $y$. Different from LM training, I don't enforce loss on the input tokens here.

I think I have to modify `dataloader.py`:
- The packing function `group_texts` seems not applicable since I'm fine-tuning here (but maybe this is a minor concern).
- How do I disable the loss on the input tokens? I think I need a different collator rather than `DataCollatorForCLM` (see the sketch after this list).

I'm not sure whether these are all the required modifications, or whether there is a better way that doesn't require revising the source code.
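For reference, here is a minimal sketch of the collator change I have in mind, assuming nanotron-style batches carrying `label_ids`/`label_mask` fields (as `DataCollatorForCLM` does); the helper name `collate_sft_example` and its truncation handling are hypothetical, not existing API:

```python
import numpy as np

def collate_sft_example(prompt_ids, output_ids, sequence_length):
    """Hypothetical per-example SFT collation: loss only on the output.

    `prompt_ids` are the input tokens x, `output_ids` the target tokens y.
    Labels are the tokens shifted left by one; `label_mask` is True only
    where the label is an output token, so the prompt contributes no loss.
    """
    tokens = np.concatenate([prompt_ids, output_ids])[: sequence_length + 1]
    inputs = tokens[:-1]
    labels = tokens[1:]
    # Position t predicts token t+1, so the first output token (index
    # len(prompt_ids) in `tokens`) is the label at position len(prompt_ids) - 1.
    first_output_pos = len(prompt_ids) - 1
    label_mask = np.arange(len(labels)) >= first_output_pos
    return {
        "input_ids": inputs,
        "input_mask": np.ones(len(inputs), dtype=bool),  # no padding here
        "label_ids": labels,
        "label_mask": label_mask,
    }
```

Packing via `group_texts` would then be replaced by per-example truncation/padding to `sequence_length`, since packed sequences would blur the prompt/output boundary that the mask depends on.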
Seungjae Jung commented
I need this feature too and I believe we need to shard pretrained checkpoints into nanotron format.
Allan Jie commented
> I need this feature too and I believe we need to shard pretrained checkpoints into nanotron format.

That seems like a different problem. Am I understanding correctly that you are talking about the model checkpoint format?
Allan Jie commented
Should I just modify the input mask?
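If the loss is computed roughly like the following (a sketch, not the exact nanotron code; tensor names are illustrative), then it is the label mask rather than the input mask that decides which tokens are trained on:

```python
import torch
import torch.nn.functional as F

def masked_clm_loss(logits, label_ids, label_mask):
    """Sketch of a masked LM loss; not the exact nanotron implementation.

    logits: [batch, seq, vocab]; label_ids: [batch, seq] (long);
    label_mask: [batch, seq] (bool), True where the loss should apply.
    The input mask only hides padding from attention; zeroing `label_mask`
    over the prompt positions is what removes the loss on input tokens.
    """
    per_token = F.cross_entropy(
        logits.flatten(0, 1), label_ids.flatten(), reduction="none"
    ).view(label_ids.shape)
    mask = label_mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```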