itsnamgyu / reasoning-teacher

Official code for "Large Language Models Are Reasoning Teachers", ACL 2023

Home Page: https://arxiv.org/abs/2212.10071


Is there a way to adapt to other models?

zhan0903 opened this issue

The custom_train.py script only supports "flan", "t5", and "gpt2". I wonder if there is a way to fine-tune other models, and how to define the loss function. Thanks.

Absolutely! You can change the model definitions near line 50 in custom_train.py. You can adapt the settings from T5 for encoder-decoder models and from GPT2 for decoder-only models. Note that each Hugging Face model class works slightly differently, so you may also need to dig into the data preprocessing code, including tokenization; this is unavoidable when working with Hugging Face.
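For reference, here is a minimal sketch of what an adapted model definition might look like using the generic `Auto` classes from `transformers`. The checkpoints (`facebook/opt-350m`, `facebook/bart-base`) and the `MODEL_KEY` variable are illustrative placeholders, not models supported by this repo:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Hypothetical model key; adapt to whatever checkpoint you want to try.
MODEL_KEY = "opt"

if MODEL_KEY == "opt":
    # Decoder-only model: follow the existing GPT-2 code path.
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
elif MODEL_KEY == "bart":
    # Encoder-decoder model: follow the existing T5 code path.
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
```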

If you simply want to apply the same next-token prediction loss to different models, you don't need to worry about the loss function.
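To illustrate (this is generic Hugging Face behavior, not code from this repo): when you pass `labels` to a causal LM, the model computes the standard next-token cross-entropy loss internally, so no custom loss function is needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Q: What is 2 + 2? A: 4", return_tensors="pt")
# Passing labels makes the model return the standard next-token
# cross-entropy loss (the labels are shifted internally).
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss  # scalar tensor, ready for loss.backward()
```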