itsnamgyu / reasoning-teacher

Official code for "Large Language Models Are Reasoning Teachers", ACL 2023

Home Page: https://arxiv.org/abs/2212.10071


Is there a way to adapt to other models?

zhan0903 opened this issue

The custom_train.py script only supports "flan", "t5", and "gpt2". I wonder if there is a way to fine-tune other models, and how to define the loss function. Thanks.

Absolutely! You can change the model definitions near line 50 in custom_train.py. You can adapt the settings from T5 for encoder-decoder models and from GPT2 for decoder-only models. Note that each Hugging Face model class works slightly differently, so you may also need to dig into the data preprocessing code, including tokenization; this is unavoidable when working with Hugging Face.
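For reference, here is a minimal sketch of what an adapted model definition might look like using the generic `Auto` classes from `transformers`. The checkpoints (`facebook/opt-350m`, `facebook/bart-base`) and the `MODEL_KEY` variable are illustrative placeholders, not models supported by this repo:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Hypothetical model key; adapt to whatever checkpoint you want to try.
MODEL_KEY = "opt"

if MODEL_KEY == "opt":
    # Decoder-only model: follow the existing GPT-2 code path.
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
elif MODEL_KEY == "bart":
    # Encoder-decoder model: follow the existing T5 code path.
    tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
```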

If you simply want to apply the same next-token prediction loss to different models, you don't need to worry about the loss function.
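To illustrate (this is generic Hugging Face behavior, not code from this repo): when you pass `labels` to a causal LM, the model computes the standard next-token cross-entropy loss internally, so no custom loss function is needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Q: What is 2 + 2? A: 4", return_tensors="pt")
# Passing labels makes the model return the standard next-token
# cross-entropy loss (the labels are shifted internally).
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss  # scalar tensor, ready for loss.backward()
```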