kohjingyu / fromage

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".

Home Page: https://jykoh.com/fromage

Replacing OPT LLM with other LLMs

oferidan1 opened this issue

Hi,
Thanks a lot for your great work.
I am evaluating replacing the OPT LLM with other LLMs such as Mistral-7B-v0.1 or Phi-3-mini-4k-instruct.
I had to make minor code modifications to support these models, mainly adding a [PAD] token to their tokenizers.
However, training is not stable (many NaN values in the training loss) and the accuracy results are much worse than with the original OPT 6.7B model.
Do you have any suggestions as to why this happens and, if so, how it can be fixed?
Thanks in advance,
Ofer
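
For reference, here is a minimal sketch of what adding a [PAD] token involves. It is a toy illustration under stated assumptions, not the exact modification from this issue. The vocabulary, the embedding values, and the helper `add_pad_token` are all hypothetical. Initialising the new row to the mean of the existing rows is one common choice, assumed here, and is not claimed to resolve the NaN losses. With Hugging Face Transformers, the corresponding steps would typically be `tokenizer.add_special_tokens({"pad_token": "[PAD]"})` followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
from statistics import fmean


def add_pad_token(vocab, embeddings, pad_token="[PAD]"):
    """Add pad_token to vocab and append a matching embedding row.

    vocab: dict mapping token string -> row index in embeddings.
    embeddings: list of equal-length lists of floats, one row per token.
    Returns the id of the pad token.
    """
    if pad_token in vocab:
        # Token already exists, so the embedding table needs no resize.
        return vocab[pad_token]
    pad_id = len(embeddings)
    # Column-wise mean of the existing rows (an assumed initialisation),
    # so the new row has the same scale as the already-trained rows.
    mean_row = [fmean(column) for column in zip(*embeddings)]
    embeddings.append(mean_row)
    vocab[pad_token] = pad_id
    return pad_id


# Toy stand-ins for a pretrained tokenizer vocabulary and embedding table.
vocab = {"<s>": 0, "</s>": 1, "cheese": 2}
embeddings = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
pad_id = add_pad_token(vocab, embeddings)
```

The point of the sketch is that the vocabulary and the embedding table must grow together: if the tokenizer emits an id with no embedding row behind it, lookups for that id are invalid.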