mosaicml / llm-foundry

LLM training code for Databricks foundation models

Home Page: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

Converting a composer seq2seq t5 model throws an exception

timsteuer opened this issue · comments

Environment

  • llm-foundry: latest

To reproduce

Steps to reproduce the behavior:

  1. Train a hf_t5 model.
  2. Download the Composer checkpoint.
  3. Try to convert it back to Hugging Face via scripts/inference/convert_composer_to_hf.py.
  4. The script crashes when trying to load the saved model as AutoModelForCausalLM (see the sketch below).
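
For context, a minimal sketch of the failing load (the checkpoint path is just a placeholder for the folder the conversion script writes):

```python
from transformers import AutoModelForCausalLM

# Placeholder path to the folder produced by convert_composer_to_hf.py
model = AutoModelForCausalLM.from_pretrained("converted-t5-checkpoint")
# Fails with something like:
# ValueError: Unrecognized configuration class <class '...T5Config'>
# for this kind of AutoModel: AutoModelForCausalLM.
```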

Expected behavior

The model is saved as a Hugging Face snapshot without any issues.

Additional context

Locally, I fixed this by simply loading with AutoModel instead of AutoModelForCausalLM.
I guess this is fine.
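
A minimal sketch of what that workaround boils down to (not the script's actual code; the path is a placeholder):

```python
from transformers import AutoModel

# Instead of AutoModelForCausalLM, which has no mapping for T5Config,
# load the converted folder with the generic AutoModel class.
model = AutoModel.from_pretrained("converted-t5-checkpoint")  # placeholder path
```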

Ah yes, that script only supports causal LMs right now. One note on your solution: I'm not certain, but AutoModel here may give you a T5Model rather than the T5ForConditionalGeneration you probably want. Probably worth double-checking that.
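
For example, a quick way to double-check (the path is just a placeholder):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("converted-t5-checkpoint")  # placeholder path
print(type(model).__name__)  # "T5Model" would mean no lm_head was instantiated
```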

That was an interesting hint.

Just double-checked, and the model was indeed a T5Model and not a T5ForConditionalGeneration.

So I changed that in the conversion script so that it writes the right config. However, loading the final model via AutoModel still results in a T5Model, even though the config now explicitly states the correct model type.
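
To illustrate the behavior (the path and the architectures value are placeholders reflecting such a local edit, not the script's output):

```python
import json
from transformers import AutoModel

# The converted config may now list the seq2seq architecture ...
with open("converted-t5-checkpoint/config.json") as f:
    print(json.load(f).get("architectures"))  # e.g. ["T5ForConditionalGeneration"]

# ... but AutoModel resolves the class from the config *type* (T5Config -> T5Model),
# not from the "architectures" field, so it still builds the bare backbone.
print(type(AutoModel.from_pretrained("converted-t5-checkpoint")).__name__)  # "T5Model"
```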

On the other hand, if I load via AutoModelForSeq2SeqLM, the lm_head is loaded as well. So I guess that is an HF-specific thing and not related to the conversion script per se.

Yeah, AutoModel generally gives you the backbone model, while AutoModelForXYZ gives you the model with the adaptation/head for XYZ.
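
A small illustration of that split with a public T5 checkpoint (using t5-small just as an example, not the converted model):

```python
from transformers import AutoModel, AutoModelForSeq2SeqLM

backbone = AutoModel.from_pretrained("t5-small")             # encoder/decoder backbone only
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # adds the LM head for generation

print(type(backbone).__name__, hasattr(backbone, "lm_head"))  # T5Model False
print(type(seq2seq).__name__, hasattr(seq2seq, "lm_head"))    # T5ForConditionalGeneration True
```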