spcl / graph-of-thoughts

Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"

Home Page: https://arxiv.org/pdf/2308.09687.pdf


How to use the GoT framework with a local LLM?

dszpr opened this issue · comments

Hi! Many thanks for the excellent work!
I would like to use the GoT framework for a visual question answering (VQA) task. I have already prepared a FLAN-T5 model. Is it possible to use GoT with my model for inference?
And does the GoT framework support multimodal LMs, such as BLIP-2?

Hi @dszpr,

Thanks for your kind words and for taking an interest in our project!

In theory, you can use any model with GoT (however, the interface is currently restricted to inputs represented as strings).
We show how to use models hosted on Hugging Face (HF), using Llama 2 as an example: https://github.com/spcl/graph-of-thoughts/blob/f508aef4fdfa94adc505f5a7745e6088bdf7889c/graph_of_thoughts/language_models/llamachat_hf.py
If you want to use FLAN-T5, you should only need to change the hf_model_id in the configuration, since the implementation uses AutoTokenizer and AutoModel. Note, however, that we currently always quantise the model, which you would need to change if quantisation is unsupported or unwanted.
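As a rough sketch of the overall shape (the class and function names below are illustrative, not the repo's actual classes), a local model can be hidden behind the same string-query-in, response-strings-out pattern. Here the generation function is injected as a callable, so in real use it would wrap a FLAN-T5 pipeline such as `transformers.pipeline("text2text-generation", model="google/flan-t5-base")`:

```python
from typing import Callable, List


class LocalLM:
    """Illustrative wrapper: string query in, list of response strings out.

    `generate_fn` is a stand-in for a real local model call, e.g. a
    Hugging Face text2text-generation pipeline around FLAN-T5.
    """

    def __init__(self, generate_fn: Callable[[str], str]) -> None:
        self.generate_fn = generate_fn

    def query(self, query: str, num_responses: int = 1) -> List[str]:
        # Sample num_responses completions for the given prompt.
        return [self.generate_fn(query) for _ in range(num_responses)]


# Usage with a dummy generator (replace the lambda with a real FLAN-T5 call):
lm = LocalLM(lambda prompt: f"echo: {prompt}")
print(lm.query("2 + 2 = ?", num_responses=2))
# → ['echo: 2 + 2 = ?', 'echo: 2 + 2 = ?']
```

Keeping the model call behind a plain callable like this makes it easy to swap FLAN-T5 in or out without touching the rest of the wrapper.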
Using multimodal LMs might be a little trickier than changing a single parameter, but it is certainly possible.
The framework currently uses the following abstraction for language models: https://github.com/spcl/graph-of-thoughts/blob/main/graph_of_thoughts/language_models/abstract_language_model.py, which works with queries formatted as strings. You would either have to introduce a separate interface that allows for other modalities and build on BLIP-2 on top of it, or take a slightly hacky route: ignore the typing, implement the existing abstraction, and have your model first unpack an object that contains the input data.
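The "hacky" route above can be sketched as follows (a minimal, hypothetical example, not part of graph-of-thoughts): serialise the multimodal input into a single string so it fits the string-only query interface, and unpack it again inside your model wrapper before calling BLIP-2:

```python
import json
from typing import Tuple


def pack_query(image_path: str, question: str) -> str:
    # Serialise the multimodal input as a JSON string so it can pass
    # through the framework's string-only query interface unchanged.
    return json.dumps({"image_path": image_path, "question": question})


def unpack_query(query: str) -> Tuple[str, str]:
    # Inside the model wrapper, recover the original fields before
    # handing the image and question to the multimodal model.
    data = json.loads(query)
    return data["image_path"], data["question"]


# Round trip: what the framework sees is just a string.
packed = pack_query("cat.png", "What animal is shown?")
print(unpack_query(packed))
# → ('cat.png', 'What animal is shown?')
```

This keeps the framework untouched at the cost of typing honesty; a dedicated multimodal interface is the cleaner long-term option.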

Hope that helps!