uta-smile / TCL

code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Inference + Hugging Face

mattmdjaga opened this issue

Is there an easy way to run inference with the model on some new examples? Also, are there any plans to put the model on Hugging Face?

commented

Hi, thanks for your interest in our work.
What kind of inference do you mean? Image-text retrieval tasks?
As for putting the model on Hugging Face, I need to check with my team and will let you know later. Thanks.

Inference for the generation task, visual question answering (VQA).

commented

I see. What are the difficulties in applying our current inference code to general VQA? Can VQA.py not be used for your task?

I tried to extract inference code from the 'VQA.py' file, but I found that you need to supply the model, the question, and the answers, whereas I thought the model would generate the answers itself. So is there no way to run inference on VQA without pre-defined answers?

commented

The model does generate the answers using an answer decoder, doesn't it? The only difference is that we constrain the answer decoder to generate only from the 3,192 candidate answers.
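
To illustrate what constraining a decoder to a fixed candidate list can look like, here is a minimal sketch in plain Python. It is not taken from VQA.py and may differ from what the repository actually does. It assumes one common approach: score every candidate answer by the sum of the decoder's per-token log-probabilities and return the best ones. The names `sequence_log_prob`, `rank_candidates`, and `toy_decoder` are hypothetical, and the probability table is made-up toy data standing in for a real answer decoder.

```python
import math


def sequence_log_prob(token_log_prob, answer_tokens):
    """Sum the per-step log-probabilities of one candidate answer.

    token_log_prob is a hypothetical callable (prefix, next_token) -> log-prob
    that stands in for a real answer decoder conditioned on image + question.
    """
    total = 0.0
    prefix = ()
    for tok in answer_tokens:
        total += token_log_prob(prefix, tok)
        prefix += (tok,)
    return total


def rank_candidates(token_log_prob, candidates, top_k=1):
    """Return the top_k candidates with the highest total log-probability."""
    scored = [(sequence_log_prob(token_log_prob, c.split()), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def toy_decoder(prefix, token):
    """Toy stand-in for a decoder: a lookup table of next-token probabilities."""
    table = {
        (): {"yes": 0.6, "no": 0.3, "two": 0.1},
        ("two",): {"dogs": 0.9, "cats": 0.1},
    }
    probs = table.get(prefix, {})
    # Unknown continuations get a tiny probability instead of log(0).
    return math.log(probs.get(token, 1e-9))


# The model never emits free-form text here: it only ranks the given list.
candidates = ["yes", "no", "two dogs", "two cats"]
best = rank_candidates(toy_decoder, candidates, top_k=2)
print(best)
```

With this kind of setup, the answer list is a required input because the decoder is only used to score candidates. To answer questions outside the list, you would either extend the candidate list or replace the ranking step with unconstrained decoding.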