Error while running the demo: the dtype of the LLM does not match the dtype of the vision model
gaowei724 opened this issue
gaowei commented
Thank you for your outstanding work and open-source code. I encountered an error while trying to run the demo following the instructions for "run application". The error message is as follows:
...
final_embedding[image_to_overwrite] = image_features.contiguous().reshape(-1, embed_dim).to(target_device)
RuntimeError: Index put requires the source and destination dtypes match, got Half for the destination and BFloat16 for the source.
...
After checking, I found that the error is caused by a dtype mismatch between the LLM and the vision model: the destination tensor (final_embedding) is Half (float16), while the source tensor (image_features) is BFloat16. I used the llava-hf/llava-v1.6-vicuna-7b-hf model you shared on Hugging Face and ran the following command:
bash scripts/demo.sh MODELS/llava-v1.6-vicuna-7b-hf/ MODELS/llava-v1.6-vicuna-7b-hf/
Could you advise how to adjust the settings so that the LLM and the vision model use the same dtype? Thank you.
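
For reference, the mismatch can be reproduced in isolation with plain PyTorch. The sketch below is not the repository's code: the tensor shapes and the helper name merge_image_features are made up for illustration, and only the variable names final_embedding, image_features and image_to_overwrite come from the traceback. It assumes that casting the image features to the dtype of the destination tensor is an acceptable workaround.

```python
import torch

# Toy stand-ins for the tensors named in the traceback (shapes are arbitrary).
embed_dim = 8
image_to_overwrite = torch.tensor([True, False, True, False])


def make_inputs():
    # Destination in float16 (Half), source in bfloat16, as in the error message.
    final_embedding = torch.zeros(4, embed_dim, dtype=torch.float16)
    image_features = torch.ones(2, embed_dim, dtype=torch.bfloat16)
    return final_embedding, image_features


# 1) Reproduce the failure: masked assignment with mismatched dtypes.
final_embedding, image_features = make_inputs()
mismatch_raised = False
try:
    final_embedding[image_to_overwrite] = image_features
except RuntimeError as err:
    mismatch_raised = True
    print("reproduced:", err)


# 2) Hypothetical workaround: cast the source to the destination's dtype
#    (and device) before the masked assignment.
def merge_image_features(final_embedding, image_features, image_to_overwrite):
    source = image_features.contiguous().reshape(-1, final_embedding.shape[-1])
    source = source.to(device=final_embedding.device, dtype=final_embedding.dtype)
    final_embedding[image_to_overwrite] = source  # in-place write
    return final_embedding


final_embedding, image_features = make_inputs()
merged = merge_image_features(final_embedding, image_features, image_to_overwrite)
print(merged.dtype)
```

Casting at the assignment only hides the symptom. If the two sub-models are meant to run in the same precision, loading both with one dtype (for example through the torch_dtype argument of from_pretrained in transformers) may be the cleaner fix, but whether scripts/demo.sh exposes such a setting is not something this thread confirms.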