- Install the requirements:
```bash
pip install -e .
pip install flash-attn --no-build-isolation # optional, for faster inference
```
- For connecting to WildVision Arena, you need to install bore:
```bash
bash install_bore.sh
```
- Some models may require additional dependencies; see the top of `setup.py` for details. To install the extra dependencies for a specific model, run:
```bash
pip install -e .[cogvlm2-video] # for cogvlm2-video
```
(Note: the extra dependencies for different models might conflict with each other, so it is best to create a separate virtual environment for each model.)
```bash
python -m lmm_engines.huggingface.model.dummy_image_model
python -m lmm_engines.huggingface.model.dummy_video_model
# python -m lmm_engines.huggingface.model.model_tinyllava # example
```
First run `bash install_bore.sh` once to install bore. Then start the worker:
```bash
bash start_worker_on_arena.sh ${model_name} ${model_port} ${num_gpu}
# Example
bash start_worker_on_arena.sh dummy_image_model 41411 1
```
Then your worker will be registered to the arena. You can check it by visiting 🤗 WildVision/vision-arena. See the "Contribute a model" section for how to contribute your own model.
First start a worker:
```bash
CUDA_VISIBLE_DEVICES=0 python -m lmm_engines.huggingface.model_worker --model-path dummy_image_model --port 31004 --worker http://127.0.0.1:31004 --host=127.0.0.1 --no-register
```
Then call the worker:
```python
from lmm_engines import get_call_worker_func

call_worker_func = get_call_worker_func(
    worker_addrs=["http://127.0.0.1:31004"],
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
call_worker_func(test_messages, **generation_kwargs)
```
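The `test_messages` structure above follows an OpenAI-style content list: each turn has a `role` plus a list of typed parts. A small helper for building such a single-image user turn can keep scripts tidy — the helper itself is illustrative and not part of the lmm_engines API:

```python
from typing import Any, Dict

def make_image_message(text: str, image_url: str) -> Dict[str, Any]:
    """Build one user turn with a text part and an image_url part,
    matching the message structure used in the example above.
    (Illustrative helper, not part of the lmm_engines API.)"""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": image_url},
        ],
    }

test_messages = [make_image_message(
    "What is unusual about this image?",
    "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg",
)]
print(test_messages[0]["content"][0]["text"])
```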
Or you can start a new worker automatically, fusing the above two steps into one. The model worker closes automatically after the Python script ends:
```python
from lmm_engines import get_call_worker_func

# start a new worker
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model",
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
# call the worker
print(call_worker_func(test_messages, **generation_kwargs))
```
- Output cache: set `use_cache=True` to enable the output cache. The cache is stored in `~/lmm_engines/generation_cache/{model_name}.jsonl` by default.
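For inspection or debugging, the cache file can be read as ordinary JSON Lines. A minimal sketch, assuming each line is a single JSON object (the exact entry schema depends on the engine and is not specified here):

```python
import json
from pathlib import Path

def load_generation_cache(cache_path):
    """Read a JSONL generation cache into a list of dicts.

    Assumes each non-empty line of the file is one JSON object;
    the per-entry schema is engine-specific and not guaranteed here.
    """
    entries = []
    with open(cache_path) as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries

# Example with a synthetic cache file (hypothetical entry schema):
demo = Path("demo_cache.jsonl")
demo.write_text('{"input_hash": "abc", "output": "a cached answer"}\n')
print(load_generation_cache(demo))
```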
- If you are contributing a new image model, copy `lmm_engines/huggingface/model/dummy_image_model.py` and modify it.
- If you are contributing a new video model, copy `lmm_engines/huggingface/model/dummy_video_model.py` and modify it.
- Four functions to implement:
```python
load_model(self, model_path: str, device: str, from_pretrained_kwargs: Dict[str, Any]) -> None
generate(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]
generate_image(self, image: Image.Image, **kwargs) -> Image.Image
generate_video(self, video: List[Image.Image], **kwargs) -> List[Image.Image]
```
- Test the model adapter: see `lmm_engines/huggingface/README.md`.
- Add the registration at the bottom of `lmm_engines/huggingface/model/model_adapter.py`.
- Connect to WildVision Arena and become an arena competitor:
```bash
bash start_worker_on_arena.sh ${model_name} ${model_port}
```
(Note: the internal details of these four functions do not matter, as long as each receives the parameters and returns the expected results specified in its signature.)
See `lmm_engines/huggingface/README.md` for more details.
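A minimal adapter skeleton following the four signatures above can look like the sketch below. The class name and echo behavior are illustrative only — this is not the real adapter base class from `model_adapter.py`, and real adapters would load weights and run inference:

```python
from __future__ import annotations  # keep the PIL type hints lazy; PIL itself is not imported here
from typing import Any, Dict, List

class DummyEchoAdapter:
    """Illustrative adapter skeleton (hypothetical class, not the real
    base class). The four methods mirror the signatures listed above;
    internals are free-form as long as inputs/outputs match."""

    def load_model(self, model_path: str, device: str,
                   from_pretrained_kwargs: Dict[str, Any]) -> None:
        # A real adapter would load model weights here; this sketch
        # just records the arguments.
        self.model_path = model_path
        self.device = device

    def generate(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
        # Echo the last text part back as an assistant turn; a real
        # adapter would run model inference instead.
        last_text = ""
        for part in messages[-1]["content"]:
            if part.get("type") == "text":
                last_text = part["text"]
        return [{"role": "assistant", "content": f"echo: {last_text}"}]

    def generate_image(self, image: Image.Image, **kwargs) -> Image.Image:
        return image  # identity stub

    def generate_video(self, video: List[Image.Image], **kwargs) -> List[Image.Image]:
        return video  # identity stub

adapter = DummyEchoAdapter()
adapter.load_model("dummy_image_model", "cpu", {})
print(adapter.generate([{"role": "user",
                         "content": [{"type": "text", "text": "hi"}]}]))
```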
- add support for model_tinyllava.py (Example implementation by dongfu)
- add support for model_llavanextvideo.py (Example implementation by dongfu)
- add support for model_llavanextvideoqwen.py (contributed by dongfu, on 2024-07-28)
- add support for model_nvidia_api.py (contributed by jing gu, on 2024-07-28)
- add support for model_internvl2.py (contributed by chenhui, on 2024-08-05)
- add support for model_bunny.py
- add support for model_deepseekvl.py
- add support for model_idefics.py
- add support for model_instructblip.py
- add support for model_lita.py
- add support for model_llavanext.py
- add support for model_llava.py
- add support for model_qwenvl.py
- add support for model_uform.py
- add support for model_videollama2.py
- add support for model_videollava.py (contributed by dongfu, on 2024-07-28)
- add support for model_yivlplus.py
- add support for model_yivl.py
- add support for model_reka.py
- add support for model_llava_v1_5.py
- add support for model_llava_v1_6.py
- add support for model_minicpm.py
- add support for model_minicpmapi.py
- add support for model_llavaapi.py
- add support for model_cogvlm.py
- add support for model_qwenvlapi.py
- add support for model_openai.py
- add support for model_claude.py
- add support for model_gemini.py