LMM-Engines

Installation

  • Install the requirements
pip install -e .
pip install flash-attn --no-build-isolation # optional, for faster inference
  • For connecting to WildVision Arena, you need to install bore:
bash install_bore.sh
  • Some models require additional dependencies; see the top of setup.py for details. To install the extra dependencies for a specific model, run:
pip install -e .[cogvlm2-video] # for cogvlm2-video

(Note: the extra dependencies for different models may conflict with each other, so it is best to create a separate virtual environment for each model.)

Usage

Local testing

python -m lmm_engines.huggingface.model.dummy_image_model
python -m lmm_engines.huggingface.model.dummy_video_model
# python -m lmm_engines.huggingface.model.model_tinyllava # example

Connect to WildVision Arena and become an arena competitor

First, run bash install_bore.sh once to install bore.

bash start_worker_on_arena.sh ${model_name} ${model_port} ${num_gpu}
# Example
bash start_worker_on_arena.sh dummy_image_model 41411 1

Your worker will then be registered to the arena. You can verify this by visiting 🤗 WildVision/vision-arena

See the Contribute a model section below for how to contribute your own model.

Start a new worker for local inference

CUDA_VISIBLE_DEVICES=0 python -m lmm_engines.huggingface.model_worker --model-path dummy_image_model --port 31004 --worker http://127.0.0.1:31004 --host=127.0.0.1 --no-register

Then call the worker

from lmm_engines import get_call_worker_func
call_worker_func = get_call_worker_func(
    worker_addrs=["http://127.0.0.1:31004"],
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
call_worker_func(test_messages, **generation_kwargs)

Alternatively, you can start a new worker automatically, combining the two steps above into one. The model worker shuts down automatically when the Python script exits.

from lmm_engines import get_call_worker_func
# start a new worker
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model", # 
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
# call the worker
print(call_worker_func(test_messages, **generation_kwargs))
  • Output cache: set use_cache=True to enable the output cache. The cache is stored in ~/lmm_engines/generation_cache/{model_name}.jsonl by default, for example:
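A minimal sketch, mirroring the automatic-worker example above but with caching turned on:

from lmm_engines import get_call_worker_func
# same setup as the automatic-worker example above, but with caching enabled
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model",
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=True  # repeated identical requests are served from the cache
)
# cached outputs are stored in ~/lmm_engines/generation_cache/dummy_image_model.jsonl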

Contribute a model

(Note: we do not care about the internal details of these 4 functions, as long as they receive the parameters and return the expected results specified in the function signatures.)

For more details, see lmm_engines/huggingface/README.md.
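For orientation only, the sketch below shows the rough shape of a contributed model file. The function names here are hypothetical placeholders; the actual required names and signatures are specified in lmm_engines/huggingface/README.md.

# Hypothetical sketch only: the function names below are placeholders, not the
# actual required interface (see lmm_engines/huggingface/README.md for that).
class MyModelAdapter:
    def load_model(self, model_path, device):
        # load weights and processors onto the given device
        ...

    def generate(self, params):
        # receive the request params (messages + generation kwargs)
        # and return the generated text
        ...

    def generate_stream(self, params):
        # yield partial outputs for streaming generation
        ...

    def get_info(self):
        # return model metadata (e.g., modality, context length)
        ...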

TODO

Transferring models from the old arena code into LMM-Engines.

About

License: Apache License 2.0

