LMM-Engines

Installation

  • Install the requirements
pip install -e .
pip install flash-attn --no-build-isolation # optional, for faster inference
  • For connecting to WildVision Arena, you need to install bore:
bash install_bore.sh
  • Some models require additional dependencies; see the top of setup.py for details. To install the extra dependencies for a specific model, run:
pip install -e .[cogvlm2-video] # for cogvlm2-video

(Note: the extra dependencies for different models may conflict with each other, so it is best to create a separate virtual environment for each model.)

Usage

Local testing

python -m lmm_engines.huggingface.model.dummy_image_model
python -m lmm_engines.huggingface.model.dummy_video_model
# python -m lmm_engines.huggingface.model.model_tinyllava # example

Connect to WildVision Arena and become an arena competitor

First, run bash install_bore.sh once to install bore.

bash start_worker_on_arena.sh ${model_name} ${model_port} ${num_gpu}
# Example
bash start_worker_on_arena.sh dummy_image_model 41411 1

Your worker will then be registered to the arena. You can verify this by visiting 🤗 WildVision/vision-arena

See the Contribute a model section below for how to contribute your own model.

Start a new worker for local inference

CUDA_VISIBLE_DEVICES=0 python -m lmm_engines.huggingface.model_worker --model-path dummy_image_model --port 31004 --worker http://127.0.0.1:31004 --host=127.0.0.1 --no-register

Then call the worker

from lmm_engines import get_call_worker_func
call_worker_func = get_call_worker_func(
    worker_addrs=["http://127.0.0.1:31004"],
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
call_worker_func(test_messages, **generation_kwargs)

Alternatively, you can start a new worker automatically, combining the two steps above into one. The model worker shuts down automatically when the Python script exits.

from lmm_engines import get_call_worker_func
# start a new worker
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model", # 
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=False
)
test_messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is unusual about this image?",
            },
            {
                "type": "image_url",
                "image_url": "https://llava.hliu.cc/file=/nobackup/haotian/tmp/gradio/ca10383cc943e99941ecffdc4d34c51afb2da472/extreme_ironing.jpg"
            }
        ]
    }
]
generation_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_new_tokens": 200,
}
# call the worker
print(call_worker_func(test_messages, **generation_kwargs))
  • Output cache: set use_cache=True to enable the output cache. The cache is stored in ~/lmm_engines/generation_cache/{model_name}.jsonl by default, for example:
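A minimal sketch, mirroring the automatic-worker example above but with caching turned on:

from lmm_engines import get_call_worker_func
# same setup as the automatic-worker example above, but with caching enabled
call_worker_func = get_call_worker_func(
    model_name="dummy_image_model",
    engine="huggingface",
    num_workers=1,
    num_gpu_per_worker=1,
    dtype="float16",
    use_cache=True  # repeated identical requests are served from the cache
)
# cached outputs are stored in ~/lmm_engines/generation_cache/dummy_image_model.jsonl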

Contribute a model

(Note: we do not care about the internal details of these 4 functions, as long as they receive the parameters and return the expected results specified in the function signatures.)

For more details, see lmm_engines/huggingface/README.md.
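For orientation only, the sketch below shows the rough shape of a contributed model file. The function names here are hypothetical placeholders; the actual required names and signatures are specified in lmm_engines/huggingface/README.md.

# Hypothetical sketch only: the function names below are placeholders, not the
# actual required interface (see lmm_engines/huggingface/README.md for that).
class MyModelAdapter:
    def load_model(self, model_path, device):
        # load weights and processors onto the given device
        ...

    def generate(self, params):
        # receive the request params (messages + generation kwargs)
        # and return the generated text
        ...

    def generate_stream(self, params):
        # yield partial outputs for streaming generation
        ...

    def get_info(self):
        # return model metadata (e.g., modality, context length)
        ...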

TODO

Transferring models from the old arena code into LMM-Engines.

About

License: Apache License 2.0

