bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Home Page: https://bentoml.com

openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error: Model architectures ['T5ForConditionalGeneration'] are not supported for now.

yasserkh2 opened this issue

Describe the bug

When attempting to run `google/flan-t5-large` with `openllm start google/flan-t5-large`, it fails with:
ValueError: Model architectures ['T5ForConditionalGeneration'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM']

To reproduce

1. `pip install openllm`
2. `openllm -h`
3. Run the command `openllm start google/flan-t5-large`
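
vLLM picks a model implementation from the `architectures` field of the model's Hugging Face config (visible in the traceback below), which is why an encoder-decoder checkpoint like flan-t5 is rejected. As a quick check before starting a server, you can inspect that field with the standard `transformers` API; a minimal sketch:

```python
# Inspect which architecture vLLM will see for a given model id.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/flan-t5-large")
print(config.architectures)  # ['T5ForConditionalGeneration'] -- not in vLLM's supported list
```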

Logs

2024-02-20T13:07:12+0000 [ERROR] [runner:llm-flan-t5-runner:1] Traceback (most recent call last):
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/openllm/_runners.py", line 131, in __init__
    self.model = vllm.AsyncLLMEngine.from_engine_args(
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 625, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 321, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 366, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 120, in __init__
    self._init_workers()
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 164, in _init_workers
    self._run_workers("load_model")
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1006, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 102, in load_model
    self.model_runner.load_model()
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 84, in load_model
    self.model = get_model(self.model_config, self.device_config,
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 43, in get_model
    model_class = _get_model_architecture(model_config)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 35, in _get_model_architecture
    raise ValueError(
ValueError: Model architectures ['T5ForConditionalGeneration'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/starlette/routing.py", line 734, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
    raise e
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/home/ubuntu/python/yasser/n_venv/lib/python3.10/site-packages/openllm/_runners.py", line 148, in __init__
    raise openllm.exceptions.OpenLLMException(f'Failed to initialise vLLMEngine due to the following error:\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed to initialise vLLMEngine due to the following error:
Model architectures ['T5ForConditionalGeneration'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'FalconForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'OPTForCausalLM', 'PhiForCausalLM', 'QWenLMHeadModel', 'Qwen2ForCausalLM', 'RWForCausalLM', 'StableLMEpochForCausalLM']
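
For context, the error originates in vLLM's model loader, which looks up `config.architectures` in a registry of implemented decoder-only model classes and raises when nothing matches. A simplified, illustrative sketch of that lookup (placeholder names, not vLLM's actual code; the real logic lives in `vllm/model_executor/model_loader.py`):

```python
# Illustrative sketch of vLLM 0.3.x's architecture lookup; the names and the
# trimmed registry below are placeholders, not vLLM internals.
_MODEL_REGISTRY = {
    "LlamaForCausalLM": object,    # architecture name -> model class
    "MistralForCausalLM": object,  # (registry trimmed for brevity)
}

def get_model_architecture(architectures):
    # Return the first architecture that has a registered implementation.
    for arch in architectures:
        if arch in _MODEL_REGISTRY:
            return _MODEL_REGISTRY[arch]
    # 'T5ForConditionalGeneration' is absent from the registry, so flan-t5 lands here.
    raise ValueError(
        f"Model architectures {architectures} are not supported for now. "
        f"Supported architectures: {list(_MODEL_REGISTRY)}"
    )
```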

Environment


#### System information

`bentoml`: 1.1.11
`python`: 3.10.9
`platform`: Linux-5.15.0-1052-aws-x86_64-with-glibc2.31
`uid_gid`: 1000:1000
<details><summary><code>pip_packages</code></summary>

<br>

accelerate==0.27.2
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.7.2
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
comm==0.2.1
contextlib2==21.6.0
cuda-python==12.3.0
cupy-cuda12x==12.1.0
datasets==2.17.1
debugpy==1.8.1
decorator==5.1.1
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
distlib==0.3.8
distro==1.9.0
einops==0.7.0
exceptiongroup==1.2.0
executing==2.0.1
fastapi==0.109.2
fastcore==1.5.29
fastrlock==0.8.2
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.3
httptools==0.6.1
httpx==0.26.0
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
ipykernel==6.29.2
ipython==8.21.0
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib-inline==0.1.6
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openllm==0.4.44
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.17.1
orjson==3.9.14
packaging==23.2
pandas==2.2.0
parso==0.8.3
pathspec==0.12.1
pexpect==4.9.0
pillow==10.2.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.0
pyarrow-hotfix==0.6
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
pynvml==11.5.0
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.9.2
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.0
rpds-py==0.18.0
safetensors==0.4.2
schema==0.7.5
scipy==1.12.0
sentencepiece==0.2.0
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
stack-data==0.6.3
starlette==0.36.3
sympy==1.12
tokenizers==0.15.2
tomli==2.0.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
transformers==4.37.2
triton==2.1.0
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.1
uvicorn==0.27.1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.3.1
watchfiles==0.21.0
wcwidth==0.2.13
websockets==12.0
wrapt==1.16.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0


</details>
---

- `transformers` version: 4.37.2
- Platform: Linux-5.15.0-1052-aws-x86_64-with-glibc2.31
- Python version: 3.10.9
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config:    not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed


### System information (Optional)

_No response_

Encoder-decoder models are not supported by vLLM.

Currently, only decoder-only architectures are supported.
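
If you need flan-t5 served anyway, one possible workaround (an assumption about this OpenLLM version, not verified here) is to skip vLLM and use the PyTorch backend, e.g. `openllm start google/flan-t5-large --backend pt`. Alternatively, encoder-decoder models run fine directly with `transformers`; a minimal sketch:

```python
# Minimal sketch: run an encoder-decoder model with plain transformers,
# bypassing vLLM entirely. Assumes torch and transformers are installed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

inputs = tokenizer("Translate English to German: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```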