bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Home Page: https://bentoml.com


bug: AttributeError in Phi-2 Model Initialization Using BentoML OpenLLM Docker Container

hannody opened this issue · comments

Describe the bug

Hello OpenLLM Team,

I recently encountered an issue while running the BentoML OpenLLM Docker image to host Microsoft's Phi-2 language model. During engine initialization, the process fails with an AttributeError on the 'PhiConfig' object.

To reproduce

docker run --rm --gpus all -p 3000:3000 -it ghcr.io/bentoml/openllm start microsoft/phi-2
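
The startup log warns that letting OpenLLM cascade through backends "might lead to unexpected behaviour". As a point of comparison (not a confirmed fix), the backend can be pinned explicitly; this sketch assumes the `--backend` option in openllm 0.4.x, where `pt` selects the PyTorch/transformers path and avoids vLLM's Phi loader entirely:

```shell
# Same container, but pin the backend instead of letting OpenLLM
# cascade to vLLM. The PyTorch backend loads Phi-2 via transformers
# and never touches vLLM's phi_1_5.py, where the traceback originates.
docker run --rm --gpus all -p 3000:3000 -it \
  ghcr.io/bentoml/openllm start microsoft/phi-2 --backend pt
```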

Logs

docker run --rm --gpus all -p 3000:3000 -it ghcr.io/bentoml/openllm start microsoft/phi-2
It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 866/866 [00:00<00:00, 1.59MB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.34k/7.34k [00:00<00:00, 9.79MB/s]
vocab.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 798k/798k [00:01<00:00, 769kB/s]
merges.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 634kB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.11M/2.11M [00:01<00:00, 1.64MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.08k/1.08k [00:00<00:00, 2.21MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 99.0/99.0 [00:00<00:00, 231kB/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model.safetensors.index.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35.7k/35.7k [00:00<00:00, 31.4MB/s]
configuration_phi.py: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.26k/9.26k [00:00<00:00, 13.3MB/s]
LICENSE: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.08k/1.08k [00:00<00:00, 2.32MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 74.0/74.0 [00:00<00:00, 165kB/s]
modeling_phi.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 62.7k/62.7k [00:00<00:00, 240kB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 564M/564M [01:04<00:00, 8.71MB/s]
model-00001-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.00G/5.00G [04:02<00:00, 20.6MB/s]
Fetching 14 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [04:03<00:00, 17.36s/it]
🚀Tip: run 'openllm build microsoft/phi-2 --backend vllm --serialization safetensors' to create a BentoLLM for 'microsoft/phi-2'
2024-01-26T07:54:55+0000 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-01-26T07:54:55+0000 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
INFO 01-26 07:55:00 llm_engine.py:70] Initializing an LLM engine with config: model='/root/bentoml/models/vllm-microsoft--phi-2/85d00b03fee509307549d823fdd095473ba5197c', tokenizer='/root/bentoml/models/vllm-microsoft--phi-2/85d00b03fee509307549d823fdd095473ba5197c', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
Traceback (most recent call last):
File "/openllm-python/src/openllm/_runners.py", line 131, in __init__
self.model = vllm.AsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 500, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 273, in __init__
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 318, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 111, in __init__
self._init_workers()
File "/usr/local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 146, in _init_workers
self._run_workers("load_model")
File "/usr/local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 795, in _run_workers
driver_worker_output = getattr(self.driver_worker,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/worker/worker.py", line 81, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 64, in load_model
self.model = get_model(self.model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader.py", line 65, in get_model
model = model_class(model_config.hf_config, linear_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/phi_1_5.py", line 263, in __init__
self.transformer = PhiModel(config, linear_method)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/phi_1_5.py", line 219, in __init__
self.h = nn.ModuleList([
^
File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/phi_1_5.py", line 220, in <listcomp>
PhiLayer(config, linear_method)
File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/phi_1_5.py", line 186, in __init__
eps=config.layer_norm_epsilon)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/configuration_utils.py", line 265, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'PhiConfig' object has no attribute 'layer_norm_epsilon'
INFO 01-26 07:55:07 llm_engine.py:70] Initializing an LLM engine with config: model='/root/bentoml/models/vllm-microsoft--phi-2/85d00b03fee509307549d823fdd095473ba5197c', tokenizer='/root/bentoml/models/vllm-microsoft--phi-2/85d00b03fee509307549d823fdd095473ba5197c', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
(The runner retries and the identical traceback, ending in the same AttributeError: 'PhiConfig' object has no attribute 'layer_norm_epsilon', repeats.)
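
For context, a likely cause (an inference from the traceback, not confirmed by the maintainers): newer `microsoft/phi-2` configs name the layer-norm epsilon `layer_norm_eps`, while the vLLM version bundled in this image reads `config.layer_norm_epsilon` in `phi_1_5.py`. A minimal self-contained sketch of the mismatch, using a plain class in place of transformers' `PhiConfig`:

```python
# Stand-in for the Hugging Face PhiConfig (illustrative only):
# newer microsoft/phi-2 configs expose `layer_norm_eps`.
class PhiConfig:
    def __init__(self):
        self.layer_norm_eps = 1e-5  # attribute name in the updated config

config = PhiConfig()

try:
    eps = config.layer_norm_epsilon  # name the bundled vLLM expects
except AttributeError as exc:
    print(exc)  # 'PhiConfig' object has no attribute 'layer_norm_epsilon'

# One possible shim while versions are out of sync: alias the old
# attribute name onto the config before the engine reads it.
# (A workaround sketch, not a proper fix; upgrading vLLM or pinning
# an older model revision would be the real resolution.)
if not hasattr(config, "layer_norm_epsilon"):
    config.layer_norm_epsilon = config.layer_norm_eps

print(config.layer_norm_epsilon)
```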

Environment

Docker Version: 25.0.1, build 29cf629

BentoML Version: bentoml, version 1.1.11

OpenLLM Version: openllm, 0.4.41 (compiled: False)

Python (CPython) 3.10.12

Operating System: Ubuntu 22.04 LTS

System information (Optional)

No response

Hi @larme, could you please take a look at this problem?

I'm encountering the same problem.

Me too, any update?

Same issue.