bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Home Page: https://bentoml.com

bug: Not enough data for satisfy transfer length header

Cherchercher opened this issue

Describe the bug

Environment: Python 3.10

Usage:

Server:
openllm start NousResearch/llama-2-13b-chat-hf

Client (OpenLLMAPI is presumably LlamaIndex's OpenLLM integration; the import below is assumed, since the report omits it):
from llama_index.llms.openllm import OpenLLMAPI

llm = OpenLLMAPI(address="http://some_address:3000/")
llm.complete("What are some hazards crude oil stored in tank?")

Error:
| aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
|
| The above exception was the direct cause of the following exception:
|
| Traceback (most recent call last):
| File "/home/ib.gaga/.local/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/home/ib.gaga/.local/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/home/ib.gaga/.local/lib/python3.10/site-packages/openllm/_service.py", line 28, in generate_stream_v1
| async for it in llm.generate_iterator(**llm_model_class(**input_dict).model_dump()):
| File "/home/ib.gaga/.local/lib/python3.10/site-packages/openllm/_llm.py", line 127, in generate_iterator
| raise RuntimeError(f'Exception caught during generation: {err}') from err
| RuntimeError: Exception caught during generation: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
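
The traceback shows the failure surfacing inside the server's streaming route (generate_stream_v1) while iterating llm.generate_iterator. For comparison, the project tagline advertises an OpenAI-compatible endpoint, so the same server can presumably also be queried through the standard /v1 routes. A minimal sketch under that assumption, reusing the address and prompt from this report; the openai client, the /v1 base path, and the api_key placeholder are not part of the original issue:

# Hypothetical alternative client: call the OpenAI-compatible API that
# `openllm start` is described as exposing (assumed to live under /v1).
from openai import OpenAI

client = OpenAI(base_url="http://some_address:3000/v1", api_key="na")  # key is a placeholder; presumably not validated

# Ask the server which model it serves, then send the same prompt as above.
model_id = client.models.list().data[0].id
completion = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "What are some hazards crude oil stored in tank?"}],
)
print(completion.choices[0].message.content)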

To reproduce

No response

Logs

No response

Environment

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
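
Since the Logs section above is empty, more detail could presumably be captured by setting the BENTOML_DEBUG variable listed above before starting the server. A minimal sketch; launching via subprocess and treating "True" as a sufficient truthy value are assumptions, not part of the original report:

# Hypothetical helper: relaunch the server with BentoML debug logging enabled
# so the streaming failure is recorded with more context.
import os
import subprocess

env = dict(os.environ, BENTOML_DEBUG="True")  # assumed: a truthy value turns on debug output
subprocess.run(["openllm", "start", "NousResearch/llama-2-13b-chat-hf"], env=env, check=True)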

System information

bentoml: 1.1.11
python: 3.10.12
platform: Linux-6.5.0-1017-azure-x86_64-with-glibc2.35
uid_gid: 2206643:100

pip_packages
accelerate==0.29.2
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
appdirs==1.4.4
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
auto_gptq==0.7.1
Automat==20.2.0
Babel==2.8.0
bcrypt==3.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
blinker==1.4
build==0.10.0
cattrs==23.1.2
certifi==2020.6.20
chardet==4.0.0
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloud-init==23.4.4
cloudpickle==3.0.0
cmake==3.29.2
colorama==0.4.4
coloredlogs==15.0.1
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
contextlib2==21.6.0
cryptography==3.4.8
cuda-python==12.4.0
datasets==2.18.0
dbus-python==1.2.18
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
diskcache==5.6.3
distlib==0.3.8
distro==1.7.0
distro-info==1.1+ubuntu0.2
einops==0.7.0
exceptiongroup==1.2.0
fail2ban==0.11.2
fastapi==0.110.1
fastcore==1.5.29
filelock==3.13.4
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.2.0
gekko==1.1.1
ghapi==1.0.5
h11==0.14.0
httpcore==1.0.5
httplib2==0.20.2
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.22.2
humanfriendly==10.0
hyperlink==21.0.0
idna==3.3
importlib-metadata==6.11.0
incremental==21.3.0
inflection==0.5.1
interegular==0.3.3
jeepney==0.7.1
Jinja2==3.0.3
joblib==1.4.0
jsonpatch==1.32
jsonpointer==2.0
jsonschema==3.2.0
keyring==23.5.0
lark==1.1.9
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
lit==18.1.3
llvmlite==0.42.0
markdown-it-py==3.0.0
MarkupSafe==2.0.1
mdurl==0.1.2
more-itertools==8.10.0
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
nest-asyncio==1.6.0
netifaces==0.11.0
networkx==3.3
ninja==1.11.1.1
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.0
openllm==0.4.44
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.19.0
orjson==3.10.1
outlines==0.0.34
packaging==24.0
pandas==2.2.2
pathspec==0.12.1
peft==0.10.0
pexpect==4.8.0
pillow==10.3.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.0
prometheus_client==0.20.0
protobuf==5.26.1
psutil==5.9.8
ptyprocess==0.7.0
py-cpuinfo==9.0.0
pyarrow==15.0.2
pyarrow-hotfix==0.6
pyasn1==0.4.8
pyasn1-modules==0.2.1
pydantic==2.7.0
pydantic_core==2.18.1
Pygments==2.17.2
PyGObject==3.42.1
PyHamcrest==2.0.2
PyJWT==2.3.0
pynvml==11.5.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyparted==3.11.7
pyproject_hooks==1.0.0
pyrsistent==0.18.1
pyserial==3.5
python-apt==2.4.0+ubuntu3
python-dateutil==2.9.0.post0
python-debian==0.1.43+ubuntu1.1
python-dotenv==1.0.1
python-json-logger==2.0.7
python-magic==0.4.24
python-multipart==0.0.9
pytz==2022.1
PyYAML==5.4.1
pyzmq==26.0.0
ray==2.10.0
referencing==0.34.0
regex==2024.4.16
requests==2.31.0
rich==13.7.1
rouge==1.0.1
rpds-py==0.18.0
safetensors==0.4.3
schema==0.7.5
scipy==1.13.0
SecretStorage==3.3.1
sentencepiece==0.2.0
service-identity==18.1.0
simple-di==0.1.5
six==1.16.0
sniffio==1.3.1
sos==4.5.6
ssh-import-id==5.11
starlette==0.37.2
sympy==1.12
systemd-python==234
tiktoken==0.6.0
tokenizers==0.15.2
tomli==2.0.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
transformers==4.39.3
triton==2.1.0
Twisted==22.1.0
typing_extensions==4.11.0
tzdata==2024.1
ubuntu-pro-client==8001
ufw==0.36.1
unattended-upgrades==0.1
urllib3==1.26.5
uvicorn==0.29.0
uvloop==0.19.0
virtualenv==20.25.2
vllm==0.4.0.post1
wadllib==1.3.6
WALinuxAgent==2.2.46
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zipp==1.0.0
zope.interface==5.4.0

  • transformers version: 4.39.3
  • Platform: Linux-6.5.0-1017-azure-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.3
  • Accelerate version: 0.29.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

System information (Optional)

No response

Hello,
I get the same error with the model microsoft--phi-2.
Best regards,
Sergej