bug: Requests with "use_beam_search: true" fail with an unclear exception message.
yan-virin commented
Describe the bug
I am sending the following llm_config in the JSON request body:
"llm_config": {
"num_beams": 5,
"use_beam_search": true
}
The request fails with an exception whose message does not explain the cause:
chatx-gdch-openllm-86d68dd84f-r8png RuntimeError: Exception caught during generation: Response payload is not completed
To reproduce
Send the following JSON as the body of an HTTP request:
{
"prompt": "...........",
"llm_config": {
"num_beams": 5,
"use_beam_search": true
}
}
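For anyone who wants to reproduce this from a script, here is a minimal sketch using only the Python standard library. The endpoint URL (host, port and the /v1/generate path) is an assumption, since the report does not state it; build_request and send are hypothetical helper names. Adjust the URL to match your deployment.

```python
import json
import urllib.request

# Assumption: the server listens locally and exposes a /v1/generate route.
# Neither the host, the port, nor the path is stated in this report.
ENDPOINT = "http://localhost:3000/v1/generate"


def build_request(prompt: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request carrying the llm_config from this report."""
    payload = {
        "prompt": prompt,
        "llm_config": {"num_beams": 5, "use_beam_search": True},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(prompt: str, endpoint: str = ENDPOINT) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(prompt, endpoint)) as resp:
        return resp.read().decode("utf-8")
```

Calling send("...") against a running server should hit the same code path as the request above. If the server answers with an HTTP error status, urllib raises urllib.error.HTTPError, so wrap the call in try/except if you want to inspect the error body.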
Logs
chatx-gdch-openllm-86d68dd84f-r8png Traceback (most recent call last):
chatx-gdch-openllm-86d68dd84f-r8png File "/usr/local/lib/python3.11/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
chatx-gdch-openllm-86d68dd84f-r8png output = await api.func(*args)
chatx-gdch-openllm-86d68dd84f-r8png ^^^^^^^^^^^^^^^^^^^^^
chatx-gdch-openllm-86d68dd84f-r8png File "/home/bentoml/bento/src/generated_llama_service.py", line 23, in generate_v1
chatx-gdch-openllm-86d68dd84f-r8png return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chatx-gdch-openllm-86d68dd84f-r8png File "/usr/local/lib/python3.11/dist-packages/openllm/_llm.py", line 55, in generate
chatx-gdch-openllm-86d68dd84f-r8png async for result in self.generate_iterator(
chatx-gdch-openllm-86d68dd84f-r8png File "/usr/local/lib/python3.11/dist-packages/openllm/_llm.py", line 125, in generate_iterator
chatx-gdch-openllm-86d68dd84f-r8png raise RuntimeError(f'Exception caught during generation: {err}') from err
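The last frame re-raises with "raise ... from err", so the original exception should still be reachable through __cause__ even though the log excerpt above stops at the wrapper. Below is a small sketch of how the chain could be walked to log the underlying error type. The helper names (exception_chain, describe, failing_generate) are hypothetical, and the inner ValueError is only a placeholder, because the report does not show what the original exception was.

```python
def exception_chain(exc: BaseException) -> list[BaseException]:
    """Return exc followed by each exception it was raised from."""
    chain = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        chain.append(current)
        seen.add(id(current))
        # __cause__ is set by "raise X from Y"; __context__ covers implicit chaining.
        current = current.__cause__ or current.__context__
    return chain


def describe(exc: BaseException) -> str:
    """Render the chain as 'Outer: msg <- Inner: msg'."""
    return " <- ".join(f"{type(e).__name__}: {e}" for e in exception_chain(exc))


def failing_generate() -> None:
    """Stand-in for the wrapping seen in the traceback (placeholder inner error)."""
    try:
        raise ValueError("Response payload is not completed")
    except ValueError as err:
        raise RuntimeError(f"Exception caught during generation: {err}") from err
```

Wrapping the failing call in try/except and logging describe(e) would show the type of the inner exception next to the wrapper's message, which may help narrow down where "Response payload is not completed" originates.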
Environment
Environment variable
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
System information
bentoml: 1.1.11
python: 3.10.13
platform: Linux-5.10.0-27-cloud-amd64-x86_64-with-glibc2.31
uid_gid: 1001:1002
conda: 23.11.0
in_conda_env: True
conda_packages
name: base
channels:
- file:///tmp/conda-pkgs
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- archspec=0.2.1=pyhd8ed1ab_1
- argon2-cffi=23.1.0=pyhd8ed1ab_0
- argon2-cffi-bindings=21.2.0=py310h2372a71_4
- arrow=1.3.0=pyhd8ed1ab_0
- asttokens=2.4.1=pyhd8ed1ab_0
- async-lru=2.0.4=pyhd8ed1ab_0
- attrs=23.1.0=pyh71513ae_1
- babel=2.13.1=pyhd8ed1ab_0
- backports=1.0=pyhd8ed1ab_3
- backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
- beautifulsoup4=4.12.2=pyha770c72_0
- bleach=6.1.0=pyhd8ed1ab_0
- boltons=23.0.0=pyhd8ed1ab_0
- brotli-python=1.1.0=py310hc6cd4ac_1
- bzip2=1.0.8=h7f98852_4
- c-ares=1.23.0=hd590300_0
- ca-certificates=2023.11.17=hbcca054_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- certifi=2023.11.17=pyhd8ed1ab_0
- cffi=1.16.0=py310h2fee648_0
- charset-normalizer=3.3.2=pyhd8ed1ab_0
- colorama=0.4.6=pyhd8ed1ab_0
- comm=0.1.4=pyhd8ed1ab_0
- conda=23.11.0=py310hff52083_1
- conda-libmamba-solver=23.11.1=pyhd8ed1ab_0
- conda-package-handling=2.2.0=pyh38be061_0
- conda-package-streaming=0.9.0=pyhd8ed1ab_0
- cryptography=41.0.5=py310h75e40e8_0
- cudatoolkit=11.8.0=h4ba93d1_12
- debugpy=1.8.0=py310hc6cd4ac_1
- decorator=5.1.1=pyhd8ed1ab_0
- defusedxml=0.7.1=pyhd8ed1ab_0
- distro=1.8.0=pyhd8ed1ab_0
- dlenv-base=1.0.20231106=py310_0
- entrypoints=0.4=pyhd8ed1ab_0
- exceptiongroup=1.1.3=pyhd8ed1ab_0
- executing=2.0.1=pyhd8ed1ab_0
- faiss=1.7.4=py310cuda112hae2f2aa_0_cuda
- faiss-gpu=1.7.4=h788eb59_0
- fmt=9.1.0=h924138e_0
- fqdn=1.5.1=pyhd8ed1ab_0
- icu=73.2=h59595ed_0
- idna=3.4=pyhd8ed1ab_0
- importlib-metadata=6.8.0=pyha770c72_0
- importlib_metadata=6.8.0=hd8ed1ab_0
- importlib_resources=6.1.0=pyhd8ed1ab_0
- ipykernel=6.26.0=pyhf8b6a83_0
- ipython=8.17.2=pyh41d4057_0
- isoduration=20.11.0=pyhd8ed1ab_0
- jedi=0.19.1=pyhd8ed1ab_0
- jinja2=3.1.2=pyhd8ed1ab_1
- json5=0.9.14=pyhd8ed1ab_0
- jsonpatch=1.33=pyhd8ed1ab_0
- jsonpointer=2.4=py310hff52083_3
- jsonschema=4.19.2=pyhd8ed1ab_0
- jsonschema-specifications=2023.7.1=pyhd8ed1ab_0
- jsonschema-with-format-nongpl=4.19.2=pyhd8ed1ab_0
- jupyter-lsp=2.2.0=pyhd8ed1ab_0
- jupyter_client=8.5.0=pyhd8ed1ab_0
- jupyter_core=5.5.0=py310hff52083_0
- jupyter_events=0.8.0=pyhd8ed1ab_0
- jupyter_server=2.9.1=pyhd8ed1ab_0
- jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
- jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
- jupyterlab_server=2.25.0=pyhd8ed1ab_0
- keyutils=1.6.1=h166bdaf_0
- krb5=1.20.1=h81ceb04_0
- ld_impl_linux-64=2.40=h41732ed_0
- libarchive=3.6.2=h039dbb9_1
- libblas=3.9.0=20_linux64_openblas
- libcblas=3.9.0=20_linux64_openblas
- libcurl=8.4.0=h251f7ec_1
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libfaiss=1.7.4=cuda112hb18a002_0_cuda
- libfaiss-avx2=1.7.4=cuda112h1234567_0_cuda
- libffi=3.4.2=h7f98852_5
- libgcc-ng=13.2.0=h807b86a_2
- libgfortran-ng=13.2.0=h69a702a_3
- libgfortran5=13.2.0=ha4646dd_3
- libgomp=13.2.0=h807b86a_2
- libiconv=1.17=h166bdaf_0
- liblapack=3.9.0=20_linux64_openblas
- libmamba=1.5.3=haf1ee3a_0
- libmambapy=1.5.3=py310h2dafd23_0
- libnghttp2=1.58.0=h47da74e_0
- libnsl=2.0.1=hd590300_0
- libopenblas=0.3.25=pthreads_h413a1c8_0
- libsodium=1.0.18=h36c2ea0_1
- libsolv=0.7.27=hfc55251_0
- libsqlite=3.44.0=h2797004_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.2.0=h7e041cc_2
- libuuid=2.38.1=h0b41bf4_0
- libuv=1.46.0=hd590300_0
- libxml2=2.11.6=h232c23b_0
- libzlib=1.2.13=hd590300_5
- lz4-c=1.9.4=hcb278e6_0
- lzo=2.10=h516909a_1000
- markupsafe=2.1.3=py310h2372a71_1
- matplotlib-inline=0.1.6=pyhd8ed1ab_0
- menuinst=2.0.0=py310hff52083_1
- mistune=3.0.2=pyhd8ed1ab_0
- nb_conda=2.2.1=unix_6
- nb_conda_kernels=2.3.1=py310hff52083_2
- nbclient=0.8.0=pyhd8ed1ab_0
- nbconvert-core=7.10.0=pyhd8ed1ab_0
- nbformat=5.9.2=pyhd8ed1ab_0
- ncurses=6.4=h59595ed_2
- nest-asyncio=1.5.8=pyhd8ed1ab_0
- nodejs=20.8.1=h1990674_0
- notebook-shim=0.2.3=pyhd8ed1ab_0
- openssl=3.2.0=hd590300_1
- overrides=7.4.0=pyhd8ed1ab_0
- packaging=23.2=pyhd8ed1ab_0
- pandocfilters=1.5.0=pyhd8ed1ab_0
- parso=0.8.3=pyhd8ed1ab_0
- pexpect=4.8.0=pyh1a96a4e_2
- pickleshare=0.7.5=py_1003
- pip=23.3.1=pyhd8ed1ab_0
- pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
- platformdirs=3.11.0=pyhd8ed1ab_0
- pluggy=1.3.0=pyhd8ed1ab_0
- prometheus_client=0.18.0=pyhd8ed1ab_0
- prompt-toolkit=3.0.39=pyha770c72_0
- prompt_toolkit=3.0.39=hd8ed1ab_0
- ptyprocess=0.7.0=pyhd3deb0d_0
- pure_eval=0.2.2=pyhd8ed1ab_0
- pybind11-abi=4=hd8ed1ab_3
- pycosat=0.6.6=py310h2372a71_0
- pycparser=2.21=pyhd8ed1ab_0
- pygments=2.16.1=pyhd8ed1ab_0
- pyopenssl=23.3.0=pyhd8ed1ab_0
- pysocks=1.7.1=pyha2e5f31_6
- python=3.10.13=hd12c33a_0_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-fastjsonschema=2.18.1=pyhd8ed1ab_0
- python-json-logger=2.0.7=pyhd8ed1ab_0
- python_abi=3.10=4_cp310
- pytz=2023.3.post1=pyhd8ed1ab_0
- pyyaml=6.0.1=py310h2372a71_1
- readline=8.2=h8228510_1
- referencing=0.30.2=pyhd8ed1ab_0
- reproc=14.2.4.post0=hd590300_1
- reproc-cpp=14.2.4.post0=h59595ed_1
- requests=2.31.0=pyhd8ed1ab_0
- rfc3339-validator=0.1.4=pyhd8ed1ab_0
- rfc3986-validator=0.1.1=pyh9f0ad1d_0
- rpds-py=0.12.0=py310hcb5633a_0
- ruamel.yaml=0.17.40=py310h2372a71_0
- ruamel.yaml.clib=0.2.7=py310h2372a71_2
- send2trash=1.8.2=pyh41d4057_0
- setuptools=68.2.2=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- sniffio=1.3.0=pyhd8ed1ab_0
- soupsieve=2.5=pyhd8ed1ab_1
- stack_data=0.6.2=pyhd8ed1ab_0
- terminado=0.17.1=pyh41d4057_0
- tinycss2=1.2.1=pyhd8ed1ab_0
- tk=8.6.13=noxft_h4845f30_101
- tomli=2.0.1=pyhd8ed1ab_0
- tornado=6.3.3=py310h2372a71_1
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.13.0=pyhd8ed1ab_0
- truststore=0.8.0=pyhd8ed1ab_0
- types-python-dateutil=2.8.19.14=pyhd8ed1ab_0
- typing-extensions=4.8.0=hd8ed1ab_0
- typing_extensions=4.8.0=pyha770c72_0
- typing_utils=0.1.0=pyhd8ed1ab_0
- uri-template=1.3.0=pyhd8ed1ab_0
- wcwidth=0.2.9=pyhd8ed1ab_0
- webcolors=1.13=pyhd8ed1ab_0
- webencodings=0.5.1=pyhd8ed1ab_2
- websocket-client=1.6.4=pyhd8ed1ab_0
- wheel=0.41.3=pyhd8ed1ab_0
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- yaml-cpp=0.8.0=h59595ed_0
- zeromq=4.3.5=h59595ed_0
- zipp=3.17.0=pyhd8ed1ab_0
- zlib=1.2.13=hd590300_5
- zstandard=0.22.0=py310h1275a96_0
- zstd=1.5.5=hfc55251_0
- pip:
- absl-py==2.0.0
- aiofiles==22.1.0
- aiohttp==3.8.6
- aiohttp-cors==0.7.0
- aiorwlock==1.3.0
- aiosignal==1.3.1
- aiosqlite==0.19.0
- anyio==3.7.1
- async-timeout==4.0.3
- backoff==2.2.1
- beatrix-jupyterlab==2023.113.222739
- blessed==1.20.0
- cachetools==5.3.2
- click==8.1.7
- cloud-tpu-client==0.10
- cloudpickle==3.0.0
- colorful==0.5.5
- contourpy==1.2.0
- cycler==0.12.1
- cython==3.0.5
- dacite==1.8.1
- db-dtypes==1.1.1
- deprecated==1.2.14
- distlib==0.3.7
- dm-tree==0.1.8
- docker==6.1.3
- docstring-parser==0.15
- farama-notifications==0.0.4
- fastapi==0.104.1
- filelock==3.13.1
- fonttools==4.44.0
- frozenlist==1.4.0
- fsspec==2023.10.0
- gcsfs==2023.10.0
- gitdb==4.0.11
- gitpython==3.1.40
- google-api-core==1.34.0
- google-api-python-client==1.8.0
- google-auth==2.23.4
- google-auth-httplib2==0.1.1
- google-auth-oauthlib==1.1.0
- google-cloud-aiplatform==1.36.0
- google-cloud-artifact-registry==1.9.0
- google-cloud-bigquery==3.13.0
- google-cloud-bigquery-storage==2.22.0
- google-cloud-core==2.3.3
- google-cloud-datastore==1.15.5
- google-cloud-language==2.11.1
- google-cloud-monitoring==2.16.0
- google-cloud-resource-manager==1.10.4
- google-cloud-storage==2.13.0
- google-crc32c==1.5.0
- google-resumable-media==2.6.0
- googleapis-common-protos==1.61.0
- gpustat==1.0.0
- greenlet==3.0.1
- grpc-google-iam-v1==0.12.6
- grpcio==1.59.2
- grpcio-status==1.48.2
- gymnasium==0.28.1
- h11==0.14.0
- htmlmin==0.1.12
- httplib2==0.22.0
- httptools==0.6.1
- imagehash==4.3.1
- imageio==2.32.0
- ipython-genutils==0.2.0
- ipython-sql==0.5.0
- ipywidgets==8.1.1
- jaraco-classes==3.3.0
- jax-jumpy==1.0.0
- jeepney==0.8.0
- joblib==1.3.2
- jupyter-client==7.4.9
- jupyter-http-over-ws==0.0.8
- jupyter-server-fileid==0.9.0
- jupyter-server-mathjax==0.2.6
- jupyter-server-proxy==4.1.0
- jupyter-server-ydoc==0.8.0
- jupyter-ydoc==0.2.5
- jupyterlab==3.6.6
- jupyterlab-git==0.44.0
- jupyterlab-widgets==3.0.9
- jupytext==1.15.2
- keyring==24.2.0
- keyrings-google-artifactregistry-auth==1.1.2
- kfp==2.4.0
- kfp-pipeline-spec==0.2.2
- kfp-server-api==2.0.3
- kiwisolver==1.4.5
- kubernetes==26.1.0
- lazy-loader==0.3
- llvmlite==0.41.1
- lz4==4.3.2
- markdown-it-py==3.0.0
- matplotlib==3.7.3
- mdit-py-plugins==0.4.0
- mdurl==0.1.2
- more-itertools==10.1.0
- msgpack==1.0.7
- multidict==6.0.4
- multimethod==1.10
- nbclassic==1.0.0
- nbdime==3.2.0
- networkx==3.2.1
- notebook==6.5.6
- notebook-executor==0.2
- numba==0.58.1
- numpy==1.25.2
- nvidia-ml-py==11.495.46
- oauth2client==4.1.3
- oauthlib==3.2.2
- opencensus==0.11.3
- opencensus-context==0.1.3
- opentelemetry-api==1.20.0
- opentelemetry-exporter-otlp==1.20.0
- opentelemetry-exporter-otlp-proto-common==1.20.0
- opentelemetry-exporter-otlp-proto-grpc==1.20.0
- opentelemetry-exporter-otlp-proto-http==1.20.0
- opentelemetry-proto==1.20.0
- opentelemetry-sdk==1.20.0
- opentelemetry-semantic-conventions==0.41b0
- pandas==2.0.3
- pandas-profiling==3.6.6
- papermill==2.5.0
- patsy==0.5.3
- phik==0.12.3
- pillow==10.0.1
- plotly==5.18.0
- prettytable==3.9.0
- proto-plus==1.22.3
- protobuf==3.20.3
- psutil==5.9.3
- py-spy==0.3.14
- pyarrow==14.0.0
- pyasn1==0.5.0
- pyasn1-modules==0.3.0
- pydantic==1.10.13
- pyjwt==2.8.0
- pyparsing==3.1.1
- python-dotenv==1.0.0
- pywavelets==1.4.1
- pyzmq==24.0.1
- ray==2.8.0
- ray-cpp==2.8.0
- requests-oauthlib==1.3.1
- requests-toolbelt==0.10.1
- retrying==1.3.4
- rich==13.6.0
- scikit-image==0.22.0
- scikit-learn==1.3.2
- scipy==1.11.3
- seaborn==0.12.2
- secretstorage==3.3.3
- shapely==2.0.2
- simpervisor==1.0.0
- smart-open==6.4.0
- smmap==5.0.1
- sqlalchemy==2.0.23
- sqlparse==0.4.4
- stack-data==0.6.3
- starlette==0.27.0
- statsmodels==0.14.0
- tabulate==0.9.0
- tangled-up-in-unicode==0.2.0
- tenacity==8.2.3
- tensorboardx==2.6.2.2
- threadpoolctl==3.2.0
- tifffile==2023.9.26
- toml==0.10.2
- typeguard==4.1.5
- typer==0.9.0
- tzdata==2023.3
- uritemplate==3.0.1
- urllib3==1.26.18
- uvicorn==0.24.0
- uvloop==0.19.0
- virtualenv==20.21.0
- visions==0.7.5
- watchfiles==0.21.0
- websockets==12.0
- widgetsnbextension==4.0.9
- wordcloud==1.9.2
- wrapt==1.15.0
- y-py==0.6.2
- yarl==1.9.2
- ydata-profiling==4.6.0
- ypy-websocket==0.8.4
prefix: /opt/conda
pip_packages
accelerate==0.27.0
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
anyio==4.2.0
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.17.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
distlib==0.3.8
distro==1.9.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
grpcio==1.60.1
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.26.0
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openllm==0.4.35
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.16.2
orjson==3.9.13
packaging==23.2
pandas==2.2.0
pathspec==0.12.1
pillow==10.2.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.0
prometheus-client==0.19.0
protobuf==4.25.2
psutil==5.9.8
pyarrow==15.0.0
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.6.0
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.0
rpds-py==0.17.1
safetensors==0.4.2
schema==0.7.5
scipy==1.12.0
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tokenizers==0.13.3
tomli==2.0.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
transformers @ git+https://github.com/huggingface/transformers@e51d7ac70ab8f3e69d3659226aa838308a668238
triton==2.1.0
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.0
uvicorn==0.27.1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.7
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
System information (Optional)
No response