bentoml / OpenLLM

Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.

Home Page: https://bentoml.com

bug: Requests with "use_beam_search: true" fail with an unclear exception message.

yan-virin opened this issue

Describe the bug

I am using this llm_config in the JSON request:

"llm_config": {
  "num_beams": 5,
  "use_beam_search": true
}

and I am getting an unclear exception:

chatx-gdch-openllm-86d68dd84f-r8png RuntimeError: Exception caught during generation: Response payload is not completed

To reproduce

Send the following JSON as the body of an HTTP request:
{
  "prompt": "...........",
  "llm_config": {
    "num_beams": 5,
    "use_beam_search": true
  }
}
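
For anyone who wants to replay this, here is a minimal sketch of sending that payload from Python using only the standard library. The host, port, and the /v1/generate route are assumptions (the report does not state them; the route is guessed from the generate_v1 handler name in the traceback), and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Assumptions, not stated in the report: the server listens on
# localhost:3000 and serves the generation route at /v1/generate.
BASE_URL = "http://localhost:3000"
ENDPOINT = "/v1/generate"


def build_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request carrying the llm_config from the report."""
    payload = {
        "prompt": prompt,
        "llm_config": {"num_beams": 5, "use_beam_search": True},
    }
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling send(build_request("your prompt")) against a running server should reproduce the failure if the assumptions above match the deployment; an HTTP error status surfaces as urllib.error.HTTPError.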

Logs

chatx-gdch-openllm-86d68dd84f-r8png Traceback (most recent call last):
chatx-gdch-openllm-86d68dd84f-r8png   File "/usr/local/lib/python3.11/dist-packages/bentoml/_internal/server/http_app.py", line 341, in api_func
chatx-gdch-openllm-86d68dd84f-r8png     output = await api.func(*args)
chatx-gdch-openllm-86d68dd84f-r8png              ^^^^^^^^^^^^^^^^^^^^^
chatx-gdch-openllm-86d68dd84f-r8png   File "/home/bentoml/bento/src/generated_llama_service.py", line 23, in generate_v1
chatx-gdch-openllm-86d68dd84f-r8png     return (await llm.generate(**llm_model_class(**input_dict).model_dump())).model_dump()
chatx-gdch-openllm-86d68dd84f-r8png             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chatx-gdch-openllm-86d68dd84f-r8png   File "/usr/local/lib/python3.11/dist-packages/openllm/_llm.py", line 55, in generate
chatx-gdch-openllm-86d68dd84f-r8png     async for result in self.generate_iterator(
chatx-gdch-openllm-86d68dd84f-r8png   File "/usr/local/lib/python3.11/dist-packages/openllm/_llm.py", line 125, in generate_iterator
chatx-gdch-openllm-86d68dd84f-r8png     raise RuntimeError(f'Exception caught during generation: {err}') from err
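
The last frame wraps the original error with "raise ... from err", so Python keeps the underlying exception on the wrapper's __cause__ attribute. The sketch below shows how that chain can be walked to surface the first exception. Everything in it is illustrative: root_cause, failing_backend, generate, and describe_failure are hypothetical names, and the ValueError is a stand-in, not the actual error behind this bug.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow the __cause__ links set by `raise ... from err` to the first exception."""
    seen = set()
    # Guard against (unlikely) cycles in the chain.
    while exc.__cause__ is not None and id(exc) not in seen:
        seen.add(id(exc))
        exc = exc.__cause__
    return exc


def failing_backend() -> None:
    # Hypothetical stand-in for whatever the generation backend raises.
    raise ValueError("beam search rejected: hypothetical underlying error")


def generate() -> None:
    # Mirrors the wrapping pattern visible in the last traceback frame.
    try:
        failing_backend()
    except Exception as err:
        raise RuntimeError(f"Exception caught during generation: {err}") from err


def describe_failure() -> str:
    """Return the type and message of the innermost exception."""
    try:
        generate()
    except RuntimeError as wrapped:
        cause = root_cause(wrapped)
        return f"{type(cause).__name__}: {cause}"
    return "no error"
```

Logging the innermost exception's type alongside the wrapper message is one way a server could make a failure like this easier to diagnose.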

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.11
python: 3.10.13
platform: Linux-5.10.0-27-cloud-amd64-x86_64-with-glibc2.31
uid_gid: 1001:1002
conda: 23.11.0
in_conda_env: True

conda_packages
name: base
channels:
  - file:///tmp/conda-pkgs
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - archspec=0.2.1=pyhd8ed1ab_1
  - argon2-cffi=23.1.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py310h2372a71_4
  - arrow=1.3.0=pyhd8ed1ab_0
  - asttokens=2.4.1=pyhd8ed1ab_0
  - async-lru=2.0.4=pyhd8ed1ab_0
  - attrs=23.1.0=pyh71513ae_1
  - babel=2.13.1=pyhd8ed1ab_0
  - backports=1.0=pyhd8ed1ab_3
  - backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
  - beautifulsoup4=4.12.2=pyha770c72_0
  - bleach=6.1.0=pyhd8ed1ab_0
  - boltons=23.0.0=pyhd8ed1ab_0
  - brotli-python=1.1.0=py310hc6cd4ac_1
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.23.0=hd590300_0
  - ca-certificates=2023.11.17=hbcca054_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2023.11.17=pyhd8ed1ab_0
  - cffi=1.16.0=py310h2fee648_0
  - charset-normalizer=3.3.2=pyhd8ed1ab_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - comm=0.1.4=pyhd8ed1ab_0
  - conda=23.11.0=py310hff52083_1
  - conda-libmamba-solver=23.11.1=pyhd8ed1ab_0
  - conda-package-handling=2.2.0=pyh38be061_0
  - conda-package-streaming=0.9.0=pyhd8ed1ab_0
  - cryptography=41.0.5=py310h75e40e8_0
  - cudatoolkit=11.8.0=h4ba93d1_12
  - debugpy=1.8.0=py310hc6cd4ac_1
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - distro=1.8.0=pyhd8ed1ab_0
  - dlenv-base=1.0.20231106=py310_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - exceptiongroup=1.1.3=pyhd8ed1ab_0
  - executing=2.0.1=pyhd8ed1ab_0
  - faiss=1.7.4=py310cuda112hae2f2aa_0_cuda
  - faiss-gpu=1.7.4=h788eb59_0
  - fmt=9.1.0=h924138e_0
  - fqdn=1.5.1=pyhd8ed1ab_0
  - icu=73.2=h59595ed_0
  - idna=3.4=pyhd8ed1ab_0
  - importlib-metadata=6.8.0=pyha770c72_0
  - importlib_metadata=6.8.0=hd8ed1ab_0
  - importlib_resources=6.1.0=pyhd8ed1ab_0
  - ipykernel=6.26.0=pyhf8b6a83_0
  - ipython=8.17.2=pyh41d4057_0
  - isoduration=20.11.0=pyhd8ed1ab_0
  - jedi=0.19.1=pyhd8ed1ab_0
  - jinja2=3.1.2=pyhd8ed1ab_1
  - json5=0.9.14=pyhd8ed1ab_0
  - jsonpatch=1.33=pyhd8ed1ab_0
  - jsonpointer=2.4=py310hff52083_3
  - jsonschema=4.19.2=pyhd8ed1ab_0
  - jsonschema-specifications=2023.7.1=pyhd8ed1ab_0
  - jsonschema-with-format-nongpl=4.19.2=pyhd8ed1ab_0
  - jupyter-lsp=2.2.0=pyhd8ed1ab_0
  - jupyter_client=8.5.0=pyhd8ed1ab_0
  - jupyter_core=5.5.0=py310hff52083_0
  - jupyter_events=0.8.0=pyhd8ed1ab_0
  - jupyter_server=2.9.1=pyhd8ed1ab_0
  - jupyter_server_terminals=0.4.4=pyhd8ed1ab_1
  - jupyterlab_pygments=0.2.2=pyhd8ed1ab_0
  - jupyterlab_server=2.25.0=pyhd8ed1ab_0
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.20.1=h81ceb04_0
  - ld_impl_linux-64=2.40=h41732ed_0
  - libarchive=3.6.2=h039dbb9_1
  - libblas=3.9.0=20_linux64_openblas
  - libcblas=3.9.0=20_linux64_openblas
  - libcurl=8.4.0=h251f7ec_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libfaiss=1.7.4=cuda112hb18a002_0_cuda
  - libfaiss-avx2=1.7.4=cuda112h1234567_0_cuda
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=13.2.0=h807b86a_2
  - libgfortran-ng=13.2.0=h69a702a_3
  - libgfortran5=13.2.0=ha4646dd_3
  - libgomp=13.2.0=h807b86a_2
  - libiconv=1.17=h166bdaf_0
  - liblapack=3.9.0=20_linux64_openblas
  - libmamba=1.5.3=haf1ee3a_0
  - libmambapy=1.5.3=py310h2dafd23_0
  - libnghttp2=1.58.0=h47da74e_0
  - libnsl=2.0.1=hd590300_0
  - libopenblas=0.3.25=pthreads_h413a1c8_0
  - libsodium=1.0.18=h36c2ea0_1
  - libsolv=0.7.27=hfc55251_0
  - libsqlite=3.44.0=h2797004_0
  - libssh2=1.11.0=h0841786_0
  - libstdcxx-ng=13.2.0=h7e041cc_2
  - libuuid=2.38.1=h0b41bf4_0
  - libuv=1.46.0=hd590300_0
  - libxml2=2.11.6=h232c23b_0
  - libzlib=1.2.13=hd590300_5
  - lz4-c=1.9.4=hcb278e6_0
  - lzo=2.10=h516909a_1000
  - markupsafe=2.1.3=py310h2372a71_1
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - menuinst=2.0.0=py310hff52083_1
  - mistune=3.0.2=pyhd8ed1ab_0
  - nb_conda=2.2.1=unix_6
  - nb_conda_kernels=2.3.1=py310hff52083_2
  - nbclient=0.8.0=pyhd8ed1ab_0
  - nbconvert-core=7.10.0=pyhd8ed1ab_0
  - nbformat=5.9.2=pyhd8ed1ab_0
  - ncurses=6.4=h59595ed_2
  - nest-asyncio=1.5.8=pyhd8ed1ab_0
  - nodejs=20.8.1=h1990674_0
  - notebook-shim=0.2.3=pyhd8ed1ab_0
  - openssl=3.2.0=hd590300_1
  - overrides=7.4.0=pyhd8ed1ab_0
  - packaging=23.2=pyhd8ed1ab_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh1a96a4e_2
  - pickleshare=0.7.5=py_1003
  - pip=23.3.1=pyhd8ed1ab_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
  - platformdirs=3.11.0=pyhd8ed1ab_0
  - pluggy=1.3.0=pyhd8ed1ab_0
  - prometheus_client=0.18.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.39=pyha770c72_0
  - prompt_toolkit=3.0.39=hd8ed1ab_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pybind11-abi=4=hd8ed1ab_3
  - pycosat=0.6.6=py310h2372a71_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pygments=2.16.1=pyhd8ed1ab_0
  - pyopenssl=23.3.0=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha2e5f31_6
  - python=3.10.13=hd12c33a_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python-fastjsonschema=2.18.1=pyhd8ed1ab_0
  - python-json-logger=2.0.7=pyhd8ed1ab_0
  - python_abi=3.10=4_cp310
  - pytz=2023.3.post1=pyhd8ed1ab_0
  - pyyaml=6.0.1=py310h2372a71_1
  - readline=8.2=h8228510_1
  - referencing=0.30.2=pyhd8ed1ab_0
  - reproc=14.2.4.post0=hd590300_1
  - reproc-cpp=14.2.4.post0=h59595ed_1
  - requests=2.31.0=pyhd8ed1ab_0
  - rfc3339-validator=0.1.4=pyhd8ed1ab_0
  - rfc3986-validator=0.1.1=pyh9f0ad1d_0
  - rpds-py=0.12.0=py310hcb5633a_0
  - ruamel.yaml=0.17.40=py310h2372a71_0
  - ruamel.yaml.clib=0.2.7=py310h2372a71_2
  - send2trash=1.8.2=pyh41d4057_0
  - setuptools=68.2.2=pyhd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - sniffio=1.3.0=pyhd8ed1ab_0
  - soupsieve=2.5=pyhd8ed1ab_1
  - stack_data=0.6.2=pyhd8ed1ab_0
  - terminado=0.17.1=pyh41d4057_0
  - tinycss2=1.2.1=pyhd8ed1ab_0
  - tk=8.6.13=noxft_h4845f30_101
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.3.3=py310h2372a71_1
  - tqdm=4.66.1=pyhd8ed1ab_0
  - traitlets=5.13.0=pyhd8ed1ab_0
  - truststore=0.8.0=pyhd8ed1ab_0
  - types-python-dateutil=2.8.19.14=pyhd8ed1ab_0
  - typing-extensions=4.8.0=hd8ed1ab_0
  - typing_extensions=4.8.0=pyha770c72_0
  - typing_utils=0.1.0=pyhd8ed1ab_0
  - uri-template=1.3.0=pyhd8ed1ab_0
  - wcwidth=0.2.9=pyhd8ed1ab_0
  - webcolors=1.13=pyhd8ed1ab_0
  - webencodings=0.5.1=pyhd8ed1ab_2
  - websocket-client=1.6.4=pyhd8ed1ab_0
  - wheel=0.41.3=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - yaml-cpp=0.8.0=h59595ed_0
  - zeromq=4.3.5=h59595ed_0
  - zipp=3.17.0=pyhd8ed1ab_0
  - zlib=1.2.13=hd590300_5
  - zstandard=0.22.0=py310h1275a96_0
  - zstd=1.5.5=hfc55251_0
  - pip:
      - absl-py==2.0.0
      - aiofiles==22.1.0
      - aiohttp==3.8.6
      - aiohttp-cors==0.7.0
      - aiorwlock==1.3.0
      - aiosignal==1.3.1
      - aiosqlite==0.19.0
      - anyio==3.7.1
      - async-timeout==4.0.3
      - backoff==2.2.1
      - beatrix-jupyterlab==2023.113.222739
      - blessed==1.20.0
      - cachetools==5.3.2
      - click==8.1.7
      - cloud-tpu-client==0.10
      - cloudpickle==3.0.0
      - colorful==0.5.5
      - contourpy==1.2.0
      - cycler==0.12.1
      - cython==3.0.5
      - dacite==1.8.1
      - db-dtypes==1.1.1
      - deprecated==1.2.14
      - distlib==0.3.7
      - dm-tree==0.1.8
      - docker==6.1.3
      - docstring-parser==0.15
      - farama-notifications==0.0.4
      - fastapi==0.104.1
      - filelock==3.13.1
      - fonttools==4.44.0
      - frozenlist==1.4.0
      - fsspec==2023.10.0
      - gcsfs==2023.10.0
      - gitdb==4.0.11
      - gitpython==3.1.40
      - google-api-core==1.34.0
      - google-api-python-client==1.8.0
      - google-auth==2.23.4
      - google-auth-httplib2==0.1.1
      - google-auth-oauthlib==1.1.0
      - google-cloud-aiplatform==1.36.0
      - google-cloud-artifact-registry==1.9.0
      - google-cloud-bigquery==3.13.0
      - google-cloud-bigquery-storage==2.22.0
      - google-cloud-core==2.3.3
      - google-cloud-datastore==1.15.5
      - google-cloud-language==2.11.1
      - google-cloud-monitoring==2.16.0
      - google-cloud-resource-manager==1.10.4
      - google-cloud-storage==2.13.0
      - google-crc32c==1.5.0
      - google-resumable-media==2.6.0
      - googleapis-common-protos==1.61.0
      - gpustat==1.0.0
      - greenlet==3.0.1
      - grpc-google-iam-v1==0.12.6
      - grpcio==1.59.2
      - grpcio-status==1.48.2
      - gymnasium==0.28.1
      - h11==0.14.0
      - htmlmin==0.1.12
      - httplib2==0.22.0
      - httptools==0.6.1
      - imagehash==4.3.1
      - imageio==2.32.0
      - ipython-genutils==0.2.0
      - ipython-sql==0.5.0
      - ipywidgets==8.1.1
      - jaraco-classes==3.3.0
      - jax-jumpy==1.0.0
      - jeepney==0.8.0
      - joblib==1.3.2
      - jupyter-client==7.4.9
      - jupyter-http-over-ws==0.0.8
      - jupyter-server-fileid==0.9.0
      - jupyter-server-mathjax==0.2.6
      - jupyter-server-proxy==4.1.0
      - jupyter-server-ydoc==0.8.0
      - jupyter-ydoc==0.2.5
      - jupyterlab==3.6.6
      - jupyterlab-git==0.44.0
      - jupyterlab-widgets==3.0.9
      - jupytext==1.15.2
      - keyring==24.2.0
      - keyrings-google-artifactregistry-auth==1.1.2
      - kfp==2.4.0
      - kfp-pipeline-spec==0.2.2
      - kfp-server-api==2.0.3
      - kiwisolver==1.4.5
      - kubernetes==26.1.0
      - lazy-loader==0.3
      - llvmlite==0.41.1
      - lz4==4.3.2
      - markdown-it-py==3.0.0
      - matplotlib==3.7.3
      - mdit-py-plugins==0.4.0
      - mdurl==0.1.2
      - more-itertools==10.1.0
      - msgpack==1.0.7
      - multidict==6.0.4
      - multimethod==1.10
      - nbclassic==1.0.0
      - nbdime==3.2.0
      - networkx==3.2.1
      - notebook==6.5.6
      - notebook-executor==0.2
      - numba==0.58.1
      - numpy==1.25.2
      - nvidia-ml-py==11.495.46
      - oauth2client==4.1.3
      - oauthlib==3.2.2
      - opencensus==0.11.3
      - opencensus-context==0.1.3
      - opentelemetry-api==1.20.0
      - opentelemetry-exporter-otlp==1.20.0
      - opentelemetry-exporter-otlp-proto-common==1.20.0
      - opentelemetry-exporter-otlp-proto-grpc==1.20.0
      - opentelemetry-exporter-otlp-proto-http==1.20.0
      - opentelemetry-proto==1.20.0
      - opentelemetry-sdk==1.20.0
      - opentelemetry-semantic-conventions==0.41b0
      - pandas==2.0.3
      - pandas-profiling==3.6.6
      - papermill==2.5.0
      - patsy==0.5.3
      - phik==0.12.3
      - pillow==10.0.1
      - plotly==5.18.0
      - prettytable==3.9.0
      - proto-plus==1.22.3
      - protobuf==3.20.3
      - psutil==5.9.3
      - py-spy==0.3.14
      - pyarrow==14.0.0
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pydantic==1.10.13
      - pyjwt==2.8.0
      - pyparsing==3.1.1
      - python-dotenv==1.0.0
      - pywavelets==1.4.1
      - pyzmq==24.0.1
      - ray==2.8.0
      - ray-cpp==2.8.0
      - requests-oauthlib==1.3.1
      - requests-toolbelt==0.10.1
      - retrying==1.3.4
      - rich==13.6.0
      - scikit-image==0.22.0
      - scikit-learn==1.3.2
      - scipy==1.11.3
      - seaborn==0.12.2
      - secretstorage==3.3.3
      - shapely==2.0.2
      - simpervisor==1.0.0
      - smart-open==6.4.0
      - smmap==5.0.1
      - sqlalchemy==2.0.23
      - sqlparse==0.4.4
      - stack-data==0.6.3
      - starlette==0.27.0
      - statsmodels==0.14.0
      - tabulate==0.9.0
      - tangled-up-in-unicode==0.2.0
      - tenacity==8.2.3
      - tensorboardx==2.6.2.2
      - threadpoolctl==3.2.0
      - tifffile==2023.9.26
      - toml==0.10.2
      - typeguard==4.1.5
      - typer==0.9.0
      - tzdata==2023.3
      - uritemplate==3.0.1
      - urllib3==1.26.18
      - uvicorn==0.24.0
      - uvloop==0.19.0
      - virtualenv==20.21.0
      - visions==0.7.5
      - watchfiles==0.21.0
      - websockets==12.0
      - widgetsnbextension==4.0.9
      - wordcloud==1.9.2
      - wrapt==1.15.0
      - y-py==0.6.2
      - yarl==1.9.2
      - ydata-profiling==4.6.0
      - ypy-websocket==0.8.4
prefix: /opt/conda
pip_packages
accelerate==0.27.0
aiohttp==3.9.3
aioprometheus==23.12.0
aiosignal==1.3.1
anyio==4.2.0
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.2.0
bentoml==1.1.11
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.17.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.8
distlib==0.3.8
distro==1.9.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.2
fastcore==1.5.29
filelock==3.13.1
filetype==1.2.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.10.0
ghapi==1.0.4
grpcio==1.60.1
h11==0.14.0
httpcore==1.0.2
httptools==0.6.1
httpx==0.26.0
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.3
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.5
multiprocess==0.70.16
mypy-extensions==1.0.0
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==11.525.150
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
openllm==0.4.35
openllm-client==0.4.44
openllm-core==0.4.44
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.16.2
orjson==3.9.13
packaging==23.2
pandas==2.2.0
pathspec==0.12.1
pillow==10.2.0
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.2.0
prometheus-client==0.19.0
protobuf==4.25.2
psutil==5.9.8
pyarrow==15.0.0
pyarrow-hotfix==0.6
pydantic==1.10.13
Pygments==2.17.2
pyparsing==3.1.1
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
quantile-python==1.1
ray==2.6.0
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rich==13.7.0
rpds-py==0.17.1
safetensors==0.4.2
schema==0.7.5
scipy==1.12.0
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.36.3
sympy==1.12
tokenizers==0.13.3
tomli==2.0.1
torch==2.1.2
tornado==6.4
tqdm==4.66.2
transformers @ git+https://github.com/huggingface/transformers@e51d7ac70ab8f3e69d3659226aa838308a668238
triton==2.1.0
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.0
uvicorn==0.27.1
uvloop==0.19.0
virtualenv==20.25.0
vllm==0.2.7
watchfiles==0.21.0
websockets==12.0
wrapt==1.16.0
xformers==0.0.23.post1
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0