triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Failed to start server in 'docker' mode

IAINATDBI opened this issue

Hi

I have created a Dockerfile (building on the one from this repo):

FROM nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
# Setup vLLM Triton backend
RUN mkdir -p /opt/tritonserver/backends/vllm && \
    wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py

# TRT-LLM engine build dependencies
# NOTE: torch 2.2.0 has a symbol conflict, so WAR is to install 2.1.2
RUN pip install \
  "psutil" \
  "pynvml>=11.5.0" \
  "torch==2.1.2" \
  --extra-index-url https://pypi.nvidia.com/ "tensorrt-llm==0.8.0"

# vLLM runtime dependencies
RUN pip install \
  "vllm==0.3.0" \
  "transformers>=4.37.0" \
  "accelerate==0.26.1"

# Install Triton CLI in this image
RUN git clone https://github.com/triton-inference-server/triton_cli.git && \
    cd triton_cli && \
    pip install .

RUN triton remove -m all && triton import -m llama-2-7b --backend tensorrtllm

CMD triton start

I have referenced this from my compose yaml:

  tritonserver:
    container_name: tritonserver
    build:
      context: ./django/Triton
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    shm_size: 1g

And the container exits with the following (from the container logs):

== Triton Inference Server ==
=============================

NVIDIA Release 24.02 (build 83572707)
Triton Server Version 2.43.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.161.07.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

triton - ERROR - Failed to start server. Errors: ["Failed to start server in 'local' mode. 28:17 : 'max_batch_size: ${triton_max_batch_size}': Couldn't parse integer: $", "Failed to start server in 'docker' mode. Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))"]

Any thoughts?

Cheers.

Hi @IAINATDBI,

This first error looks like the root cause:

"Failed to start server in 'local' mode. 28:17 : 'max_batch_size: ${triton_max_batch_size}': Couldn't parse integer: $"

Generally this means that the model config template did not get filled in, like this line.
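Roughly, an un-substituted config.pbtxt still contains the raw placeholder (which is exactly what the parser is complaining about), whereas after a successful import it should hold a concrete integer. A sketch of the offending line before and after substitution; the value 4 is only an illustrative placeholder, not necessarily what the CLI would pick:

# Before substitution (what the error message is reporting):
max_batch_size: ${triton_max_batch_size}
# After a successful `triton import` (4 is just an example value):
max_batch_size: 4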

The template should have been filled in if triton import -m llama-2-7b --backend tensorrtllm succeeded. I would double-check that this step actually succeeds during the docker build, or enter the container interactively and poke around to confirm that the model repository looks correct and is where you expect it to be.
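For example, roughly along these lines (the image tag here is made up, and I'm assuming the CLI's default model repository under $HOME/models; adjust if yours is elsewhere):

# Build the same image that compose builds, under a throwaway tag
docker build -t tritonserver-debug ./django/Triton

# Start a shell instead of the default `triton start` CMD
docker run --rm -it tritonserver-debug bash

# Inside the container, inspect the generated model repository
ls -R ~/models
grep -rnF '${' ~/models   # any hits mean template variables were never filled in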

Hi @rmccorm4 - I will try that and report back. Thank you.

Cheers

Sounds good. I'll close this for now, but feel free to re-open if you get more details or have more questions.