triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Failed to start server in 'docker' mode

IAINATDBI opened this issue

Hi

I have created a Dockerfile (building on the one from this repo):

FROM nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
# Setup vLLM Triton backend
RUN mkdir -p /opt/tritonserver/backends/vllm && \
    wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py

# TRT-LLM engine build dependencies
# NOTE: torch 2.2.0 has a symbol conflict, so WAR is to install 2.1.2
RUN pip install \
  "psutil" \
  "pynvml>=11.5.0" \
  "torch==2.1.2" \
  --extra-index-url https://pypi.nvidia.com/ "tensorrt-llm==0.8.0"

# vLLM runtime dependencies
RUN pip install \
  "vllm==0.3.0" \
  "transformers>=4.37.0" \
  "accelerate==0.26.1"

# Install Triton CLI in this image
RUN git clone https://github.com/triton-inference-server/triton_cli.git && \
    cd triton_cli && \
    pip install .

RUN triton remove -m all && triton import -m llama-2-7b --backend tensorrtllm

CMD triton start

I have referenced this from my compose yaml:

  tritonserver:
    container_name: tritonserver
    build:
      context: ./django/Triton
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    shm_size: 1g

And the container exits with the following (from the container logs):

== Triton Inference Server ==
=============================

NVIDIA Release 24.02 (build 83572707)
Triton Server Version 2.43.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.3 driver version 545.23.08 with kernel driver version 535.161.07.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

triton - ERROR - Failed to start server. Errors: ["Failed to start server in 'local' mode. 28:17 : 'max_batch_size: ${triton_max_batch_size}': Couldn't parse integer: $", "Failed to start server in 'docker' mode. Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))"]

Any thoughts?

Cheers.

Hi @IAINATDBI,

This first error looks like the root cause:

"Failed to start server in 'local' mode. 28:17 : 'max_batch_size: ${triton_max_batch_size}': Couldn't parse integer: $"

Generally this means that the model config template did not get filled in, like this line.
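Roughly, an un-substituted config.pbtxt still contains the raw placeholder (which is exactly what the parser is complaining about), whereas after a successful import it should hold a concrete integer. A sketch of the offending line before and after substitution; the value 4 is only an illustrative placeholder, not necessarily what the CLI would pick:

# Before substitution (what the error message is reporting):
max_batch_size: ${triton_max_batch_size}
# After a successful `triton import` (4 is just an example value):
max_batch_size: 4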

The template should have been filled in if triton import -m llama-2-7b --backend tensorrtllm succeeded. I would double-check that this step actually succeeds during the docker build, or enter the container interactively and poke around to confirm that the model repository looks correct and is where you expect it to be.
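For example, roughly along these lines (the image tag here is made up, and I'm assuming the CLI's default model repository under $HOME/models; adjust if yours is elsewhere):

# Build the same image that compose builds, under a throwaway tag
docker build -t tritonserver-debug ./django/Triton

# Start a shell instead of the default `triton start` CMD
docker run --rm -it tritonserver-debug bash

# Inside the container, inspect the generated model repository
ls -R ~/models
grep -rnF '${' ~/models   # any hits mean template variables were never filled in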

Hi @rmccorm4 - I will try that and report back. Thank you.

Cheers

Sounds good. I'll close this for now, but feel free to re-open if you get more details or have more questions.