triton-inference-server / fastertransformer_backend


Build backend inside the docker container, undefined symbol

A-ML-ER opened this issue

Description

E0412 07:52:03.832683 14841 model_repository_manager.cc:1155] failed to load 'fastertransformer' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb



| fastertransformer | 1 | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb |
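For reference, the missing symbol can be demangled with c++filt to see exactly which method the loader cannot resolve:

echo '_ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb' | c++filt
# prints: ParallelGptTritonModel<__half>::createNcclIds(unsigned int, bool)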

Reproduced Steps

docker run -it --rm --gpus=all --shm-size=4G  -v $(pwd):/ft_workspace -p 8888:8888 triton_with_ft:22.03 bash

Inside the container:

cd /ft_workspace/fastertransformer_backend/build
cmake -D CMAKE_EXPORT_COMPILE_COMMANDS=1 -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/opt/tritonserver -D TRITON_COMMON_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_CORE_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_BACKEND_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" ..
make -j 32 install
CUDA_VISIBLE_DEVICES=0,1 /opt/tritonserver/bin/tritonserver --model-repository=./triton-model-store/gptj/ &

The source in /ft_workspace/fastertransformer_backend was cloned from https://github.com/triton-inference-server/fastertransformer_backend.git
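One thing worth ruling out (an assumption, not confirmed in this issue): if the build directory still holds artifacts from an earlier configure, CMake may keep reusing the FasterTransformer sources it fetched back then, so the backend and the library can drift apart. A clean reconfigure with the same flags rules that out:

rm -rf /ft_workspace/fastertransformer_backend/build
mkdir -p /ft_workspace/fastertransformer_backend/build
cd /ft_workspace/fastertransformer_backend/build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/opt/tritonserver -D TRITON_COMMON_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_CORE_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_BACKEND_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" ..
make -j 32 install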

Do you use the main branch? The undefined symbol refers to createNcclIds, which is an old function that has been deprecated.

Can you try searching for the function createNcclIds in your code?
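A minimal way to check both the source tree and the built library (assuming the paths used in the reproduction steps above):

# Look for any remaining references to the deprecated function in the backend source
grep -rn "createNcclIds" /ft_workspace/fastertransformer_backend

# List the undefined dynamic symbols of the built backend and demangle them
nm -D --undefined-only /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so | c++filt | grep -i nccl

If grep still finds createNcclIds in the checked-out sources, the local clone is probably not on the current main branch.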