Build backend inside the Docker container: undefined symbol
A-ML-ER opened this issue · comments
A-ML-ER commented
Description
E0412 07:52:03.832683 14841 model_repository_manager.cc:1155] failed to load 'fastertransformer' version 1: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb
fastertransformer | 1 | UNAVAILABLE: Not found: unable to load shared library: /opt/tritonserver/backends/fastertransformer/libtriton_fastertransformer.so: undefined symbol: _ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb
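As an editorial aside (not part of the original report): the mangled symbol in the error can be decoded with `c++filt` from binutils, which shows exactly which C++ member function the loader cannot resolve.

```shell
# Demangle the missing symbol from the loader error to a readable C++ name.
echo '_ZN22ParallelGptTritonModelI6__halfE13createNcclIdsEjb' | c++filt
# prints: ParallelGptTritonModel<__half>::createNcclIds(unsigned int, bool)
```

This tells us the unresolved reference is the FP16 (`__half`) instantiation of `ParallelGptTritonModel::createNcclIds`.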
Reproduced Steps
docker run -it --rm --gpus=all --shm-size=4G -v $(pwd):/ft_workspace -p 8888:8888 triton_with_ft:22.03 bash
Inside the container (the `!` prefixes from the original notebook-style transcript are removed, since these are plain bash commands):
cd /ft_workspace/fastertransformer_backend/build
cmake -D CMAKE_EXPORT_COMPILE_COMMANDS=1 -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/opt/tritonserver -D TRITON_COMMON_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_CORE_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" -D TRITON_BACKEND_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" ..
make -j 32 install
CUDA_VISIBLE_DEVICES=0,1 /opt/tritonserver/bin/tritonserver --model-repository=./triton-model-store/gptj/ &
A-ML-ER commented
The source is in /ft_workspace/fastertransformer_backend, cloned from https://github.com/triton-inference-server/fastertransformer_backend.git
byshiue_NV commented
Do you use the main branch? The undefined function is createNcclIds, which is an old function that has been deprecated.
A-ML-ER commented
https://github.com/triton-inference-server/fastertransformer_backend.git
The main branch, at the latest commit.
byshiue_NV commented
Can you try searching for the function createNcclIds in your code?
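A sketch of that search, assuming the checkout location from the reproduce steps above: any hit means the sources (or a stale build artifact alongside them) still reference the removed API.

```shell
# Recursively search the backend checkout for references to the
# deprecated function, printing file names and line numbers.
grep -rn 'createNcclIds' /ft_workspace/fastertransformer_backend
```

If this prints nothing but the build still fails, wiping the `build` directory and re-running cmake/make is a reasonable next step, since old object files can carry references to removed functions.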