NCCL 'unhandled cuda error'
SeibertronSS opened this issue
Branch: main
Triton Docker Version: 22.03-py3
GPU: v100
CUDA version: 11.2
Driver Version: 460.32.03
I use Triton + FasterTransformer (FT) to deploy the GPT-NeoX 20B model. When I set the model output length to more than 180 tokens, the following error occurs:
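For reference, the failing request looks roughly like this. This is a minimal sketch only: the input names (`INPUT_0`, `request_output_len`, `beam_width`) follow the fastertransformer_backend GPT example configs and are assumptions, not the exact config used in this deployment.

```python
# Hypothetical sketch of the ensemble request payload that triggers the error.
# Field names are assumed from the fastertransformer_backend examples.
def build_request(prompt: str, output_len: int) -> dict:
    """Assemble a request payload; output_len > 180 reproduces the crash."""
    return {
        "INPUT_0": [prompt],                   # prompt text for preprocessing
        "request_output_len": [[output_len]],  # > 180 fails, <= 180 works
        "beam_width": [[1]],
        "temperature": [[1.0]],
    }

req = build_request("Hello, world", 200)  # output length 200, as in the logs
```

The `output_log_probs` shape `[1,1,200]` in the log below matches a requested output length of 200.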
I0526 10:11:21.620680 101 libfastertransformer.cc:1231] streaming response is sent
I0526 10:11:21.620698 101 python.cc:616] model postprocessing, instance postprocessing_0, executing 1 requests
I0526 10:11:21.621395 101 infer_response.cc:166] add response output: output: OUTPUT, type: BYTES, shape: [1]
I0526 10:11:21.621417 101 pinned_memory_manager.cc:161] pinned memory allocation: size 322, addr 0x7eff9a000180
I0526 10:11:21.621423 101 ensemble_scheduler.cc:523] Internal response allocation: OUTPUT, size 322, addr 0x7eff9a000180, memory type 1, type id 0
I0526 10:11:21.621441 101 ensemble_scheduler.cc:538] Internal response release: size 322, addr 0x7eff9a000180
I0526 10:11:21.621456 101 infer_response.cc:140] add response output: output: output_log_probs, type: FP32, shape: [1,1,200]
I0526 10:11:21.621485 101 grpc_server.cc:2544] GRPC: unable to provide 'output_log_probs' in GPU, will use CPU
I0526 10:11:21.621499 101 grpc_server.cc:2555] GRPC: using buffer for 'output_log_probs', size: 800, addr: 0x7efef406c8a0
I0526 10:11:21.801547 101 infer_response.cc:140] add response output: output: cum_log_probs, type: FP32, shape: [1,1]
I0526 10:11:21.801605 101 grpc_server.cc:2544] GRPC: unable to provide 'cum_log_probs' in GPU, will use CPU
I0526 10:11:21.801622 101 grpc_server.cc:2555] GRPC: using buffer for 'cum_log_probs', size: 4, addr: 0x7efef406ccf0
I0526 10:11:21.801686 101 infer_response.cc:140] add response output: output: sequence_length, type: UINT32, shape: [1,1]
I0526 10:11:21.801704 101 grpc_server.cc:2544] GRPC: unable to provide 'sequence_length' in GPU, will use CPU
I0526 10:11:21.801710 101 grpc_server.cc:2555] GRPC: using buffer for 'sequence_length', size: 4, addr: 0x7efef406ce60
I0526 10:11:21.801738 101 infer_response.cc:140] add response output: output: OUTPUT_0, type: BYTES, shape: [1]
I0526 10:11:21.801780 101 grpc_server.cc:2544] GRPC: unable to provide 'OUTPUT_0' in CPU_PINNED, will use CPU
I0526 10:11:21.801793 101 grpc_server.cc:2555] GRPC: using buffer for 'OUTPUT_0', size: 322, addr: 0x7efef406d0c0
I0526 10:11:21.801812 101 pinned_memory_manager.cc:190] pinned memory deallocation: addr 0x7eff9a000180
I0526 10:11:21.801844 101 grpc_server.cc:4188] ModelStreamInferHandler::StreamInferComplete, context 3, 4 step ISSUED, callback index 137, flags 0
I0526 10:11:21.801873 101 grpc_server.cc:2637] GRPC free: size 800, addr 0x7efef406c8a0
I0526 10:11:21.801882 101 grpc_server.cc:2637] GRPC free: size 4, addr 0x7efef406ccf0
I0526 10:11:21.801894 101 grpc_server.cc:2637] GRPC free: size 4, addr 0x7efef406ce60
I0526 10:11:21.801899 101 grpc_server.cc:2637] GRPC free: size 322, addr 0x7efef406d0c0
I0526 10:11:21.801944 101 python.cc:1960] TRITONBACKEND_ModelInstanceExecute: model instance name postprocessing_0 released 1 requests
I0526 10:11:21.802584 101 grpc_server.cc:3825] Process for ModelStreamInferHandler, rpc_ok=1, context 3, 4 step WRITEREADY
I0526 10:11:21.802696 101 grpc_server.cc:3825] Process for ModelStreamInferHandler, rpc_ok=1, context 3, 4 step WRITTEN
Failed, NCCL error /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.cc:64 'unhandled cuda error'
Failed, NCCL error /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.cc:64 'unhandled cuda error'
Failed, NCCL error /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.cc:64 'unhandled cuda error'
corrupted double-linked list
Signal (11) received.
I0526 10:11:21.881235 101 libfastertransformer.cc:1266] Stop to forward
Signal (6) received.
The following is the output of the `dmesg` command after the error occurs:
dmesg.txt
The GPUs remain at this high utilization after the error:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:00:0B.0 Off | 0 |
| N/A 37C P0 44W / 250W | 12085MiB / 16160MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:00:0C.0 Off | 0 |
| N/A 37C P0 44W / 250W | 12059MiB / 16160MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE... Off | 00000000:00:0D.0 Off | 0 |
| N/A 37C P0 43W / 250W | 12059MiB / 16160MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-PCIE... Off | 00000000:00:0E.0 Off | 0 |
| N/A 39C P0 45W / 250W | 12059MiB / 16160MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 282818 C tritonserver 12053MiB |
| 1 N/A N/A 282818 C tritonserver 12053MiB |
| 2 N/A N/A 282818 C tritonserver 12053MiB |
| 3 N/A N/A 282818 C tritonserver 12053MiB |
+-----------------------------------------------------------------------------+