import fbgemm_gpu fails with an undefined symbol error: undefined symbol: _ZN3c104impl8GPUTrace13gpuTraceStateE

Question


sea-of-freedom opened this issue 10 months ago · comments

just do one thing everyday commented 10 months ago

OS: x86_64, Intel(R) Xeon(R) Gold 6130T CPU
CUDA: NVIDIA-SMI 465.19.01, Driver Version: 465.19.01, CUDA Version: 11.3
torch: 1.12.1+cu113
python: 3.9
GPU: NVIDIA Tesla P4
torchrec & fbgemm-gpu: 0.3.2 (pip install)

When I import fbgemm_gpu, I get an error:
xxxxx/lib/python3.9/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN3c104impl8GPUTrace13gpuTraceStateE
Thanks for your help.

Supadchaya · Answer 1 · Thu Aug 10 2023 01:23:45 GMT+0800 (China Standard Time)

Hi @666easyfuture, it seems the library was either compiled with an incompatible tool version or is out of date. You could try adding the path to Python with sys.path.insert(0,"xxxxx/lib/python3.9/site-packages/fbgemm_gpu") and see if that works. Otherwise, can you try installing the latest release of fbgemm-gpu, following the steps in the documented order? (See the instructions here: https://github.com/pytorch/FBGEMM/blob/main/fbgemm_gpu/docs/InstallationInstructions.md). Please let me know how it goes, thank you.
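Before reinstalling, it can help to record which versions are actually installed, since the undefined symbol points to a mismatch between the fbgemm-gpu binary and the installed torch. The snippet below is a minimal sketch that reads package metadata without importing fbgemm_gpu (which would fail with the error above). The function name `installed_versions` and the default package list are illustrative assumptions, not part of any FBGEMM tooling.

```python
from importlib import metadata


def installed_versions(packages=("torch", "fbgemm-gpu", "torchrec")):
    """Return {package name: installed version, or None if not installed}.

    Hypothetical helper: it only reads pip metadata, so it works even
    when importing the package itself raises an undefined symbol error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

The reported versions can then be compared against the combinations listed in the installation instructions linked above.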

just do one thing everyday · Answer 2 · Thu Aug 10 2023 15:29:27 GMT+0800 (China Standard Time)

I am now using fbgemm-gpu==0.4.1 and have added sys.path.insert(0,"xxxxx/lib/python3.9/site-packages/fbgemm_gpu") to my code. I then get another, similar error: "xxxxxxx/lib/python3.9/site-packages/fbgemm_gpu/fbgemm_gpu_py.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE"
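Both missing symbols are mangled C++ names, and decoding them shows where they come from. The sketch below is a deliberately minimal decoder for the leading `_ZN<len><name>...E` part of an Itanium-ABI mangled name. It is not a full demangler (a tool such as `c++filt` handles argument types too), and the function name `demangle_nested_name` is an illustrative assumption.

```python
def demangle_nested_name(symbol: str) -> str:
    """Decode the leading nested name of a mangled symbol.

    Handles only the simple form _ZN<len><id><len><id>...E and ignores
    anything after the first closing 'E' (for example argument types).
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("not a simple nested name")
    i = len(prefix)
    parts = []
    while i < len(symbol) and symbol[i] != "E":
        # Read the decimal length that precedes each identifier.
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("unsupported mangled form")
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    return "::".join(parts)


if __name__ == "__main__":
    print(demangle_nested_name("_ZN3c104impl8GPUTrace13gpuTraceStateE"))
```

The first symbol decodes to `c10::impl::GPUTrace::gpuTraceState` and the second begins with `at::_ops::zeros::call`. Both `c10` and `at` are PyTorch core namespaces, so the symbols are expected to come from the installed torch libraries rather than from fbgemm-gpu itself. That is consistent with the fbgemm-gpu binary having been built against a different PyTorch version than the one installed, although this thread does not confirm which version.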

Supadchaya · Answer 3 · Tue Aug 15 2023 05:13:28 GMT+0800 (China Standard Time)

Hi @666easyfuture, after looking into it: we no longer actively support Pascal GPUs; support is maintained from V100 onwards. Can you try building the binary from source? Please see https://github.com/pytorch/FBGEMM/blob/main/fbgemm_gpu/docs/BuildInstructions.md for instructions, and set the compute capability (cuda_arch_list) to match your GPU. Thank you.
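To find the compute capability value to pass to the build, the local GPU can be queried through PyTorch. The following is a minimal sketch, assuming torch itself imports correctly; the helper names `arch_list_entry` and `detect_arch_list` are illustrative, and the exact build flag that consumes the value should be taken from the build instructions linked above.

```python
def arch_list_entry(capability):
    """Format a (major, minor) compute capability tuple, e.g. (6, 1) -> "6.1"."""
    major, minor = capability
    return f"{major}.{minor}"


def detect_arch_list():
    """Return a ';'-separated list of local GPU compute capabilities.

    Returns None when torch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    capabilities = {
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    }
    return ";".join(arch_list_entry(cap) for cap in sorted(capabilities))


if __name__ == "__main__":
    print(detect_arch_list() or "no CUDA device detected")
```

A Tesla P4 is a Pascal card with compute capability 6.1, so on the machine described in the question this would be expected to report 6.1, whereas V100 corresponds to 7.0.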

Please feel free to reopen the case if there are issues.