pytorch / glow

Compiler for Neural Network hardware accelerators

[torch_glow] undefined symbol when importing torch_glow

balaram-cadence opened this issue

I built pytorch from source and also followed the directions in https://github.com/pytorch/glow/blob/master/docs/pytorch.md to make sure torch_glow can be built. When I run the tests, however, I get an undefined symbol error when importing torch_glow.

ImportError while loading conftest 'torch_glow/tests/conftest.py'.
tests/conftest.py:4: in <module>
    import torch_glow
torch_glow/__init__.py:1: in <module>
    from ._torch_glow import *
E   ImportError: torch_glow/torch_glow/_torch_glow.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE

I tried pip-installed pytorch, conda-installed pytorch, and pytorch built from source. I also modified torch_glow/src/CMakeLists.txt to link libtorch_python for PyTorchModelLoader, yet I still see this failure. Any ideas how to fix this error?

Update
The symbol is defined in libtorch_cpu.so, but the linker seems to be resolving it against libtorch_python.so, where it is undefined:

llvm-nm -C -o my_path/miniconda/lib/python3.8/site-packages/torch/lib/*.so | grep "c10::Type::isSubtypeOfExt"
my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: 0000000000c1ad30 T c10::Type::isSubtypeOfExt(std::shared_ptr<c10::Type> const&, std::ostream*) const
my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_python.so:                  U c10::Type::isSubtypeOfExt(std::shared_ptr<c10::Type> const&, std::ostream*) const
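
For anyone reproducing this check, the lookup can also be double-checked against the exact mangled name from the ImportError rather than the demangled text. A minimal Python sketch (the library path below is a placeholder for your own install):

import ctypes

# Placeholder path: substitute your site-packages torch/lib location.
LIB = "my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so"
# The exact mangled name from the ImportError above.
SYM = "_ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE"

lib = ctypes.CDLL(LIB)
try:
    getattr(lib, SYM)  # attribute access on a CDLL does a dlsym lookup
    print("exported:", SYM)
except AttributeError:
    print("not exported:", SYM)

Checking the raw mangled string is stricter than grepping demangled names, because two differently mangled symbols can demangle to near-identical text (which turned out to be the issue here, see below).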

Environment:

Name         Version             Build                  Channel
pytorch      1.10.0.dev20210611  py3.8_cpu_0 [cpuonly]  pytorch-nightly
torchaudio   0.10.0.dev20210611  py38                   pytorch-nightly
torchvision  0.11.0.dev20210611  py38_cpu [cpuonly]     pytorch-nightly
glow         24636b7
clang        10.0.1              7512b932fdef0ab951620d6807b47417f6ac7cd2

CC: @jackm321

The problem was due to mixing -stdlib=libc++ (the LLVM C++ runtime) and -stdlib=libstdc++ (the GNU C++ runtime). libc++ places its standard-library symbols in the std::__1 inline namespace, so the __1 decoration appears in its mangled names, while libstdc++ mangles the same declarations without it. As a result, symbols that demangle to what looks like the same name are different symbols to the linker. Building pytorch with clang fixed this issue for me.
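
To see the decoration concretely, you can demangle the failing symbol. A small sketch that shells out to c++filt (assumes a binutils or LLVM c++filt on PATH):

import subprocess

# The exact mangled name from the ImportError at the top of this issue.
mangled = "_ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE"
out = subprocess.run(["c++filt", mangled], capture_output=True, text=True, check=True)
print(out.stdout.strip())
# c10::Type::isSubtypeOfExt(std::__1::shared_ptr<c10::Type> const&,
#                           std::__1::basic_ostream<char, std::__1::char_traits<char> >*) const

The std::__1 inline namespace marks this as a libc++ mangling; the libstdc++ build of the same function mangles without it, which is why the llvm-nm output above shows the symbol as defined even though the dynamic linker cannot resolve the libc++ variant.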

Our compiler is currently branched off of SHA 39c3f83 of glow, and we use the Clang 10 toolchain with the LLVM C++ runtime (-stdlib=libc++) to build glow. We hit compatibility issues [1] if we build pytorch from the tip of trunk, and we cannot use the prebuilt pytorch binaries because of the problem above.

--[1]--
>       return torch._C._jit_to_backend("glow", model, method_compile_spec)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript (most recent call last):
E         File "<string>", line 10, in __setstate__
E                           self.__create_backend_debug_info()
E                       if self.__backend.is_available() :
E                           self.__handles = self.__backend.compile(self.__processed_module, self.__method_compile_spec)
E                                            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                       else:
E                           raise Exception("Backend is not available.")
E       RuntimeError: required keyword attribute 'name' has the wrong type
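
For context, error [1] is raised while lowering a scripted module to the glow backend. A minimal sketch of the call shape, mirroring the line at the top of the traceback (the empty spec is a placeholder, not a working glow compile spec; the real tests construct it with torch_glow's spec helpers):

import torch
import torch_glow  # importing registers the "glow" backend

class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(AddOne())
# Placeholder: a dict mapping method names to backend-specific options.
method_compile_spec = {"forward": {}}
lowered = torch._C._jit_to_backend("glow", scripted, method_compile_spec)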

What version of pytorch should we use to build torch_glow?
cc: @zrphercule

Thanks for bringing this issue here. Just to make sure I understand correctly: you are saying that the current pytorch master conflicts with glow/torch_glow 39c3f83, which causes the error shown above?

Thanks for taking a look. Yes that is correct.

Have you tried updating your torch_glow version to master to see if the problem still exists?

On torch_glow master I see a different problem when importing torch_glow:

>>> import torch
>>> import torch_glow
Traceback (most recent call last):
  File "", line 1, in 
  File "~/w/tug/torch_glow/torch_glow/__init__.py", line 1, in 
    from ._torch_glow import *
ImportError: ~/w/tug/torch_glow/torch_glow/_torch_glow.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN4glow5flags43DAGOptimizerParallelizationTaggingAlgorithmE
>>>

This is weird, since the symbol it claims is missing is defined here:
https://github.com/pytorch/glow/blob/master/include/glow/Flags/Flags.h#L86
and it has been there for a few months already without causing problems.
When you updated torch_glow, did you also update glow?

Sorry for the late reply; I was on vacation and am looking at this now. The problem above was a setup issue on my end. After fixing it on my machine, I still see error [1] on master for the test tests/functionality/to_glow_write_to_onnx_test.py.

--[1]--
>       return torch._C._jit_to_backend("glow", model, method_compile_spec)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript (most recent call last):
E         File "<string>", line 10, in __setstate__
E                           self.__create_backend_debug_info()
E                       if self.__backend.is_available() :
E                           self.__handles = self.__backend.compile(self.__processed_module, self.__method_compile_spec)
E                                            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                       else:
E                           raise Exception("Backend is not available.")
E       RuntimeError: required keyword attribute 'name' has the wrong type

Hmmm, so after your rebase on master, the issue came back to the original one?

Yes, it came back to the original one. Can't we use PyTorchModelLoader directly, like OnnxModelLoader/Caffe2ModelLoader, instead of depending on pytorch?

Unfortunately we can't, as PyTorchModelLoader depends heavily on pytorch.

I switched from Clang/libc++ to GCC/libstdc++, built glow and pytorch with the same toolchain, and no longer see the original error. I don't know why the Clang/libc++ toolchain doesn't work. Any thoughts?

That's good to know! Maybe it is because of the version of Clang you are using? I remember torch_glow has a specific llvm version requirement in our CI build: https://github.com/pytorch/glow/blob/master/.circleci/build.sh#L65