pytorch / glow

Compiler for Neural Network hardware accelerators

[torch_glow] undefined symbol when importing torch_glow

balaram-cadence opened this issue

I built pytorch from source and also followed the directions in https://github.com/pytorch/glow/blob/master/docs/pytorch.md to make sure torch_glow can be built. When I run the tests, however, I get an undefined symbol error when importing torch_glow.

ImportError while loading conftest 'torch_glow/tests/conftest.py'.
tests/conftest.py:4: in <module>
    import torch_glow
torch_glow/__init__.py:1: in <module>
    from ._torch_glow import *
E   ImportError: torch_glow/torch_glow/_torch_glow.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE

I tried pip-installed pytorch, conda-installed pytorch, and pytorch built from source. I also modified torch_glow/src/CMakeLists.txt to link libtorch_python for PyTorchModelLoader, yet I still see this failure. Any ideas how to fix this error?

Update
The symbol is defined in libtorch_cpu.so, but the linker seems to be resolving it against libtorch_python.so, where it is undefined:

llvm-nm -C -o my_path/miniconda/lib/python3.8/site-packages/torch/lib/*.so | grep "c10::Type::isSubtypeOfExt"
my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: 0000000000c1ad30 T c10::Type::isSubtypeOfExt(std::shared_ptr<c10::Type> const&, std::ostream*) const
my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_python.so:                  U c10::Type::isSubtypeOfExt(std::shared_ptr<c10::Type> const&, std::ostream*) const
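
For anyone reproducing this check, the lookup can also be double-checked against the exact mangled name from the ImportError rather than the demangled text. A minimal Python sketch (the library path below is a placeholder for your own install):

import ctypes

# Placeholder path: substitute your site-packages torch/lib location.
LIB = "my_path/miniconda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so"
# The exact mangled name from the ImportError above.
SYM = "_ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE"

lib = ctypes.CDLL(LIB)
try:
    getattr(lib, SYM)  # attribute access on a CDLL does a dlsym lookup
    print("exported:", SYM)
except AttributeError:
    print("not exported:", SYM)

Checking the raw mangled string is stricter than grepping demangled names, because two differently mangled symbols can demangle to near-identical text (which turned out to be the issue here, see below).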

Environment:

Name         Version             Build                  Channel
pytorch      1.10.0.dev20210611  py3.8_cpu_0 [cpuonly]  pytorch-nightly
torchaudio   0.10.0.dev20210611  py38                   pytorch-nightly
torchvision  0.11.0.dev20210611  py38_cpu [cpuonly]     pytorch-nightly
glow         24636b7
clang        10.0.1              7512b932fdef0ab951620d6807b47417f6ac7cd2

CC: @jackm321

The problem was due to mixing -stdlib=libc++ (the LLVM C++ runtime) and -stdlib=libstdc++ (the GNU C++ runtime). libc++ places its standard-library symbols in the std::__1 inline namespace, so the __1 decoration appears in its mangled names, while libstdc++ mangles the same declarations without it. As a result, symbols that demangle to what looks like the same name are different symbols to the linker. Building pytorch with clang fixed this issue for me.
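
To see the decoration concretely, you can demangle the failing symbol. A small sketch that shells out to c++filt (assumes a binutils or LLVM c++filt on PATH):

import subprocess

# The exact mangled name from the ImportError at the top of this issue.
mangled = "_ZNK3c104Type14isSubtypeOfExtERKNSt3__110shared_ptrIS0_EEPNS1_13basic_ostreamIcNS1_11char_traitsIcEEEE"
out = subprocess.run(["c++filt", mangled], capture_output=True, text=True, check=True)
print(out.stdout.strip())
# c10::Type::isSubtypeOfExt(std::__1::shared_ptr<c10::Type> const&,
#                           std::__1::basic_ostream<char, std::__1::char_traits<char> >*) const

The std::__1 inline namespace marks this as a libc++ mangling; the libstdc++ build of the same function mangles without it, which is why the llvm-nm output above shows the symbol as defined even though the dynamic linker cannot resolve the libc++ variant.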

Our compiler is currently branched off of SHA 39c3f83 of glow, and we use the Clang 10 toolchain with the LLVM C++ runtime (-stdlib=libc++) to build glow. We hit compatibility issues [1] if we build pytorch from the tip of trunk, and we cannot use the prebuilt pytorch binaries because of the problem above.

--[1]--
>       return torch._C._jit_to_backend("glow", model, method_compile_spec)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript (most recent call last):
E         File "<string>", line 10, in __setstate__
E                           self.__create_backend_debug_info()
E                       if self.__backend.is_available() :
E                           self.__handles = self.__backend.compile(self.__processed_module, self.__method_compile_spec)
E                                            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                       else:
E                           raise Exception("Backend is not available.")
E       RuntimeError: required keyword attribute 'name' has the wrong type
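
For context, error [1] is raised while lowering a scripted module to the glow backend. A minimal sketch of the call shape, mirroring the line at the top of the traceback (the empty spec is a placeholder, not a working glow compile spec; the real tests construct it with torch_glow's spec helpers):

import torch
import torch_glow  # importing registers the "glow" backend

class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(AddOne())
# Placeholder: a dict mapping method names to backend-specific options.
method_compile_spec = {"forward": {}}
lowered = torch._C._jit_to_backend("glow", scripted, method_compile_spec)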

What version of pytorch should we use to build torch_glow?
cc: @zrphercule

Thanks for bringing this issue here. Just to make sure I understand correctly: you are saying that the current pytorch master conflicts with glow/torch_glow 39c3f83, which causes the error shown above?

Thanks for taking a look. Yes that is correct.

Have you tried updating your torch_glow version to master to see if the problem still exists?

On torch_glow master I see a different problem when importing torch_glow:

>>> import torch
>>> import torch_glow
Traceback (most recent call last):
  File "", line 1, in 
  File "~/w/tug/torch_glow/torch_glow/__init__.py", line 1, in 
    from ._torch_glow import *
ImportError: ~/w/tug/torch_glow/torch_glow/_torch_glow.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN4glow5flags43DAGOptimizerParallelizationTaggingAlgorithmE
>>>

This is weird, since the symbol it claims is missing is defined here:
https://github.com/pytorch/glow/blob/master/include/glow/Flags/Flags.h#L86
and it has been there for a few months already without causing problems.
When you updated torch_glow, did you also update glow?

Sorry for the late reply; I was on vacation and am looking at this now. The problem above was a setup issue on my end. After fixing it on my machine, I still see error [1] on master for the test tests/functionality/to_glow_write_to_onnx_test.py.

--[1]--
>       return torch._C._jit_to_backend("glow", model, method_compile_spec)
E       RuntimeError: The following operation failed in the TorchScript interpreter.
E       Traceback of TorchScript (most recent call last):
E         File "<string>", line 10, in __setstate__
E                           self.__create_backend_debug_info()
E                       if self.__backend.is_available() :
E                           self.__handles = self.__backend.compile(self.__processed_module, self.__method_compile_spec)
E                                            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
E                       else:
E                           raise Exception("Backend is not available.")
E       RuntimeError: required keyword attribute 'name' has the wrong type

Hmmm, so after your rebase on master, the issue came back to the original one?

Yes, it came back to the original one. Can't we use PyTorchModelLoader directly, like OnnxModelLoader/Caffe2ModelLoader, instead of depending on pytorch?

Unfortunately we can't, as PyTorchModelLoader depends heavily on pytorch.

I switched from Clang/libc++ to GCC/libstdc++, built glow and pytorch with the same toolchain, and no longer see the original error. I don't know why the Clang/libc++ toolchain doesn't work. Any thoughts?

That's good to know! Maybe it is because of the version of Clang you are using? I remember torch_glow has a specific llvm version requirement in our CI build: https://github.com/pytorch/glow/blob/master/.circleci/build.sh#L65