google-deepmind / deepmind-research

This repository contains implementations and illustrative code to accompany DeepMind publications

BYOL setup error: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv

kuihao opened this issue

Help! I followed the setup steps in the byol README exactly, but as soon as I run

  python -m byol.main_loop \
    --experiment_mode='pretrain' \
    --worker_mode='train' \
    --checkpoint_root='/tmp/byol_checkpoints' \
    --pretrain_epochs=40

or

  python -m byol.main_loop_test

I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/byol/main_loop_test.py", line 21, in <module>
    from byol import byol_experiment
  File "/content/byol/byol_experiment.py", line 25, in <module>
    from acme.jax import utils as acme_utils
  File "/usr/local/lib/python3.7/dist-packages/acme/__init__.py", line 35, in <module>
    from acme.environment_loop import EnvironmentLoop
  File "/usr/local/lib/python3.7/dist-packages/acme/environment_loop.py", line 26, in <module>
    from acme.utils import signals
  File "/usr/local/lib/python3.7/dist-packages/acme/utils/signals.py", line 22, in <module>
    import launchpad
  File "/usr/local/lib/python3.7/dist-packages/launchpad/__init__.py", line 36, in <module>
    from launchpad.nodes.courier.node import CourierHandle
  File "/usr/local/lib/python3.7/dist-packages/launchpad/nodes/courier/node.py", line 21, in <module>
    import courier
  File "/usr/local/lib/python3.7/dist-packages/courier/__init__.py", line 26, in <module>
    from courier.python.client import Client  # pytype: disable=import-error
  File "/usr/local/lib/python3.7/dist-packages/courier/python/client.py", line 30, in <module>
    from courier.python import py_client
ImportError: /usr/local/lib/python3.7/dist-packages/courier/python/libserialization_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv

I have tried many different Python versions and operating systems:

  • Python 3.9, 3.8, 3.7, 3.6
  • Ubuntu 20.04 LTS, Ubuntu 18.04.6 LTS (Google Colab), Windows 10, Jetson Nano image jp461 (NVIDIA embedded Linux with CUDA)

The installation completes without any errors, but running that step (python -m byol.main_loop --...) always fails with the error (...undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv).

I searched online and the error seems to be related to protobuf. I tried upgrading protobuf and installing other versions, but because of package dependency constraints, the only versions I could install were protobuf==3.19.6 (the default) and protobuf==3.19.5.

I also tried compiling jax, jaxlib, and protobuf myself (from the source code in their GitHub repositories), but it still didn't work.

I have also tried the following GPU hardware:

  • Nvidia RTX 3090
  • Nvidia GTX 1080Ti
  • Nvidia GTX 1070
  • Nvidia Jetson Nano 4GB

None of them work. Can anyone share a software and hardware environment that is known to work?

Were you able to find a solution to this?
I am facing the exact same issue.

Is there any update on this?

I was facing a similar issue, with Launchpad.

...
  File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/launchpad/nodes/courier/node.py", line 21, in <module>
    import courier
  File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/__init__.py", line 26, in <module>
    from courier.python.client import Client  # pytype: disable=import-error
  File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/client.py", line 30, in <module>
    from courier.python import py_client
ImportError: /home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/libserialization_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv

As you can see, there is the same missing symbol: _ZNK6google8protobuf7Message11GetTypeNameEv.
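
For readers unfamiliar with mangled C++ names, the symbol can be decoded by hand. The sketch below is a deliberately minimal decoder, not a general demangler: it assumes the Itanium C++ ABI scheme used by GCC and Clang, and it handles only a nested name made of length-prefixed identifiers, with an optional const qualifier and no parameters. The function name demangle_simple_nested is made up for illustration; a real tool such as c++filt covers the full grammar.

```python
def demangle_simple_nested(symbol: str) -> str:
    """Decode a simple Itanium-ABI nested name such as _ZNK3foo3barEv.

    Supported pieces only: the _Z prefix, N...E nesting, an optional K
    (const member function), length-prefixed identifiers, and a trailing
    'v' meaning "no parameters". Anything else raises ValueError.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3

    # 'K' right after 'N' marks a const-qualified member function.
    is_const = symbol[pos:pos + 1] == "K"
    if is_const:
        pos += 1

    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported construct at offset {pos}")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length

    # 'E' closes the nested name; 'v' is an empty parameter list.
    if symbol[pos:] != "Ev":
        raise ValueError("only parameterless functions are supported")

    return "::".join(parts) + "()" + (" const" if is_const else "")


if __name__ == "__main__":
    print(demangle_simple_nested("_ZNK6google8protobuf7Message11GetTypeNameEv"))
    # prints: google::protobuf::Message::GetTypeName() const
```

So the loader is looking for the const member function google::protobuf::Message::GetTypeName(). That is consistent with a protobuf-related mismatch between how the shared object was built and what is available at import time, though nothing in this thread confirms the root cause.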

I ran ldd, the Unix shared-library dependency checker, on the shared object file named in the error (ldd /home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/libserialization_cc_proto.so), as suggested in NVIDIA-AI-IOT/torch2trt#53, which yields the following:

...
        libtensorflow_framework.so.2 => not found
...
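
When the ldd output is long, the unresolved entries can be picked out programmatically. The following is a minimal sketch, assuming the usual "name => not found" line format that ldd prints for unresolved dependencies. The helper names missing_libraries and run_ldd are made up for illustration, and every line of SAMPLE except the libtensorflow_framework.so.2 one is invented placeholder output, not taken from this thread.

```python
import subprocess


def missing_libraries(ldd_output: str) -> list:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def run_ldd(path: str) -> str:
    """Run ldd on a shared object and return its stdout (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return result.stdout


# Placeholder output: only the 'not found' line comes from the thread.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffd00000000)\n"
    "\tlibtensorflow_framework.so.2 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)

if __name__ == "__main__":
    print(missing_libraries(SAMPLE))
    # prints: ['libtensorflow_framework.so.2']
```

On a real system, missing_libraries(run_ldd(path_to_so)) would apply the same filter to the actual file.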

I first checked that I had tensorflow installed:

(protobuf_issue_test) ➜  callum-tilbury ✗ pip show tensorflow
Name: tensorflow
Version: 2.11.0
Summary: TensorFlow is an open source machine learning framework for everyone.
...

and did similar things for associated TF packages, tensorflow-datasets, etc. But they were all there already.
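
The same check can be done for several packages at once from Python. This is a minimal sketch using the standard-library importlib.metadata module (Python 3.8+). The package list is an assumption about which distributions matter here (dm-launchpad and dm-acme are assumed to be the PyPI names for Launchpad and Acme), and installed_versions is a helper name made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant distributions; adjust to your environment.
    names = ["tensorflow", "tensorflow-datasets", "protobuf",
             "dm-launchpad", "dm-acme", "jax", "jaxlib"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Recording this output before and after a downgrade makes it easier to see which version change actually made the import succeed.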

I then progressively downgraded tensorflow, eventually reaching v2.8 (pip install tensorflow~=2.8.0), which fixed the issue. Interestingly, ldd still shows libtensorflow_framework.so.2 as not found.
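
One possible explanation for ldd still reporting the library as not found, offered here as an assumption rather than something confirmed in this thread, is that the file lives inside the installed tensorflow package directory instead of on the system loader path, so it is only resolved once tensorflow has been imported into the process. The sketch below looks for it without importing tensorflow; find_framework_libs and tensorflow_package_dir are helper names made up for illustration.

```python
import importlib.util
from pathlib import Path


def find_framework_libs(package_dir):
    """List libtensorflow_framework shared objects directly inside a directory."""
    return sorted(p.name for p in Path(package_dir).glob("libtensorflow_framework.so*"))


def tensorflow_package_dir():
    """Locate the tensorflow package directory without importing it, or None."""
    spec = importlib.util.find_spec("tensorflow")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    tf_dir = tensorflow_package_dir()
    if tf_dir is None:
        print("tensorflow is not installed in this environment")
    else:
        print(tf_dir, find_framework_libs(tf_dir))
```

If the library shows up there, the ldd result on its own says little about whether the import will succeed.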

This approach isn't exactly a solution, and feels more like a hack. But perhaps it's a step in understanding why things are breaking. Hopefully it helps you :)