PyTorch "Undefined symbol" error when importing SAM ONNX models to cluster

Question

marias65 opened this issue 4 months ago · comments

I am trying to follow the Segment Anything notebook sentinel2_segmentation.ipynb, but when I import SAM's ONNX models to the cluster with ! python ../../scripts/export_sam_models.py --models vit_b, I get the following error: "ImportError: /home/msbksan/micromamba/envs/segment_anything_cpu/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent"

Rafael Soares Padilha · Answer 1 · Mon Apr 15 2024 21:37:40 GMT+0800 (China Standard Time)

Hi, @marias65. I couldn't reproduce your error on my machine, but I found a few similar issues (here and here) suggesting that the cause might be installing PyTorch via conda, and that a possible solution is to point to the CPU wheel during installation.
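One way to see how the PyTorch packages were installed, without importing torch (which is what fails here), is to read the packaging metadata. This is only a minimal sketch: it assumes the installer recorded its name in the distribution's INSTALLER file, which pip does and conda packages may or may not do. The helper name installer_of is made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Optional


def installer_of(dist_name: str) -> Optional[str]:
    """Return the tool recorded as installer of a distribution, if any.

    Returns None when the distribution is not found or when no INSTALLER
    file was recorded (an assumption: not every package ships one).
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    # This only reads metadata, so it does not trigger the import error.
    for name in ("torch", "torchvision", "torchaudio"):
        print(name, "->", installer_of(name))
```

A result of None does not say much on its own, so the output is best read together with micromamba list or pip list.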

Quick question: are you able to import PyTorch in the segment_anything_cpu environment?

$ python -c "import torch; print(torch.__version__)"
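The same check can be written as a small Python script that captures the failure message instead of stopping at the traceback. This is a sketch, not part of the repository: the helper name try_import is hypothetical, and it relies on the failure surfacing as an ImportError, as it does in the report above.

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Try to import a module; return (success, version or error text)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Both a missing package and an "undefined symbol" failure while
        # loading a shared library are reported as ImportError.
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    ok, detail = try_import("torch")
    print("torch:", "OK" if ok else "FAILED", "-", detail)
```

Running it inside the activated environment shows either the installed version or the full error text, which is convenient to paste into the issue.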

I was able to set up a new environment with the latest version of PyTorch and run the script to export the model to ONNX files. Could I ask you to try on your end as well?

Please change env_cpu.yaml, commenting out the PyTorch and pip lines as below:

name: new_segment_anything_cpu
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python==3.8.*
  - geopandas~=0.11.1
  - ipython~=8.5.0
  - ipywidgets~=8.0.2
  - jupyter~=1.0.0
  - matplotlib~=3.6.0
  - numpy~=1.23.3
  # - pytorch=2.0.0=py3.8_cpu_0
  # - torchvision=0.15.0=py38_cpu
  # - torchaudio=2.0.0=py38_cpu
  - pip~=22.2.0
  - pandas~=1.5.0
  - rasterio~=1.3.2
  - shapely~=1.8.4
  - tqdm~=4.64.1
  - scikit-image~=0.20.0
  # - pip:
  #     - git+https://github.com/facebookresearch/segment-anything.git
  #     - ../../src/vibe_core
  #     - cartopy~=0.21.0
  #     - xarray~=2022.10.0
  #     - ipympl~=0.9.3
  #     - onnx~=1.14.0
  #     - onnxruntime~=1.15.0

Once the env is created, please activate it and install the pip packages:

$ micromamba activate new_segment_anything_cpu
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
$ pip install git+https://github.com/facebookresearch/segment-anything.git 
$ pip install ../../src/vibe_core 
$ pip install cartopy~=0.21.0 xarray~=2022.10.0 ipympl~=0.9.3 onnx~=1.14.0 onnxruntime~=1.15.0

Make sure the path to src/vibe_core is correct in the $ pip install ../../src/vibe_core command.

Please let me know if you are able to run the export script in this new environment.

Maria Sofia · Answer 2 · Wed Apr 17 2024 03:34:53 GMT+0800 (China Standard Time)

Thank you for your response! I was not able to run $ python -c "import torch; print(torch.__version__)" as it gave me the same iJIT_NotifyEvent error while in the segment_anything_cpu environment.

I was able to create the new_segment_anything_cpu environment and install all the pip packages you listed, but when I ran $ python -c "import torch; print(torch.__version__)" or ! python ../../scripts/export_sam_models.py --models vit_b, I still got the same iJIT_NotifyEvent error.

Rafael Soares Padilha · Answer 3 · Fri Apr 19 2024 22:25:28 GMT+0800 (China Standard Time)

Hi, @marias65. I was able to replicate your issue.

Installing PyTorch 2.1.0 with the appropriate wheel within the segment anything environment solved the problem for me.

In summary, what I did was:

1. Create the segment_anything_cpu environment with the yaml that is currently available in the repo.
2. Run pip install torch~=2.1.0 --index-url https://download.pytorch.org/whl/cpu

After that, I was able to import torch:

$ python -c "import torch; print(torch.__version__)"
2.1.2+cpu

Could you please let me know if this works for you?

I will fix the environment yaml files in the next release.

Rafael Soares Padilha · Answer 4 · Fri Apr 19 2024 23:02:32 GMT+0800 (China Standard Time)

Another possibility that worked for me (and won't change the pytorch version) was creating the environment with the following yaml:

name: segment_anything_cpu
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python==3.8.*
  - geopandas~=0.11.1
  - ipython~=8.5.0
  - ipywidgets~=8.0.2
  - jupyter~=1.0.0
  - matplotlib~=3.6.0
  - numpy~=1.23.3
  - pip~=22.2.0
  - pandas~=1.5.0
  - rasterio~=1.3.2
  - shapely~=1.8.4
  - tqdm~=4.64.1
  - scikit-image~=0.20.0
  - pip:
      - --extra-index-url https://download.pytorch.org/whl/cpu
      - torch~=2.0.0
      - torchvision~=0.15.0
      - torchaudio~=2.0.0
      - git+https://github.com/facebookresearch/segment-anything.git
      - ../../src/vibe_core
      - cartopy~=0.21.0
      - xarray~=2022.10.0
      - ipympl~=0.9.3
      - onnx~=1.14.0
      - onnxruntime~=1.15.0

by running:

$ micromamba env create -f notebooks/segment_anything/env_cpu.yaml

With the environment activated:

$ python -c "import torch; print(torch.__version__)"
2.0.1+cpu

Maria Sofia · Answer 5 · Sat Apr 20 2024 04:43:24 GMT+0800 (China Standard Time)

Thank you! I rebuilt farmvibes-ai and followed your latest solution, and that seems to have helped!

Right now, I receive this message, but looking into it further suggests that it is due to limited memory on the machine I am currently using. Otherwise, I would say that it worked. Thank you!

Rafael Soares Padilha · Answer 6 · Tue Apr 23 2024 20:09:07 GMT+0800 (China Standard Time)

I'm glad that error is fixed.

For this new one, the script doesn't require that much memory, especially with the vit_b model. What are your specs (memory and disk space)?
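If it helps, the numbers can be collected with a short script like the one below. It is a rough sketch: machine_specs is a hypothetical helper, the RAM figure uses Linux sysconf names and is left as None where they are unavailable, and it reports total physical memory rather than the memory free at the moment the script runs.

```python
import os
import shutil
from typing import Dict, Optional

GIB = 1024 ** 3  # bytes per gibibyte


def machine_specs(path: str = ".") -> Dict[str, Optional[float]]:
    """Report total RAM and disk space (in GiB) for the disk holding path."""
    usage = shutil.disk_usage(path)
    try:
        # SC_PAGE_SIZE and SC_PHYS_PAGES exist on Linux; other platforms
        # may raise, in which case the RAM figure is left as None.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        ram_total: Optional[float] = round(ram_bytes / GIB, 2)
    except (AttributeError, ValueError, OSError):
        ram_total = None
    return {
        "ram_total_gib": ram_total,
        "disk_total_gib": round(usage.total / GIB, 2),
        "disk_free_gib": round(usage.free / GIB, 2),
    }


if __name__ == "__main__":
    for key, value in machine_specs().items():
        print(f"{key}: {value}")
```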

The script also logs a few messages (e.g., when it is able to load the encoder/decoder model and when it starts converting them), but these didn't show up, which I find odd.

Are you able to import onnxruntime and onnx?

import onnx
import onnxruntime
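If both imports work, it may also be worth checking that the installed versions match the pins in the environment yaml (onnx~=1.14.0 and onnxruntime~=1.15.0). The sketch below does this from the package metadata. The helper names are hypothetical, and the prefix comparison is a simplification of the ~= rule: it accepts any 1.14.x or 1.15.x release and ignores pre-release and local version details.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Optional, Tuple

# Version prefixes taken from the pins in the environment yaml.
PINS = {"onnx": "1.14.", "onnxruntime": "1.15."}


def satisfies_prefix(version: str, prefix: str) -> bool:
    """Simplified stand-in for a '~=X.Y.0' pin: same major.minor series."""
    return version.startswith(prefix)


def check_pins(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each distribution to (installed version or None, pin satisfied)."""
    report = {}
    for name, prefix in pins.items():
        try:
            version: Optional[str] = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = None
        ok = version is not None and satisfies_prefix(version, prefix)
        report[name] = (version, ok)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_pins(PINS).items():
        print(f"{name}: installed={version} matches_pin={ok}")
```

A matching version only shows that the package is installed as pinned; the two import statements above remain the actual test that it loads.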

Rafael Soares Padilha · Answer 7 · Sat May 18 2024 00:34:15 GMT+0800 (China Standard Time)

Closing this issue for now. @marias65, let me know if you are still facing this error.