Segmentation fault when running the model
LanHikari22 opened this issue
Hi,
I have two issues; I'll walk through my process. Initially, I tried to install the project requirements using
~/miniconda3/bin/python3 -m pip install git+https://github.com/suno-ai/bark.git
and installed the required cuDNN:
conda install cudnn=8.9.2
but got the following error when attempting to import BarkModel from transformers (source code is attached at the end of the post):
(base) ~ ➜ LD_LIBRARY_PATH=~/miniconda3/lib ~/miniconda3/bin/python3 ~/src/exp/bark_gpu.py
[+] Importing Transformers
2023-12-03 14:02:30.308417: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-03 14:02:30.414817: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-03 14:02:31.859108: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/lan/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1353, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/lan/miniconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/lan/miniconda3/lib/python3.10/site-packages/transformers/models/bark/modeling_bark.py", line 55, in <module>
from flash_attn import flash_attn_func, flash_attn_varlen_func
File "/home/lan/.local/lib/python3.10/site-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/home/lan/.local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 8, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /home/lan/.local/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lan/src/exp/bark_gpu.py", line 55, in <module>
from transformers import AutoProcessor, BarkModel
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "/home/lan/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1344, in __getattr__
value = getattr(module, name)
File "/home/lan/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1343, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/lan/miniconda3/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1355, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.bark.modeling_bark because of the following error (look up to see its traceback):
/home/lan/.local/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE
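One thing I notice in the traceback: transformers loads from ~/miniconda3/... while flash_attn loads from ~/.local/..., and mixing two site-packages trees like that can pair incompatible wheels. A quick stdlib-only check (just a sketch of what I ran to confirm) that lists every location this interpreter imports from:

```python
# List the interpreter and all site-packages directories it will
# import from, to spot a mixed conda/user-site installation.
import site
import sys

print(sys.executable)
for path in site.getsitepackages() + [site.getusersitepackages()]:
    print(path)
```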
This project installs PyTorch 1.13.1, but the undefined symbol demangles to c10::cuda::CUDACachingAllocator::allocator, which suggests the prebuilt flash-attn wheel was compiled against a newer PyTorch ABI. I was able to bypass this issue by installing 2.1.1:
~/miniconda3/bin/python3 -m pip install torch==2.1.1
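Since flash-attn CUDA extensions are built against a specific torch ABI, the two versions have to match. Here is the small stdlib check I used to confirm what was actually installed (a sketch; I'm assuming the PyPI distribution names "torch" and "flash-attn"):

```python
# Report the installed versions of torch and flash-attn so an ABI
# mismatch between them is easy to spot.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "flash-attn"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```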
Now I am able to run the model and generate audio on CPU. However, trying to use my GPU with CUDA results in a segmentation fault when running the model:
(base) ~ ➜ LD_LIBRARY_PATH=~/miniconda3/lib ~/miniconda3/bin/python3 ~/src/exp/bark_gpu.py
[+] Importing Transformers
2023-12-03 14:16:58.823061: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-03 14:16:58.876651: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-03 14:16:59.624913: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Took 00:00:02.698
[+] Loading Processor
Took 00:00:00.413
[+] Loading Model
/home/lan/miniconda3/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Took 00:00:12.618
[+] Processing Input
Took 00:00:00.179
[+] Running Model
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.
[1] 1156277 segmentation fault (core dumped) LD_LIBRARY_PATH=~/miniconda3/lib ~/miniconda3/bin/python3
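The next thing I plan to try is enabling faulthandler before loading the model, so the crash at least prints the Python frames it happened under (standard library, no extra install; sketch below):

```python
# faulthandler dumps the Python traceback of every thread when the
# process receives a fatal signal such as SIGSEGV, which should help
# narrow down which step of model.generate() crashes.
import faulthandler

faulthandler.enable()
# ... then load the processor/model and call model.generate() as before ...
print("faulthandler enabled:", faulthandler.is_enabled())
```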
You can find the source code in the attached
bark_gpu.py.txt
Please let me know if I need to provide any more information.
Thanks,
Lan