facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"

Segmentation fault when launching demo_narrator [was: Keys remapping seems not to work]

amessina71 opened this issue · comments

Hi all, thank you for this great piece of work.
I'm trying to run it on a dual-GPU Ampere system with CUDA 11.6.
Just cloned the repo and followed the install instructions. No problems or errors during installation.
When I launch the demo_narrator.py script I get the following output: the key remapping step seems to go wrong first, and then the process dies with a segmentation fault.
Thank you in advance for your help,
Alberto

```
$ python demo_narrator.py --cuda --video-path some.mp4
/usr/local/lib/python3.9/site-packages/torchvision/transforms/_functional_video.py:5: UserWarning: The _functional_video module is deprecated. Please use the functional module instead.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/torchvision/transforms/_transforms_video.py:25: UserWarning: The _transforms_video module is deprecated. Please use the transforms module instead.
  warnings.warn(
downloading model to modelzoo/vclm_openai_timesformer_large_336px_gpt2_xl.pt_ego4d.jobid_246897.ep_0003.md5sum_443263.pth
######USING ATTENTION STYLE: frozen-in-time
100%|███████████████████████████████████████| 891M/891M [01:48<00:00, 8.58MiB/s]
=> Loading CLIP (ViT-L/14@336px) weights
_IncompatibleKeys(missing_keys=['temporal_embed',
  'blocks.0.timeattn.qkv.weight', 'blocks.0.timeattn.qkv.bias',
  'blocks.0.timeattn.proj.weight', 'blocks.0.timeattn.proj.bias',
  'blocks.0.norm3.weight', 'blocks.0.norm3.bias',
  ... (the same six timeattn/norm3 keys repeat for blocks 1 through 23) ...
  'head.weight', 'head.bias'], unexpected_keys=[])
Downloading config.json: 100%|██████████| 689/689 [00:00<00:00, 289kB/s]
Downloading pytorch_model.bin: 100%|██████████| 5.99G/5.99G [12:13<00:00, 8.77MB/s]
Segmentation fault (core dumped)
```

Hi @amessina71 ,

The warning log with `_IncompatibleKeys()` is expected: when constructing the model we first load the CLIP-pretrained weights for the spatial part of the TimeSformer (see Table 10 and Appendix F of our tech report). Since the full checkpoint is loaded afterwards, you can safely ignore it.

The segmentation fault is a separate error; my guess is that the video cannot be loaded by decord. Could you try the example video first and see whether the demo generates anything? I can add a more generic video loader and will let you know once it's done.
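To test that hypothesis in isolation, one could try decoding the example clip with decord alone, outside the demo. This is an illustrative sketch, not code from the repo: `try_decode` is a made-up helper, and guards are added so it runs even where decord or the clip is missing.

```python
import os
from importlib.util import find_spec

# Example clip shipped with the repo (path as used elsewhere in this thread)
VIDEO = "assets/3c0dffd0-e38e-4643-bc48-d513943dc20b_012_014.mp4"

def try_decode(path: str) -> str:
    """Decode the clip with decord alone; a crash here (rather than in the
    full demo) would point at video loading, not the model."""
    if find_spec("decord") is None:
        return "decord not installed; skipping"
    if not os.path.exists(path):
        return path + " not found; skipping"
    import decord
    vr = decord.VideoReader(path)   # a segfault/exception here => loader issue
    return "decoded %d frames" % len(vr)

print(try_decode(VIDEO))
```

If this snippet runs cleanly while the demo still crashes, the loader is exonerated and the fault lies elsewhere in the pipeline.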

Hi, thank you for your reply.
I've tested the file under the assets folder, assets/3c0dffd0-e38e-4643-bc48-d513943dc20b_012_014.mp4.
Same error.
Many thanks for your support,
A.

Then I don't think the issue is with video loading. Can you check your local environment by running e.g. https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py ?

Hi, thank you for following this up. Here is the output of the collect_env script. I suspect there might be some sort of incompatibility between the GPU architecture and the torch version; I will try to upgrade torch and update the thread.
Thank you for your support,
A.

```
root@59beef1d9912:/app# python3 collect_env.py
Collecting environment information...
/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
PyTorch version: 1.10.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.10 (main, May 30 2022, 01:30:39) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB

Nvidia driver version: 510.108.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.10.1
[pip3] torchvision==0.11.2
[conda] Could not collect
root@59beef1d9912:/app#
```
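For context, the warning in the log above fires because the cu102 wheel ships kernels only for sm_37 through sm_70, while the A100 reports compute capability sm_80. The rule the message describes can be sketched roughly as follows; this is not PyTorch's exact implementation, and `wheel_covers` is a made-up helper.

```python
def wheel_covers(device_sm: int, built_sms: list[int]) -> bool:
    """Rough rule mirrored from the warning text above (NOT PyTorch's exact
    logic): a prebuilt wheel can drive a GPU only if it was compiled for an
    architecture at least as new as the device's compute capability."""
    return device_sm <= max(built_sms)

# cu102 wheel from the log: sm_37/50/60/70 -> A100 (sm_80) not covered
print(wheel_covers(80, [37, 50, 60, 70]))       # False -> the warning fires
# cu113 wheels ship sm_80 kernels -> covered
print(wheel_covers(80, [37, 50, 60, 70, 80]))   # True
```

Upgrading to a CUDA 11.x build of torch, as done later in the thread, silences this particular warning, though it turned out not to be the cause of the segfault.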

Hello. I've upgraded PyTorch to 1.11.0+cu113 and no longer see that mismatch in the environment. However, I still get the segmentation fault (even with the test video). The new environment follows; below it is the output of the gdb `bt` command after the SIGSEGV occurred during a gdb session.
Best regards,
A.

```
root@59beef1d9912:/app# python3 collect_env.py
Collecting environment information...
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.10 (main, May 30 2022, 01:30:39) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB

Nvidia driver version: 510.108.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect
```

```
#0  0x00000000000682aa in ?? ()
#1  0x00007f1e96ad35ab in c10::detail::getNonDeterministicRandom(bool) () from /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so
#2  0x00007f1ee9af3483 in at::CUDAGeneratorImpl::seed() () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#3  0x00007f1ee9af3b40 in std::call_once<at::cuda::detail::getDefaultCUDAGenerator(signed char)::{lambda()#1}>(std::once_flag&, at::cuda::detail::getDefaultCUDAGenerator(signed char)::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#4  0x00007f1fcbad44df in __pthread_once_slow (once_control=0x55e39ddb69b0, init_routine=0x7f1f4f246c20 <__once_proxy>) at pthread_once.c:116
#5  0x00007f1ee9af208f in at::cuda::detail::getDefaultCUDAGenerator(signed char) () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#6  0x00007f1f4c3b2398 in THCPModule_initExtension(_object*, _object*) () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#7  0x00007f1fcbe5cd84 in cfunction_vectorcall_NOARGS (func=<built-in method _cuda_init of module object at remote 0x7f1f4d3156d0>, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:489
#8  0x00007f1fcbe9af26 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x55e3d43e8028, callable=<built-in method _cuda_init of module object at remote 0x7f1f4d3156d0>, tstate=0x55e39c5ea080) at ./Include/cpython/abstract.h:118
#9  PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x55e3d43e8028, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x55e39c5ea080) at Python/ceval.c:5077
#11 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3489
#12 0x00007f1fcbe35e13 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x55e3d43e7e80, for file /usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py, line 216, in _lazy_init (), tstate=0x55e39c5ea080) at ./Include/internal/pycore_ceval.h:40
#13 function_code_fastcall (globals=<optimized out>, nargs=0, args=<optimized out>, co=<optimized out>, tstate=0x55e39c5ea080) at Objects/call.c:330
#14 _PyFunction_Vectorcall (func=<optimized out>, stack=0x0, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/call.c:367
```

Hi guys,
any news on this thread? It would be great if I could test the tool.
Thank you in advance for your support.

Hi @amessina71,

I was facing the same issue; it seems that swapping the import order of decord and torch (importing torch first, then decord) solved it.

Hope it helps you as well :-)
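The workaround above can be sketched as follows. This is illustrative only, not the repo's actual patch: `import_in_working_order` is a made-up helper, and the imports are guarded so the sketch also runs where one of the libraries is absent.

```python
def import_in_working_order():
    """Import torch BEFORE decord; importing decord first could crash with
    SIGSEGV during CUDA initialization on some setups (as in this thread)."""
    imported = []
    for name in ("torch", "decord"):   # torch first, then decord
        try:
            __import__(name)
            imported.append(name)
        except ImportError:
            imported.append(name + " (not installed)")
    return imported

print(import_in_working_order())
```

In practice this means reordering the `import` statements at the top of demo_narrator.py so the torch import precedes the decord one.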

Yes it worked for me too! Thanks for the hint.
Alberto

Thank you @ezius07 for spotting this issue! I will add a patch shortly to fix this.

Best,
Yue