Errors
Dandiasdd opened this issue · comments
Dandi commented
I encountered this error while running my code on a GPU:
```
2021-03-30 16:11:36.205545: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Killing subprocess 1237
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'drive/MyDrive/MPL/main.py', '--local_rank=0', '--name=hateful_memes_with_MPL', '--expand-labels', '--amp']' died with <Signals.SIGKILL: 9>.
```
Do you know the cause of this error? Thanks.
Jd Kim commented
Please check the CUDA version.
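One way to check it, as a rough sketch rather than a definitive diagnostic: the script below prints the CUDA version PyTorch was built against, whether PyTorch can see a GPU, and the NVIDIA driver version reported by `nvidia-smi`. The helper name `cuda_report` is made up for this example, and the script assumes `nvidia-smi` is on the `PATH` when a driver is installed. The log above shows `libcudart.so.11.0` being loaded, so comparing that against the PyTorch build version is one place to start.

```python
import shutil
import subprocess

try:
    import torch  # optional: the script still runs where PyTorch is missing
except ImportError:
    torch = None


def cuda_report():
    """Collect CUDA-related version details as a dict of strings (best effort)."""
    report = {}
    if torch is not None:
        report["torch"] = torch.__version__
        # CUDA toolkit version PyTorch was built against ("None" for CPU-only builds).
        report["torch_cuda_build"] = str(torch.version.cuda)
        # Whether PyTorch can actually reach a GPU through the installed driver.
        report["cuda_available"] = str(torch.cuda.is_available())
    else:
        report["torch"] = "not installed"

    # nvidia-smi reports the installed driver version, if the tool is on PATH.
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["driver"] = result.stdout.strip() or "unknown"
        except (OSError, subprocess.TimeoutExpired):
            report["driver"] = "unknown"
    else:
        report["driver"] = "nvidia-smi not found"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` is `None` or `cuda_available` is `False`, the installed PyTorch build and the driver are probably not a matching pair.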