Maybe environmental problems

Question

Seperendity opened this issue a year ago

Prerequisite

I have searched the existing and past issues but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version.

🐞 Describe the bug

python tools/train.py configs/yolov5/yolov5_s-v61_syncbn_fast_1xb4-300e_balloon.py

Traceback (most recent call last):
  File "tools/train.py", line 106, in <module>
    main()
  File "tools/train.py", line 56, in main
    register_all_modules(init_default_scope=False)
  File "/home/hpc/mmyolo/mmyolo/utils/setup_env.py", line 20, in register_all_modules
    import mmdet.visualization  # noqa: F401,F403
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/__init__.py", line 2, in <module>
    from .local_visualizer import DetLocalVisualizer
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/local_visualizer.py", line 12, in <module>
    from ..evaluation import INSTANCE_OFFSET
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/__init__.py", line 3, in <module>
    from .metrics import *  # noqa: F401,F403
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/__init__.py", line 3, in <module>
    from .coco_metric import CocoMetric
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/coco_metric.py", line 15, in <module>
    from mmdet.datasets.api_wrappers import COCO, COCOeval
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/__init__.py", line 13, in <module>
    from .utils import get_loading_pipeline
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/utils.py", line 5, in <module>
    from mmdet.datasets.transforms import LoadAnnotations, LoadPanopticAnnotations
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/__init__.py", line 6, in <module>
    from .formatting import ImageToTensor, PackDetInputs, ToTensor, Transpose
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/formatting.py", line 9, in <module>
    from mmdet.structures.bbox import BaseBoxes
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/__init__.py", line 2, in <module>
    from .base_boxes import BaseBoxes
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/base_boxes.py", line 9, in <module>
    from mmdet.structures.mask.structures import BitmapMasks, PolygonMasks
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/__init__.py", line 3, in <module>
    from .structures import (BaseInstanceMasks, BitmapMasks, PolygonMasks,
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/structures.py", line 9, in <module>
    from mmcv.ops.roi_align import roi_align
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/usr/Anaconda3/envs/open-mmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/hpc/.local/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv

It ran fine at first. Later, possibly after some packages were updated, it started reporting this error every time.
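The missing symbol in the last line of the traceback is a mangled C++ name. Below is a minimal sketch that decodes it, assuming the Itanium C++ ABI mangling used by GCC on Linux. The helper name demangle_simple is hypothetical, and it handles only this simple shape (a nested name, an optional const marker, no parameters); it is not a general demangler.

```python
def demangle_simple(symbol: str) -> str:
    """Decode a simple Itanium-mangled nested name, e.g. _ZNK2at6Tensor7is_cudaEv.

    Supported shape only: _Z N [K] <length><name>... E v
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("this sketch only handles nested names (_ZN...)")
    pos = 3
    is_const = symbol[pos] == "K"  # 'K' marks a const member function
    if is_const:
        pos += 1
    parts = []
    while symbol[pos] != "E":  # 'E' closes the nested name
        start = pos
        while symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected a length-prefixed name component")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if symbol[pos + 1:] != "v":  # 'v' means an empty parameter list
        raise ValueError("this sketch only handles parameterless functions")
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK2at6Tensor7is_cudaEv"))  # at::Tensor::is_cuda() const
```

The at namespace belongs to PyTorch's ATen library, so the compiled mmcv extension (_ext) expects a function that the PyTorch build loaded at import time does not export. That pattern usually suggests mmcv was compiled against a different PyTorch build than the one now installed, though the thread does not confirm this.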

Environment

sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX512
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2+cu101
OpenCV: 4.6.0
MMEngine: 0.1.0
MMCV: 2.0.0rc1
MMDetection: 3.0.0rc1
MMYOLO: 0.1.1+db73593
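The listing above mixes build tags: TorchVision reports 0.8.2+cu101, while PyTorch 1.10.1 reports a CUDA 11.3 runtime. Below is a small sketch for reading the CUDA version out of a +cuXYZ tag so that such pairs can be compared. The helper name cuda_from_local_tag is hypothetical, and the rule that the last digit is the minor version is an assumption that matches tags such as cu101, cu102 and cu113.

```python
import re
from typing import Optional


def cuda_from_local_tag(version: str) -> Optional[str]:
    """Return '10.1' for '0.8.2+cu101', or None when there is no +cuXYZ tag."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


torch_cuda = "11.3"                  # "CUDA Runtime 11.3" in the listing
torchvision_version = "0.8.2+cu101"  # "TorchVision" in the listing
print(cuda_from_local_tag(torchvision_version) == torch_cuda)  # prints False
```

A mismatch like this does not by itself explain the mmcv import error, but it suggests that packages in the environment were installed at different times against different builds.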

Additional information

I tried cudatoolkit=10.1, but it doesn't work. Is the cudatoolkit version perhaps too low?

Haian Huang(深度眸) · Answer 1 · Wed Nov 02 2022 16:59:19 GMT+0800 (China Standard Time)

conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=10.2 -c pytorch -y

Seperendity · Answer 2 · Wed Nov 02 2022 20:01:31 GMT+0800 (China Standard Time)

I don't think cudatoolkit is the problem: different cudatoolkit versions still report the same error. In the end there was really no other way, so I deleted the environment and reinstalled it. With cudatoolkit=11.3, as given on the official website, training and testing work. At present there is no problem. I still don't know what caused the issue.
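One detail in the traceback may be worth checking if the error comes back: mmcv and mmdet load from /home/hpc/.local/lib/python3.8/site-packages, while the interpreter lives in /usr/Anaconda3/envs/open-mmlab. Packages installed with pip's --user option are visible to every Python 3.8 interpreter of that user, including conda environments, so they can outlive a change of PyTorch inside the environment. Whether that happened here is not established in this thread. Below is a minimal sketch that reports where each package would be imported from without importing it; the helper name locate is hypothetical, and it assumes each distribution name equals its import name.

```python
import importlib.util
from importlib import metadata


def locate(name):
    """Return (installed version, file path) for a top-level package.

    Uses find_spec, so the package itself is not imported and a broken
    compiled extension cannot raise here.
    """
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    return version, origin


for pkg in ("torch", "torchvision", "mmcv", "mmengine", "mmdet", "mmyolo"):
    print(pkg, *locate(pkg))
```

If the paths point to a mix of ~/.local and the conda environment, starting Python with the PYTHONNOUSERSITE environment variable set is one way to test whether the user-level packages are involved.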