open-mmlab / mmyolo

OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOX, PPYOLOE, etc.

Home Page: https://mmyolo.readthedocs.io/zh_CN/dev/

Maybe environmental problems

Seperendity opened this issue

Prerequisite

🐞 Describe the bug

python tools/train.py configs/yolov5/yolov5_s-v61_syncbn_fast_1xb4-300e_balloon.py
Traceback (most recent call last):
  File "tools/train.py", line 106, in <module>
    main()
  File "tools/train.py", line 56, in main
    register_all_modules(init_default_scope=False)
  File "/home/hpc/mmyolo/mmyolo/utils/setup_env.py", line 20, in register_all_modules
    import mmdet.visualization  # noqa: F401,F403
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/__init__.py", line 2, in <module>
    from .local_visualizer import DetLocalVisualizer
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/local_visualizer.py", line 12, in <module>
    from ..evaluation import INSTANCE_OFFSET
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/__init__.py", line 3, in <module>
    from .metrics import *  # noqa: F401,F403
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/__init__.py", line 3, in <module>
    from .coco_metric import CocoMetric
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/coco_metric.py", line 15, in <module>
    from mmdet.datasets.api_wrappers import COCO, COCOeval
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/__init__.py", line 13, in <module>
    from .utils import get_loading_pipeline
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/utils.py", line 5, in <module>
    from mmdet.datasets.transforms import LoadAnnotations, LoadPanopticAnnotations
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/__init__.py", line 6, in <module>
    from .formatting import ImageToTensor, PackDetInputs, ToTensor, Transpose
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/formatting.py", line 9, in <module>
    from mmdet.structures.bbox import BaseBoxes
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/__init__.py", line 2, in <module>
    from .base_boxes import BaseBoxes
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/base_boxes.py", line 9, in <module>
    from mmdet.structures.mask.structures import BitmapMasks, PolygonMasks
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/__init__.py", line 3, in <module>
    from .structures import (BaseInstanceMasks, BitmapMasks, PolygonMasks,
  File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/structures.py", line 9, in <module>
    from mmcv.ops.roi_align import roi_align
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/usr/Anaconda3/envs/open-mmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/hpc/.local/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv

It ran fine at first. Later, probably after some packages were updated, it started reporting this error every time.

Environment

sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2+cu101
OpenCV: 4.6.0
MMEngine: 0.1.0
MMCV: 2.0.0rc1
MMDetection: 3.0.0rc1
MMYOLO: 0.1.1+db73593
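
One thing that stands out in the report above: TorchVision is listed as 0.8.2+cu101, while PyTorch 1.10.1 was built with CUDA Runtime 11.3. That may point to leftover packages from an older install. Below is a minimal sketch for spotting this kind of mismatch from version strings. The helper names `cuda_tag` and `cuda_mismatch` are made up for illustration, and the literals are copied from the report; in a live environment you would pass `torchvision.__version__` and `torch.version.cuda` instead.

```python
import re


def cuda_tag(version):
    """Extract the CUDA tag from a local version string.

    '0.8.2+cu101' -> '10.1'; returns None when there is no '+cuXYZ' suffix.
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, the rest is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


def cuda_mismatch(package_version, torch_cuda):
    """Return True only if the package carries a CUDA tag that differs from torch's."""
    tag = cuda_tag(package_version)
    return tag is not None and tag != torch_cuda


if __name__ == "__main__":
    # Values copied from the environment report above.
    torchvision_version = "0.8.2+cu101"
    torch_cuda_runtime = "11.3"
    if cuda_mismatch(torchvision_version, torch_cuda_runtime):
        print("torchvision CUDA tag", cuda_tag(torchvision_version),
              "differs from torch CUDA runtime", torch_cuda_runtime)
```

A version string without a `+cu` suffix is treated as "unknown" rather than as a mismatch, since conda builds often omit the tag.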

Additional information

I tried the version for cudatoolkit=10.1, but it doesn't work. Maybe the cudatoolkit version is too low?

conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=10.2 -c pytorch -y

I don't think cudatoolkit is the problem: different cudatoolkit versions all report the same error. In the end, the only thing that worked was deleting the environment and reinstalling it. With the cudatoolkit=11.3 setup from the official website, training and testing both work, and there are no problems at present. I still don't know what caused the issue.
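
A possible explanation, offered as a guess rather than a confirmed diagnosis: in the traceback, mmcv and mmdet are loaded from /home/hpc/.local/lib/python3.8/site-packages, while the interpreter lives in /usr/Anaconda3/envs/open-mmlab. The missing symbol `_ZNK2at6Tensor7is_cudaEv` appears to demangle to `at::Tensor::is_cuda() const`, which would be consistent with an mmcv extension compiled against a different PyTorch build than the one being imported. The sketch below checks where each package would be imported from without actually importing it (so it still works when `import mmcv` fails). The helper names `is_outside_env` and `locate` are hypothetical.

```python
import importlib.util
import os
import site
import sys


def is_outside_env(path, env_prefix):
    """Return True if `path` does not live under `env_prefix`."""
    path = os.path.realpath(path)
    env_prefix = os.path.realpath(env_prefix)
    return os.path.commonpath([path, env_prefix]) != env_prefix


def locate(package):
    """Return the file a top-level package would load from, or None if not found."""
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("env prefix:", sys.prefix)
    print("user site :", site.getusersitepackages(),
          "(enabled)" if site.ENABLE_USER_SITE else "(disabled)")
    for name in ("torch", "torchvision", "mmcv", "mmdet", "mmengine"):
        origin = locate(name)
        if origin is None:
            print(f"{name}: not found")
        elif is_outside_env(origin, sys.prefix):
            print(f"{name}: {origin}  <-- outside the active environment")
        else:
            print(f"{name}: {origin}")
```

If packages do show up under the user site directory, running with `python -s` or setting `PYTHONNOUSERSITE=1` keeps that directory off `sys.path`, which may be an easier test than rebuilding the whole environment.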