Maybe environmental problems
Seperendity opened this issue
Prerequisite
- I have searched the existing and past issues but cannot get the expected help.
- I have read the FAQ documentation but cannot get the expected help.
- The bug has not been fixed in the latest version.
🐞 Describe the bug
python tools/train.py configs/yolov5/yolov5_s-v61_syncbn_fast_1xb4-300e_balloon.py
Traceback (most recent call last):
File "tools/train.py", line 106, in <module>
main()
File "tools/train.py", line 56, in main
register_all_modules(init_default_scope=False)
File "/home/hpc/mmyolo/mmyolo/utils/setup_env.py", line 20, in register_all_modules
import mmdet.visualization # noqa: F401,F403
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/__init__.py", line 2, in <module>
from .local_visualizer import DetLocalVisualizer
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/visualization/local_visualizer.py", line 12, in <module>
from ..evaluation import INSTANCE_OFFSET
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/__init__.py", line 3, in <module>
from .metrics import * # noqa: F401,F403
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/__init__.py", line 3, in <module>
from .coco_metric import CocoMetric
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/evaluation/metrics/coco_metric.py", line 15, in <module>
from mmdet.datasets.api_wrappers import COCO, COCOeval
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/__init__.py", line 13, in <module>
from .utils import get_loading_pipeline
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/utils.py", line 5, in <module>
from mmdet.datasets.transforms import LoadAnnotations, LoadPanopticAnnotations
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/__init__.py", line 6, in <module>
from .formatting import ImageToTensor, PackDetInputs, ToTensor, Transpose
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/datasets/transforms/formatting.py", line 9, in <module>
from mmdet.structures.bbox import BaseBoxes
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/__init__.py", line 2, in <module>
from .base_boxes import BaseBoxes
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/bbox/base_boxes.py", line 9, in <module>
from mmdet.structures.mask.structures import BitmapMasks, PolygonMasks
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/__init__.py", line 3, in <module>
from .structures import (BaseInstanceMasks, BitmapMasks, PolygonMasks,
File "/home/hpc/.local/lib/python3.8/site-packages/mmdet/structures/mask/structures.py", line 9, in <module>
from mmcv.ops.roi_align import roi_align
File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
from .active_rotated_filter import active_rotated_filter
File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
ext_module = ext_loader.load_ext(
File "/home/hpc/.local/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/usr/Anaconda3/envs/open-mmlab/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: /home/hpc/.local/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7is_cudaEv
It ran fine at first. Later, possibly after some packages were updated, it started reporting this error every time.
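A note on reading the traceback: mmdet and mmcv are loaded from /home/hpc/.local/lib/python3.8/site-packages (the per-user site directory), while the interpreter itself lives in the conda env at /usr/Anaconda3/envs/open-mmlab. The missing symbol demangles to `at::Tensor::is_cuda() const`, which typically indicates that the mmcv extension was compiled against a different PyTorch build than the one currently installed. A compiled mmcv left in ~/.local would be visible to every Python 3.8 env, which may be why changing cudatoolkit alone did not help. The following is a minimal sketch for checking where a package would be imported from without actually importing it; `locate_packages` is a hypothetical helper name, not part of MMYOLO or MMCV.

```python
import importlib.util
import site
from pathlib import Path
from typing import Dict, Iterable, Optional, Tuple


def locate_packages(names: Iterable[str]) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each top-level package name to (location, is_in_user_site).

    find_spec() on a top-level name only searches sys.path and does not
    execute the package, so it works even when importing it would fail.
    """
    user_site = Path(site.getusersitepackages()).resolve()
    report = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            # Not installed anywhere on sys.path.
            report[name] = (None, False)
        elif not spec.has_location or spec.origin is None:
            # Built-in, frozen, or namespace package: no file to compare.
            report[name] = (spec.origin, False)
        else:
            origin = Path(spec.origin).resolve()
            report[name] = (str(origin), user_site in origin.parents)
    return report


if __name__ == "__main__":
    # Package names taken from the traceback above.
    for pkg, (location, in_user_site) in locate_packages(
        ["torch", "torchvision", "mmcv", "mmdet", "mmengine"]
    ).items():
        flag = "  <-- user site (~/.local)" if in_user_site else ""
        print(f"{pkg}: {location}{flag}")
```

If any of these packages is flagged as living in the user site, uninstalling it there (or starting Python with the `-s` option, which skips the user site directory) would be one way to confirm whether it shadows the conda env's copy.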
Environment
sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX512
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.2+cu101
OpenCV: 4.6.0
MMEngine: 0.1.0
MMCV: 2.0.0rc1
MMDetection: 3.0.0rc1
MMYOLO: 0.1.1+db73593
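The report above lists PyTorch 1.10.1 built with CUDA 11.3 next to TorchVision 0.8.2+cu101, that is, a torchvision wheel built for CUDA 10.1. That combination suggests leftovers from more than one install. As a rough sketch (the function names are hypothetical and not part of any OpenMMLab tool), the CUDA tag in a wheel's version string can be compared against the CUDA version PyTorch was built with:

```python
import re
from typing import Optional


def cuda_from_local_tag(version: str) -> Optional[str]:
    """Extract the CUDA version from a tag such as '0.8.2+cu101' -> '10.1'."""
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None  # CPU build or untagged version string
    digits = match.group(1)
    # The last digit is the minor version, everything before it the major.
    return f"{digits[:-1]}.{digits[-1]}"


def cuda_builds_disagree(torch_cuda: Optional[str], torchvision_version: str) -> bool:
    """Return True only when both CUDA versions are known and differ."""
    tv_cuda = cuda_from_local_tag(torchvision_version)
    if torch_cuda is None or tv_cuda is None:
        return False  # not enough information to decide
    return torch_cuda != tv_cuda


# Values copied from the environment report above.
print(cuda_builds_disagree("11.3", "0.8.2+cu101"))
```

On a live machine the two arguments would come from `torch.version.cuda` and `torchvision.__version__`. Note that conda-installed packages often carry no `+cuXXX` tag, in which case this check reports nothing.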
Additional information
I tried with cudatoolkit=10.1, but it doesn't work. Could the cudatoolkit version be too low?
conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=10.2 -c pytorch -y
I don't think cudatoolkit is the problem: different cudatoolkit versions still reported the same error. In the end, the only thing that worked was deleting the environment and reinstalling it. With the cudatoolkit=11.3 setup from the official website, training and testing both work, and there are no problems at present. I still don't know what caused the issue.