KaihuaTang / Long-Tailed-Recognition.pytorch

[NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS). It is also a PyTorch implementation of the NeurIPS 2020 paper 'Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect'.

LVIS training bug: TypeError: can't pickle _thread.RLock objects

nemonameless opened this issue

Describe the bug
Training on the COCO dataset works, but training on LVIS fails with the error below.

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.2.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.5.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMDetection: 2.4.0+
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 10.1

Error traceback
If applicable, paste the error traceback here.

2020-11-14 20:28:12,249 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "./tools/train.py", line 177, in <module>
    main()
  File "./tools/train.py", line 173, in main
    meta=meta)
  File "/data/cdp_algo_ceph_ssd/users/georgeni/causallvis/mmdet/apis/train.py", line 143, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
Traceback (most recent call last):
  File "./tools/train.py", line 177, in <module>
    main()
  File "./tools/train.py", line 173, in main
    meta=meta)
  File "/data/cdp_algo_ceph_ssd/users/georgeni/causallvis/mmdet/apis/train.py", line 143, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
^C^C^C^C^C^C^C^C^C^C^C^C^CTraceback (most recent call last):
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/data/anaconda3/envs/zxcheng/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/lvis/htcnosemlvis.py', '--launcher', 'pytorch', '--work-dir', 'work_bendilvis/lvis/htcnosemlvis', '--no-validate']' returned non-zero exit status 1.
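
Note on the traceback above: the frames in popen_spawn_posix.py show that the DataLoader workers are started with the "spawn" method, and reduction.dump then pickles the whole worker process object. In PyTorch that object carries the dataset and related arguments, so any lock held anywhere inside them makes the pickle fail. The sketch below reproduces the same class of error without mmdet. The `fake_dataset` object and the `try_spawn_pickle` helper are hypothetical stand-ins for illustration, not code from this repository.

```python
import io
import threading
from multiprocessing.reduction import ForkingPickler
from types import SimpleNamespace


def try_spawn_pickle(obj):
    """Pickle obj the way multiprocessing.reduction.dump does.

    Returns the TypeError raised by the pickler, or None on success.
    """
    try:
        ForkingPickler(io.BytesIO(), 2).dump(obj)
    except TypeError as exc:
        return exc
    return None


# Hypothetical stand-in for a dataset that holds a lock somewhere inside it.
fake_dataset = SimpleNamespace(ann_file="annotations.json", lock=threading.RLock())

error = try_spawn_pickle(fake_dataset)
# The exact message wording differs between Python versions.
print(type(error).__name__, error)
```

If this picture is right, the LVIS dataset object holds something that the COCO dataset object does not, which would explain why only LVIS training fails.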

Are you able to solve this error?

I got the same error.
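
For anyone hitting this, one way to narrow it down is to find which attribute of the dataset cannot be pickled, since the traceback ends in ForkingPickler while the DataLoader worker is being started. The helper below is a minimal diagnostic sketch, not part of mmdet or this repository. `find_unpicklable` and the `demo` object are hypothetical names introduced here for illustration.

```python
import pickle
import threading
from types import SimpleNamespace


def find_unpicklable(obj, path="obj", max_depth=3, _seen=None):
    """Return attribute paths under obj that fail pickle.dumps."""
    _seen = set() if _seen is None else _seen
    if id(obj) in _seen or max_depth < 0:
        return []
    _seen.add(id(obj))
    try:
        pickle.dumps(obj)
        return []  # this object and everything inside it pickles fine
    except Exception:
        pass
    # Descend into children to localize the failure.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    else:
        attrs = getattr(obj, "__dict__", {})
        children = [(f"{path}.{k}", v) for k, v in attrs.items()]
    bad = []
    for child_path, child in children:
        bad.extend(find_unpicklable(child, child_path, max_depth - 1, _seen))
    # If no child explains the failure, blame this object itself.
    return bad or [path]


# Hypothetical stand-in: a dataset wrapping an API object that holds a lock.
demo = SimpleNamespace(
    ann_file="annotations.json",
    api=SimpleNamespace(lock=threading.RLock()),
)
print(find_unpicklable(demo))
```

In the setup from this issue, the same call could be made on the LVIS dataset object just before `train_detector` runs. One candidate worth checking, which is only a guess and not confirmed in this thread, is a `logging.Logger` stored as an attribute: loggers only became picklable in Python 3.7, and the environment here is Python 3.6.10.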