mindspore-lab / mindocr

A toolbox of ocr models and algorithms based on MindSpore

Home Page: https://mindspore-lab.github.io/mindocr/

FCENet single-card training fails with: IndexError: index 1 is out of bounds for axis 0 with size 1

HustleOoo opened this issue

YAML: configs/det/fcenet/fce_icdar15.yaml
Dataset: ICDAR2015

Environment:
1. Kylin OS + 910 PRO B
2. mindspore-modelzoo:23.0.RC1 image

Error output:
Start training... (The first epoch takes longer, please wait...)

[WARNING] ME(85268:281461116710864,MainProcess):2023-08-02-06:34:27.412.624 [mindspore/dataset/engine/datasets_user_defined.py:805] GeneratorDataset's num_parallel_workers: 8 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 5 or smaller.
[WARNING] ME(85268:281461116710864,MainProcess):2023-08-02-06:34:28.777.086 [mindspore/dataset/engine/datasets_user_defined.py:805] GeneratorDataset's num_parallel_workers: 8 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 4 or smaller.
[WARNING] ME(85268:281461116710864,MainProcess):2023-08-02-06:34:30.694.385 [mindspore/dataset/engine/datasets_user_defined.py:805] GeneratorDataset's num_parallel_workers: 8 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 3 or smaller.
[WARNING] ME(85268:281461116710864,MainProcess):2023-08-02-06:34:45.217.470 [mindspore/dataset/engine/datasets_user_defined.py:805] GeneratorDataset's num_parallel_workers: 8 is too large which may cause a lot of memory occupation (>85%) or out of memory(OOM) during multiprocessing. Therefore, it is recommended to reduce num_parallel_workers to 2 or smaller.
[WARNING] MD(85268,ffef5d7af1f0,python):2023-08-02-06:40:50.304.566 [mindspore/ccsrc/minddata/dataset/engine/datasetops/source/generator_op.cc:220] operator()] Bad performance attention, it takes more than 25 seconds to generator.next new row, which might cause GetNext timeout problem when sink_mode=True. You can increase the parameter num_parallel_workers in GeneratorDataset / optimize the efficiency of obtaining samples in the user-defined generator function.
[2023-08-02 06:43:54] mindocr.utils.callbacks INFO - epoch: [1/100], loss: 2.097338, epoch time: 568.243 s, per step time: 4545.944 ms, fps per card: 1.76 img/s
0%| | 0/500 [00:15<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 300, in <module>
    main(config)
  File "tools/train.py", line 248, in main
    initial_epoch=start_epoch,
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 1061, in train
    initial_epoch=initial_epoch)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 100, in wrapper
    func(self, *args, **kwargs)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 617, in _train
    cb_params, sink_size, initial_epoch, valid_infos)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/model.py", line 726, in _train_dataset_sink_process
    list_callback.on_train_epoch_end(run_context)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mindspore/train/callback/_callback.py", line 371, in on_train_epoch_end
    cb.on_train_epoch_end(run_context)
  File "/home/mindocr-main/mindocr/utils/callbacks.py", line 193, in on_train_epoch_end
    measures = self.net_evaluator.eval()
  File "/home/mindocr-main/mindocr/utils/evaluator.py", line 152, in eval
    preds = self.postprocessor(preds, **data_info)
  File "/home/mindocr-main/mindocr/postprocess/det_base_postprocess.py", line 106, in __call__
    result = self.rescale(result, shape_list)
  File "/home/mindocr-main/mindocr/postprocess/det_base_postprocess.py", line 147, in rescale
    result[field][i] = self._rescale_polygons(sample, shape_list[i])
IndexError: index 1 is out of bounds for axis 0 with size 1
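The failing line is the per-sample loop in `rescale` (det_base_postprocess.py, line 147), which pairs the i-th prediction with `shape_list[i]`. The evaluation progress bar reads 0/500, and the ICDAR2015 test split has 500 images. That suggests evaluation runs with batch size 1, so `shape_list` has a single row. If so, the error most likely means the FCENet postprocessor returned more than one entry in `result[field]` for that single image. The sketch below reproduces this mismatch with NumPy. The functions `rescale` and `rescale_polygons`, and the `(src_h, src_w, scale_h, scale_w)` row layout of `shape_list`, are simplified, hypothetical stand-ins and are not mindocr's actual code.

```python
import numpy as np


def rescale_polygons(polys, shape):
    # Hypothetical row layout: (src_h, src_w, scale_h, scale_w).
    # Map polygon points (x, y) from the resized image back to the original image.
    _, _, scale_h, scale_w = shape
    return polys / np.array([scale_w, scale_h])


def rescale(result, shape_list):
    # Simplified stand-in for the loop at det_base_postprocess.py line 147:
    # the i-th prediction entry is paired with the i-th row of shape_list.
    for field in result:
        for i, sample in enumerate(result[field]):
            result[field][i] = rescale_polygons(sample, shape_list[i])
    return result


# Batch size 1 at evaluation time: shape_list has exactly one row.
shape_list = np.array([[720.0, 1280.0, 0.5, 0.5]])

# Two prediction entries for that single image.
result = {"polygons": [np.ones((1, 4, 2)), np.ones((1, 4, 2))]}

message = None
try:
    rescale(result, shape_list)
except IndexError as err:
    message = str(err)

print(message)  # index 1 is out of bounds for axis 0 with size 1
```

With one prediction entry per `shape_list` row, the same loop runs without error. A check such as `len(result[field]) == len(shape_list)` before the loop would make the mismatch explicit. It would not fix whatever causes the FCENet postprocessor to emit the extra entries.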

Relevant part of the YAML config:
system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: False
  amp_level: 'O0'
  seed: 42

  log_interval: 10

  val_while_train: False
  drop_overflow_update: False

train:
  ckpt_save_dir: './workspace/FCE'
  dataset_sink_mode: True
  ema: True
  dataset:
    type: DetDataset
    dataset_root: /home/mindocr-main/dataset/ICDAR2015
    data_dir: train/images
    label_file: train/train_det_gt.txt

Dataset directory structure:
.
├── test
│ ├── images
│ │ ├── img_1.jpg
│ │ ├── img_2.jpg
│ │ └── ...
│ └── test_det_gt.txt
└── train
├── images
│ ├── img_1.jpg
│ ├── img_2.jpg
│ └── ....jpg
└── train_det_gt.txt

Every segmentation-based algorithm except FCENet trains without problems. How can this be fixed?