Error when evaluating the validation set during training:
GZ-Metal-Cell opened this issue · comments
As the title says: using MindOCR v0.3.1 to train DBNet (r18) on Total-Text, the following error is raised after 1 to 2 epochs:
Traceback (most recent call last):
File "/home/ma-user/work/mindocr/tools/train.py", line 318, in <module>
main(config)
File "/home/ma-user/work/mindocr/tools/train.py", line 249, in main
model.train(
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/train/model.py", line 1061, in train
self._train(epoch,
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/train/model.py", line 113, in wrapper
func(self, *args, **kwargs)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/train/model.py", line 619, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback,
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/train/model.py", line 731, in _train_dataset_sink_process
list_callback.on_train_epoch_end(run_context)
File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/mindspore/train/callback/_callback.py", line 402, in on_train_epoch_end
cb.on_train_epoch_end(run_context)
File "/home/ma-user/work/mindocr/mindocr/utils/callbacks.py", line 193, in on_train_epoch_end
measures = self.net_evaluator.eval()
File "/home/ma-user/work/mindocr/mindocr/utils/evaluator.py", line 152, in eval
preds = self.postprocessor(preds, **data_info)
File "/home/ma-user/work/mindocr/mindocr/postprocess/det_base_postprocess.py", line 102, in __call__
result = self._postprocess(pred, **kwargs)
File "/home/ma-user/work/mindocr/mindocr/postprocess/det_db_postprocess.py", line 82, in _postprocess
sample_polys, sample_scores = self._extract_preds(pr, segm)
File "/home/ma-user/work/mindocr/mindocr/postprocess/det_db_postprocess.py", line 113, in _extract_preds
poly = np.array(expand_poly(points, distance=poly.area * self._expand_ratio / poly.length))
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
[WARNING] ME(3002438:281461320040800,MainProcess):2024-01-05-15:35:29.222.905 [mindspore/dataset/engine/datasets_user_defined.py:264] Generator receives a termination signal, stop waiting for data from subprocess.
[WARNING] MD(3002438,ffff8a45f010,python):2024-01-05-15:35:34.445.975 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:115] ~DataQueueOp]
preprocess_batch: 1659;
batch_queue: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0;
push_start_time -> push_end_time
2024-01-05-15:35:14.274.416 -> 2024-01-05-15:35:14.322.522
2024-01-05-15:35:15.497.472 -> 2024-01-05-15:35:15.556.873
2024-01-05-15:35:16.658.987 -> 2024-01-05-15:35:16.702.952
2024-01-05-15:35:16.901.313 -> 2024-01-05-15:35:16.946.270
2024-01-05-15:35:17.129.239 -> 2024-01-05-15:35:17.171.914
2024-01-05-15:35:17.523.003 -> 2024-01-05-15:35:17.557.942
2024-01-05-15:35:17.659.011 -> 2024-01-05-15:35:17.695.488
2024-01-05-15:35:17.824.658 -> 2024-01-05-15:35:17.857.248
2024-01-05-15:35:18.099.311 -> 2024-01-05-15:35:18.137.962
2024-01-05-15:35:19.059.404 -> 2024-01-05-15:35:19.089.938
For more details, please refer to the FAQ at https://www.mindspore.cn/docs/en/master/faq/data_processing.html.
The environment is as follows:
(MindSpore) [ma-user ~]$conda list
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
# packages in environment at /home/ma-user/anaconda3/envs/MindSpore:
#
# Name Version Build Channel
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 0.13.0 <pip>
addict 2.4.0 <pip>
albumentations 0.4.5 <pip>
APScheduler 3.8.1 <pip>
arrow 1.2.3 <pip>
asgiref 3.5.2 <pip>
astroid 2.11.7 <pip>
asttokens 2.0.8 <pip>
astunparse 1.6.3 <pip>
attrs 19.3.0 <pip>
backcall 0.2.0 <pip>
backports.zoneinfo 0.2.1 <pip>
binaryornot 0.4.4 <pip>
boto3 1.12.22 <pip>
botocore 1.15.49 <pip>
bzip2 1.0.8 hfd63f10_2 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates 2022.6.15 h4fd8a4c_0 conda-forge
certifi 2022.9.24 <pip>
cffi 1.14.0 <pip>
chardet 3.0.4 <pip>
charset-normalizer 2.0.12 <pip>
click 8.1.3 <pip>
cloudpickle 1.3.0 <pip>
colorama 0.4.4 <pip>
configparser 5.2.0 <pip>
cookiecutter 2.1.1 <pip>
coverage 6.4.3 <pip>
cryptography 3.4.7 <pip>
cycler 0.11.0 <pip>
Cython 3.0.2 <pip>
dask 2.18.1 <pip>
debugpy 1.6.3 <pip>
decorator 4.4.1 <pip>
defusedxml 0.7.1 <pip>
dill 0.3.5.1 <pip>
Django 3.2.16 <pip>
docutils 0.15.2 <pip>
easydict 1.9 <pip>
entrypoints 0.4 <pip>
ephemeral-port-reserve 1.1.4 <pip>
esdk-obs-python 3.20.1 <pip>
et-xmlfile 1.1.0 <pip>
Flask 2.1.0 <pip>
fonttools 4.37.4 <pip>
freetype-py 2.3.0 <pip>
future 0.18.2 <pip>
futures 3.1.1 <pip>
gast 0.3.2 <pip>
gnureadline 8.1.2 <pip>
google-pasta 0.2.0 <pip>
grpcio 1.60.0 <pip>
grpcio-tools 1.26.0 <pip>
gunicorn 20.1.0 <pip>
h5py 3.9.0 <pip>
idna 2.10 <pip>
image 1.5.28 <pip>
imageio 2.9.0 <pip>
imgaug 0.2.6 <pip>
importlib-metadata 5.0.0 <pip>
iniconfig 1.1.1 <pip>
ipyfilechooser 0.6.0 <pip>
ipykernel 6.7.0 <pip>
ipython 7.34.0 <pip>
ipython-genutils 0.2.0 <pip>
ipywidgets 8.0.4 <pip>
isort 5.10.1 <pip>
itsdangerous 2.1.2 <pip>
jdcal 1.4.1 <pip>
jedi 0.18.1 <pip>
Jinja2 3.0.1 <pip>
jinja2-time 0.2.0 <pip>
jmespath 0.10.0 <pip>
joblib 1.3.2 <pip>
jupyter-client 7.3.4 <pip>
jupyter-core 4.11.1 <pip>
jupyterlab-widgets 3.0.5 <pip>
Keras 2.3.1 <pip>
Keras-Applications 1.0.8 <pip>
Keras-Preprocessing 1.1.2 <pip>
keyboard 0.13.5 <pip>
kfac 0.2.0 <pip>
kiwisolver 1.1.0 <pip>
lanms 1.0.2 <pip>
lazy-import 0.2.2 <pip>
lazy-object-proxy 1.7.1 <pip>
ld_impl_linux-aarch64 2.36.1 h02ad14f_2 conda-forge
libcst 0.4.7 <pip>
libffi 3.4.2 h3557bc0_5 conda-forge
libgcc-ng 12.1.0 h3242a24_16 conda-forge
libgomp 12.1.0 h3242a24_16 conda-forge
libnsl 2.0.0 hf897c2e_0 conda-forge
libstdcxx-ng 12.1.0 hd01590b_16 conda-forge
libuuid 2.32.1 hf897c2e_1000 conda-forge
libzlib 1.2.12 h4e544f5_1 conda-forge
lmdb 1.4.1 <pip>
lxml 4.9.3 <pip>
MarkupSafe 2.1.1 <pip>
marshmallow 3.18.0 <pip>
matplotlib 3.5.1 <pip>
matplotlib-inline 0.1.3 <pip>
mccabe 0.7.0 <pip>
mindarmour 1.9.0 <pip>
mindformers 0.3.0 <pip>
mindinsight 1.9.0 <pip>
mindocr 0.2.0 <pip>
mindspore 2.1.1 <pip>
mmcv 2.0.1 <pip>
moxing-framework 2.0.1.rc0.ffd1c0c8 <pip>
mpmath 1.2.1 <pip>
mypy-extensions 0.4.3 <pip>
ncurses 6.3 headf329_1 conda-forge
nest-asyncio 1.5.5 <pip>
networkx 2.6.3 <pip>
ninja 1.10.2.3 <pip>
numba 0.47.0 <pip>
numexpr 2.8.6 <pip>
numpy 1.26.2 <pip>
opencv-python 4.8.0.76 <pip>
opencv-python-headless 4.8.1.78 <pip>
openpyxl 3.0.3 <pip>
openssl 3.0.5 h4e544f5_0 conda-forge
packaging 21.3 <pip>
pandas 1.1.3 <pip>
parso 0.8.3 <pip>
pathlib2 2.3.7 <pip>
pexpect 4.8.0 <pip>
pickleshare 0.7.5 <pip>
Pillow 9.2.0 <pip>
pip 23.2.1 py39hd43f75c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
platformdirs 2.5.2 <pip>
pluggy 1.0.0 <pip>
prettytable 2.1.0 <pip>
prometheus-client 0.8.0 <pip>
prompt-toolkit 3.0.30 <pip>
protobuf 3.20.1 <pip>
psutil 5.7.0 <pip>
ptyprocess 0.7.0 <pip>
py 1.11.0 <pip>
pyclipper 1.3.0.post5 <pip>
pycocotools 2.0.7 <pip>
pycparser 2.21 <pip>
pycryptodome 3.10.1 <pip>
Pygments 2.12.0 <pip>
pylint 2.14.5 <pip>
pyparsing 3.0.9 <pip>
pypng 0.20220715.0 <pip>
pytest 7.1.2 <pip>
python 3.9.13 h5016f1d_0_cpython conda-forge
python-dateutil 2.8.2 <pip>
python-slugify 8.0.1 <pip>
pytz 2022.4 <pip>
pytz-deprecation-shim 0.1.0 <pip>
PyWavelets 1.1.1 <pip>
PyYAML 5.3.1 <pip>
pyzmq 23.2.0 <pip>
rapidfuzz 3.5.2 <pip>
readline 8.1.2 h38e3740_0 conda-forge
requests 2.31.0 <pip>
requests-futures 1.0.0 <pip>
s3transfer 0.3.7 <pip>
scikit-learn 1.0.2 <pip>
scipy 1.11.4 <pip>
semantic-version 2.8.5 <pip>
seqeval 1.2.2 <pip>
setuptools 68.0.0 py39hd43f75c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
setuptools-scm 8.0.4 <pip>
Shapely 1.8.4 <pip>
six 1.16.0 <pip>
sqlite 3.39.0 hc74f5b8_0 conda-forge
sqlparse 0.4.3 <pip>
sympy 1.4 <pip>
synr 0.5.0 <pip>
tabulate 0.8.9 <pip>
tenacity 8.0.1 <pip>
tensorflow-probability 0.10.1 <pip>
terminaltables 3.1.0 <pip>
text-unidecode 1.3 <pip>
threadpoolctl 3.2.0 <pip>
tifffile 2021.11.2 <pip>
tk 8.6.12 hd8af866_0 conda-forge
toml 0.10.1 <pip>
tomli 2.0.1 <pip>
tomlkit 0.11.5 <pip>
topi 0.4.0 <pip>
tornado 6.2 <pip>
tqdm 4.46.1 <pip>
traitlets 5.3.0 <pip>
treelib 1.6.1 <pip>
typed-ast 1.5.4 <pip>
typing-inspect 0.8.0 <pip>
typing_extensions 4.4.0 <pip>
tzdata 2023c h04d1e81_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tzdata 2022.7 <pip>
tzlocal 4.2 <pip>
umap-learn-modified 0.3.8 <pip>
urllib3 2.0.4 <pip>
wcwidth 0.2.5 <pip>
Werkzeug 2.2.2 <pip>
wheel 0.38.4 py39hd43f75c_0 https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
widgetsnbextension 4.0.5 <pip>
wrapt 1.14.1 <pip>
xlrd 1.2.0 <pip>
XlsxWriter 3.0.3 <pip>
xml-python 0.4.3 <pip>
xmltodict 0.12.0 <pip>
xz 5.2.5 h6dd45c4_1 conda-forge
yapf 0.32.0 <pip>
zipp 3.8.1 <pip>
zlib 1.2.12 h4e544f5_1 conda-forge
Is there a solution? Thanks!
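Hello, and thank you for the feedback.
v0.3.x is adapted to MindSpore r2.2.10 and its subsequent bug-fix releases; please consider using MindSpore r2.2.11 first.
Based on the error log you provided, this looks like a bug in how the data post-processing function handles the dataset format. Our developers are debugging it.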
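For readers following along, here is a minimal sketch of how this kind of `ValueError` can arise. One reading consistent with the message "The detected shape was (2,) + inhomogeneous part" is that the polygon-expansion step at `det_db_postprocess.py` line 113 returned two polygons with different vertex counts, and `np.array` on NumPy 1.24 or later (the reporter's environment lists numpy 1.26.2) refuses to build an array from such a ragged list. The helper `to_single_polygon` and the sample coordinates are hypothetical and are not part of MindOCR; this is not the maintainers' fix.

```python
import numpy as np


def to_single_polygon(expanded):
    """Return an (N, 2) float32 array if `expanded` holds exactly one polygon.

    Hypothetical guard, not MindOCR code: returns None when the expansion
    produced zero or several polygons, or a degenerate one.
    """
    if len(expanded) != 1:
        return None
    poly = np.asarray(expanded[0], dtype=np.float32)
    if poly.ndim != 2 or poly.shape[1] != 2 or poly.shape[0] < 3:
        return None
    return poly


# Assumed example of an expansion result: two polygons, 4 and 3 vertices.
ragged = [
    [[0, 0], [4, 0], [4, 4], [0, 4]],
    [[10, 10], [12, 10], [11, 12]],
]

try:
    np.array(ragged)  # ragged nested list
except ValueError as err:
    # NumPy >= 1.24 raises here; older releases only emitted a warning.
    print("np.array failed:", err)

# The guard lets a caller skip such a prediction instead of crashing.
print(to_single_polygon(ragged))
print(to_single_polygon(ragged[:1]).shape)
```

A guard like this only hides the symptom; whether skipping the prediction is acceptable depends on how the evaluator treats missing polygons.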
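Hello, we tested with the MindSpore r2.2.11 release and MindOCR v0.3.1 and could not reproduce the problem you reported.
Part of the training log is attached below.
We suggest that you:
- install MindSpore r2.2.11 instead, and install the matching dependencies via requirements.txt;
- check whether any errors were reported when converting the Total-Text dataset.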
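The second suggestion can be approximated with a small sanity check over the converted label file. The sketch below assumes each line has the form `<image path>`, a tab, then a JSON list of objects with a `points` field; that layout is an assumption here and should be confirmed against the converter's actual output. The function name `check_label_line` and the sample lines are hypothetical.

```python
import json


def check_label_line(line, min_points=3):
    """Return a list of problems found in one annotation line (empty if none).

    Assumed line layout (verify against your converted file):
        <image path>\t[{"transcription": "...", "points": [[x, y], ...]}, ...]
    """
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return ["missing tab separator between image path and JSON"]
    image_path, payload = parts
    try:
        entries = json.loads(payload)
    except json.JSONDecodeError as err:
        return [f"{image_path}: invalid JSON ({err.msg})"]
    if not isinstance(entries, list):
        return [f"{image_path}: JSON payload is not a list"]

    problems = []
    for i, entry in enumerate(entries):
        points = entry.get("points") if isinstance(entry, dict) else None
        if not isinstance(points, list) or len(points) < min_points:
            problems.append(f"{image_path}: entry {i} has fewer than {min_points} points")
            continue
        if any(not isinstance(p, list) or len(p) != 2 for p in points):
            problems.append(f"{image_path}: entry {i} has a malformed point")
    return problems


good = 'img1.jpg\t[{"transcription": "a", "points": [[0, 0], [1, 0], [1, 1]]}]'
bad = 'img2.jpg\t[{"transcription": "b", "points": [[0, 0], [1, 0]]}]'
print(check_label_line(good))
print(check_label_line(bad))
```

Running it over every line of the training and validation label files would show whether the conversion silently produced degenerate polygons.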
(ms-dev) [psw@10-90-43-193 mindocr]$python tools/train.py -c configs/det/dbnet/db_r18_totaltext.yaml
[2024-02-01 01:45:11] mindocr.train INFO - Standalone training. Device id: 0, specified by system.device_id in yaml config file or is default value 0.
[2024-02-01 01:45:14] mindocr.data.builder INFO - Creating dataloader (training=True) for device 0. Number of data samples: 1255 per device (1255 total).
[2024-02-01 01:45:17] mindocr.data.builder INFO - Creating dataloader (training=False) for device 0. Number of data samples: 300 per device (300 total).
[2024-02-01 01:45:17] mindocr.models.utils.load_model INFO - Finish loading model checkoint from https://download.mindspore.cn/toolkits/mindocr/dbnet/dbnet_resnet18_synthtext-251ef3dd.ckpt. If no parameter fail-load warning displayed, all checkpoint params have been successfully loaded.
[2024-02-01 01:45:17] mindocr.optim.param_grouping INFO - no parameter grouping is applied.
[2024-02-01 01:45:22] mindocr.train INFO -
========================================
Distribute: False
Model: det_resnet18-DBFPN-DBHead
Total number of parameters: 12351042
Total number of trainable parameters: 12340930
Data root: /ms_test3/psw/code/mindocr
Optimizer: SGD
Weight decay: 0.0001
Batch size: 20
Num devices: 1
Gradient accumulation steps: 1
Global batch size: 20x1x1=20
LR: 0.007
Scheduler: polynomial_decay
Steps per epoch: 62
Num epochs: 1200
Clip gradient: False
EMA: True
AMP level: O0
Loss scaler: {'type': 'dynamic', 'loss_scale': 512, 'scale_factor': 2, 'scale_window': 1000}
Drop overflow update: False
========================================
Start training... (The first epoch takes longer, please wait...)
[2024-02-01 01:46:13] mindocr.utils.callbacks INFO - epoch: [1/1200], loss: 2.780020, epoch time: 50.894 s, per step time: 820.863 ms, fps per card: 24.36 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:13<00:00, 21.51it/s]
[2024-02-01 01:46:27] mindocr.utils.callbacks INFO - Performance: {'recall': 0.5787810383747178, 'precision': 0.8835286009648519, 'f-score': 0.6993998908892527}, eval time: 14.135913610458374
[2024-02-01 01:46:27] mindocr.utils.callbacks INFO - => Best f-score: 0.6993998908892527, checkpoint saved.
[2024-02-01 01:46:51] mindocr.utils.callbacks INFO - epoch: [2/1200], loss: 2.967140, epoch time: 23.089 s, per step time: 372.398 ms, fps per card: 53.71 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:13<00:00, 23.02it/s]
[2024-02-01 01:47:04] mindocr.utils.callbacks INFO - Performance: {'recall': 0.6072234762979684, 'precision': 0.9026845637583892, 'f-score': 0.7260458839406208}, eval time: 13.16214632987976
[2024-02-01 01:47:04] mindocr.utils.callbacks INFO - => Best f-score: 0.7260458839406208, checkpoint saved.
[2024-02-01 01:47:29] mindocr.utils.callbacks INFO - epoch: [3/1200], loss: 2.520107, epoch time: 24.156 s, per step time: 389.613 ms, fps per card: 51.33 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:14<00:00, 21.20it/s]
[2024-02-01 01:47:43] mindocr.utils.callbacks INFO - Performance: {'recall': 0.6261851015801354, 'precision': 0.8604218362282878, 'f-score': 0.7248497517637835}, eval time: 14.256935596466064
[2024-02-01 01:48:08] mindocr.utils.callbacks INFO - epoch: [4/1200], loss: 2.506505, epoch time: 24.326 s, per step time: 392.352 ms, fps per card: 50.97 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:12<00:00, 24.39it/s]
[2024-02-01 01:48:21] mindocr.utils.callbacks INFO - Performance: {'recall': 0.5431151241534988, 'precision': 0.9065561416729465, 'f-score': 0.6792772444946358}, eval time: 12.399577140808105
[2024-02-01 01:48:46] mindocr.utils.callbacks INFO - epoch: [5/1200], loss: 2.584358, epoch time: 24.736 s, per step time: 398.971 ms, fps per card: 50.13 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:12<00:00, 23.79it/s]
[2024-02-01 01:48:59] mindocr.utils.callbacks INFO - Performance: {'recall': 0.582844243792325, 'precision': 0.8909592822636301, 'f-score': 0.7046943231441047}, eval time: 12.713114023208618
[2024-02-01 01:49:24] mindocr.utils.callbacks INFO - epoch: [6/1200], loss: 2.986135, epoch time: 24.694 s, per step time: 398.292 ms, fps per card: 50.21 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:12<00:00, 23.93it/s]
[2024-02-01 01:49:37] mindocr.utils.callbacks INFO - Performance: {'recall': 0.6419864559819413, 'precision': 0.8915360501567398, 'f-score': 0.7464566929133857}, eval time: 12.641420125961304
[2024-02-01 01:49:37] mindocr.utils.callbacks INFO - => Best f-score: 0.7464566929133857, checkpoint saved.
[2024-02-01 01:50:02] mindocr.utils.callbacks INFO - epoch: [7/1200], loss: 2.739786, epoch time: 24.156 s, per step time: 389.615 ms, fps per card: 51.33 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:12<00:00, 24.54it/s]
[2024-02-01 01:50:14] mindocr.utils.callbacks INFO - Performance: {'recall': 0.6297968397291196, 'precision': 0.908203125, 'f-score': 0.743801652892562}, eval time: 12.314828634262085
[2024-02-01 01:50:38] mindocr.utils.callbacks INFO - epoch: [8/1200], loss: 2.367562, epoch time: 23.545 s, per step time: 379.756 ms, fps per card: 52.67 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:13<00:00, 22.80it/s]
[2024-02-01 01:50:51] mindocr.utils.callbacks INFO - Performance: {'recall': 0.5688487584650113, 'precision': 0.9217264081931237, 'f-score': 0.7035175879396984}, eval time: 13.259095668792725
[2024-02-01 01:51:16] mindocr.utils.callbacks INFO - epoch: [9/1200], loss: 2.317336, epoch time: 24.136 s, per step time: 389.283 ms, fps per card: 51.38 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:14<00:00, 20.05it/s]
[2024-02-01 01:51:31] mindocr.utils.callbacks INFO - Performance: {'recall': 0.6781038374717833, 'precision': 0.8637147786083956, 'f-score': 0.7597369752149721}, eval time: 15.05770754814148
[2024-02-01 01:51:31] mindocr.utils.callbacks INFO - => Best f-score: 0.7597369752149721, checkpoint saved.
[2024-02-01 01:51:55] mindocr.utils.callbacks INFO - epoch: [10/1200], loss: 2.895481, epoch time: 23.572 s, per step time: 380.187 ms, fps per card: 52.61 img/s
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 300/300 [00:11<00:00, 25.77it/s]
[2024-02-01 01:52:07] mindocr.utils.callbacks INFO - Performance: {'recall': 0.636117381489842, 'precision': 0.9066924066924067, 'f-score': 0.7476784292915892}, eval time: 11.734557867050171
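Since we could not reproduce the problem you reported, this issue is being closed for now.
Please try installing the MindSpore version that MindOCR is adapted to.
If you have further questions, please contact us.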
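As a rough way to act on the version advice, the sketch below reports the installed `mindspore` and `mindocr` distributions and compares them with the versions named in this thread (MindSpore r2.2.10 or later, MindOCR v0.3.1). The `conda list` output in the report shows `mindspore 2.1.1` and a pip-installed `mindocr 0.2.0`, so a check like this may be worth running in the training environment. The helper names are hypothetical, and the version parsing is deliberately simple: it only reads leading numeric components.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(text):
    """Parse leading numeric components, e.g. '2.2.11' -> (2, 2, 11)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if `installed` is present and not older than `minimum`."""
    return installed is not None and version_tuple(installed) >= version_tuple(minimum)


# Minimums taken from this thread; adjust them for other MindOCR releases.
for name, minimum in (("mindspore", "2.2.10"), ("mindocr", "0.3.1")):
    found = installed_version(name)
    status = "ok" if meets_minimum(found, minimum) else "check this"
    print(f"{name}: installed={found}, expected>={minimum} -> {status}")
```

Note that the traceback imports MindOCR from `/home/ma-user/work/mindocr`, so the pip-installed copy may not be the one actually running; the check above only covers what is registered in the environment.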