mindspore-lab / mindocr

A toolbox of OCR models, algorithms, and pipelines based on MindSpore

Home Page:https://mindspore-lab.github.io/mindocr/

CRNN on a custom dataset: data-dependent loss overflow, loss: 65504

panxua opened this issue

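Symptom:
The loss overflows, and the overflow is tied to specific data samples.
Screenshot:
[image: loss overflow 1115]
Status: resolved
Reasons:

  1. When the label length > max_text_len, data processing silently sets the label to empty, with no warning.
  2. When the label length + the number of repeat separators > pred_seq_len (a separator is needed between adjacent identical characters), CTCLoss overflows, also with no warning.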
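
Both conditions can be checked offline before training. Below is a minimal Python sketch, assuming that each pair of adjacent identical characters costs one extra CTC time step (the blank that separates them). `ctc_required_len` and `check_label` are hypothetical helper names for illustration, not MindOCR APIs. For reference, 65504 is the largest finite float16 value, which is consistent with a loss that saturated in half precision.

```python
def ctc_required_len(label: str) -> int:
    """Minimum number of time steps CTC needs to emit `label`.

    Assumption: one extra step (a blank) is needed between every pair
    of adjacent identical characters.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def check_label(label: str, max_text_len: int, pred_seq_len: int) -> str:
    """Classify a label against the two limits from the list above.

    Hypothetical helper; the returned strings are illustrative only.
    """
    if len(label) > max_text_len:
        return "exceeds_max_text_len"   # reason 1: label would be emptied
    if ctc_required_len(label) > pred_seq_len:
        return "exceeds_pred_seq_len"   # reason 2: no valid CTC alignment
    return "ok"


if __name__ == "__main__":
    for text in ["hello", "aaaa", "abc"]:
        print(text, ctc_required_len(text), check_label(text, 10, 6))
```

A label such as "aaaa" has only 4 characters but needs 7 time steps under this assumption, so it can fail the second check even when it passes the first.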

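Details: link
Solution:
Compute the maximum label length over the dataset and set seq_max_len accordingly;
compute the maximum of (label length + repeat separators) and set pred_seq_len accordingly;
then update the width in img_shape for training, evaluation, and prediction so that it matches 4 x pred_seq_len.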
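
The statistics in the solution above can be gathered with a short script. This is only a sketch: it assumes the labels are already loaded as a non-empty list of strings, counts one repeat separator per pair of adjacent identical characters, and takes the factor 4 between image width and pred_seq_len from this issue. `suggest_settings` and the keys of its result are hypothetical names, not MindOCR config fields.

```python
from typing import Dict, Iterable


def ctc_required_len(label: str) -> int:
    """Label length plus one extra step per adjacent identical pair (assumption)."""
    return len(label) + sum(1 for a, b in zip(label, label[1:]) if a == b)


def suggest_settings(labels: Iterable[str], width_factor: int = 4) -> Dict[str, int]:
    """Derive candidate length limits and image width from the labels.

    `width_factor` = 4 follows the "4 x pred_seq_len" rule stated in the issue.
    Raises ValueError if `labels` is empty.
    """
    labels = list(labels)
    max_label_len = max(len(s) for s in labels)
    pred_seq_len = max(ctc_required_len(s) for s in labels)
    return {
        "max_label_len": max_label_len,            # candidate for the max length setting
        "pred_seq_len": pred_seq_len,              # candidate for pred_seq_len
        "img_width": width_factor * pred_seq_len,  # candidate width for img_shape
    }


if __name__ == "__main__":
    print(suggest_settings(["hello", "aab", "xyz"]))
```

The same image width then has to be used in the training, evaluation, and prediction settings, as described above.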
Suggestion:
Warn users about this in training_recognition_custom_dataset:
https://github.com/mindspore-lab/mindocr/blob/main/docs/en/tutorials/training_recognition_custom_dataset.md
https://github.com/mindspore-lab/mindocr/blob/main/docs/cn/tutorials/training_recognition_custom_dataset.md

Hello, we provide two additional options to address the problems you mentioned. For reason 1, add filter_max_len: True to your configuration file to filter out the problematic samples. For reason 2, add both filter_max_len: True and extra_count_if_repeat: True to filter out those samples as well. For details, see configs/rec/svtr/svtr_tiny.yaml. :)