CRNN on a custom dataset: data-dependent loss overflow (loss: 65504)

Question

CRNN on a custom dataset: data-dependent loss overflow (loss: 65504)

panxua opened this issue 7 months ago · comments

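Symptom:
The loss function overflows, and the overflow is tied to specific data samples.
Screenshot: (image not available)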

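Status: resolved
Cause:

1. When the label length > max_text_len, data processing silently replaces the label with an empty one and gives no warning.
2. When the label length + the separators needed between repeated adjacent characters > pred_seq_len, CTCLoss overflows, again with no warning.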
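Cause 2 can be reproduced without any framework. The following is a minimal sketch of the standard CTC forward recursion in pure Python, assuming uniform per-frame probabilities. The function name ctc_loss_uniform and its parameters are hypothetical and are not part of MindOCR. It shows that the loss becomes infinite as soon as the number of frames is smaller than the label length plus the number of adjacent repeats. For reference, 65504 is the largest finite float16 value, which suggests that the reported loss saturated there under half precision.

```python
import math

BLANK = None  # sentinel for the CTC blank symbol (illustrative choice)


def ctc_loss_uniform(label, num_frames, num_classes):
    """Return -log p(label) under uniform per-frame probabilities.

    Uses the standard CTC forward recursion over the blank-extended
    label. Returns math.inf when no valid alignment exists.
    """
    # Extended label: blank, c1, blank, c2, ..., blank
    ext = [BLANK]
    for ch in label:
        ext += [ch, BLANK]
    size = len(ext)
    p = 1.0 / num_classes  # probability of any symbol at any frame

    # Initialisation: a path may start with a blank or the first character.
    alpha = [0.0] * size
    alpha[0] = p
    if size > 1:
        alpha[1] = p

    for _ in range(1, num_frames):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is only allowed between different characters.
            if s >= 2 and ext[s] is not BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * p
        alpha = new

    # A valid path ends on the last character or the trailing blank.
    prob = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return math.inf if prob == 0.0 else -math.log(prob)


too_short = ctc_loss_uniform("aa", num_frames=2, num_classes=4)
long_enough = ctc_loss_uniform("aa", num_frames=3, num_classes=4)
no_repeat = ctc_loss_uniform("ab", num_frames=2, num_classes=4)
print(too_short, long_enough, no_repeat)
```

The label "aa" needs three frames (a, blank, a), so two frames give an infinite loss, while "ab" fits in two frames because no separator is required.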

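Details: (link)
Solution:
Compute the maximum label length and set seq_max_len accordingly;
compute the maximum of (label length + separators for repeated adjacent characters) and set pred_seq_len accordingly;
then update the width in img_shape for training, evaluation, and prediction so that it equals 4 x pred_seq_len.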
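The statistics described in the solution can be gathered with a few lines of Python. This is a sketch under stated assumptions: the labels are already loaded into a list of strings (parsing the label file is left out), the helper names count_adjacent_repeats and suggest_lengths are hypothetical, and the factor 4 is the width-to-sequence ratio quoted above.

```python
def count_adjacent_repeats(text):
    """Number of positions where a character equals its predecessor."""
    return sum(1 for a, b in zip(text, text[1:]) if a == b)


def suggest_lengths(labels, downsample=4):
    """Derive length settings from a list of label strings.

    downsample is the assumed ratio between image width and the
    predicted sequence length (4 in the solution above).
    """
    max_text_len = max(len(t) for t in labels)
    pred_seq_len = max(len(t) + count_adjacent_repeats(t) for t in labels)
    return {
        "max_text_len": max_text_len,
        "pred_seq_len": pred_seq_len,
        "img_width": downsample * pred_seq_len,
    }


# Toy labels for illustration only.
stats = suggest_lengths(["hello", "bookkeeper", "abc"])
print(stats)
```

Here "bookkeeper" has 10 characters and 3 adjacent repeats (oo, kk, ee), so it needs at least 13 prediction steps and an image width of 52.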
Suggestion:
Warn users about this in the training_recognition_custom_dataset tutorial:
https://github.com/mindspore-lab/mindocr/blob/main/docs/en/tutorials/training_recognition_custom_dataset.md
https://github.com/mindspore-lab/mindocr/blob/main/docs/cn/tutorials/training_recognition_custom_dataset.md

Cheung Ka Wai · Answer 1 · Wed Nov 15 2023 12:59:46 GMT+0800 (China Standard Time)

Hello, we provide two additional options to solve the problems you mentioned. For reason 1, you can add filter_max_len: True to your configuration file to filter out these problematic cases. For reason 2, you can add both filter_max_len: True and extra_count_if_repeat: True to filter out the affected cases. For details, see configs/rec/svtr/svtr_tiny.yaml. :)
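To illustrate what such a filter might do, here is a small Python sketch. It is not MindOCR's implementation: the function keep_sample and the parameter limit are hypothetical, and the semantics (drop a sample when its length, optionally including adjacent repeats, exceeds the limit) are an assumption based on the option names in the answer.

```python
def keep_sample(text, limit, extra_count_if_repeat=False):
    """Return True if the label fits within limit.

    With extra_count_if_repeat, each adjacent repeated character
    counts as one extra position (assumed semantics).
    """
    length = len(text)
    if extra_count_if_repeat:
        length += sum(1 for a, b in zip(text, text[1:]) if a == b)
    return length <= limit


# Toy labels for illustration only.
labels = ["abc", "aabb", "abcdefgh"]
kept_plain = [t for t in labels if keep_sample(t, limit=5)]
kept_repeat = [
    t for t in labels if keep_sample(t, limit=5, extra_count_if_repeat=True)
]
print(kept_plain, kept_repeat)
```

With a limit of 5, "aabb" passes the plain length check but is dropped once its two adjacent repeats are counted.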