mindspore-lab / mindocr

A toolbox of OCR models, algorithms, and pipelines based on MindSpore

Home Page: https://mindspore-lab.github.io/mindocr/

Training with the svtr_tiny.yaml config produces a NaN loss

TanateT opened this issue

Hello, when I train with the svtr_tiny.yaml config, the loss becomes NaN. What could be the reason for this? The log is as follows:
[2024-02-23 02:12:14] mindocr.utils.callbacks INFO - epoch: [1/1] step: [100/4468], loss: 74.925575, lr: 0.000002, per step time: 2328.825 ms, fps per card: 0.43 img/s
[2024-02-23 02:13:34] mindocr.utils.callbacks INFO - epoch: [1/1] step: [200/4468], loss: 67.030891, lr: 0.000004, per step time: 797.335 ms, fps per card: 1.25 img/s
[2024-02-23 02:14:55] mindocr.utils.callbacks INFO - epoch: [1/1] step: [300/4468], loss: 59.087444, lr: 0.000007, per step time: 811.777 ms, fps per card: 1.23 img/s
[2024-02-23 02:16:16] mindocr.utils.callbacks INFO - epoch: [1/1] step: [400/4468], loss: 48.703842, lr: 0.000009, per step time: 803.489 ms, fps per card: 1.24 img/s
[2024-02-23 02:17:26] mindocr.utils.callbacks INFO - epoch: [1/1] step: [500/4468], loss: 42.299507, lr: 0.000011, per step time: 698.967 ms, fps per card: 1.43 img/s
[2024-02-23 02:18:26] mindocr.utils.callbacks INFO - epoch: [1/1] step: [600/4468], loss: 25.588881, lr: 0.000013, per step time: 607.706 ms, fps per card: 1.65 img/s
[2024-02-23 02:19:32] mindocr.utils.callbacks INFO - epoch: [1/1] step: [700/4468], loss: 15.521203, lr: 0.000016, per step time: 656.214 ms, fps per card: 1.52 img/s
[2024-02-23 02:20:34] mindocr.utils.callbacks INFO - epoch: [1/1] step: [800/4468], loss: 14.108238, lr: 0.000018, per step time: 617.219 ms, fps per card: 1.62 img/s
[2024-02-23 02:21:37] mindocr.utils.callbacks INFO - epoch: [1/1] step: [900/4468], loss: 11.061129, lr: 0.000020, per step time: 626.818 ms, fps per card: 1.60 img/s
[2024-02-23 02:22:39] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1000/4468], loss: 17.575577, lr: 0.000022, per step time: 620.869 ms, fps per card: 1.61 img/s
[2024-02-23 02:23:41] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1100/4468], loss: 39.586132, lr: 0.000025, per step time: 623.567 ms, fps per card: 1.60 img/s
[2024-02-23 02:24:44] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1200/4468], loss: 24.556992, lr: 0.000027, per step time: 626.496 ms, fps per card: 1.60 img/s
[2024-02-23 02:25:47] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1300/4468], loss: 9.554378, lr: 0.000029, per step time: 633.920 ms, fps per card: 1.58 img/s
[2024-02-23 02:26:50] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1400/4468], loss: 15.654078, lr: 0.000031, per step time: 623.542 ms, fps per card: 1.60 img/s
[2024-02-23 02:28:05] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1500/4468], loss: 10.580729, lr: 0.000034, per step time: 757.204 ms, fps per card: 1.32 img/s
[2024-02-23 02:29:26] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1600/4468], loss: 11.134408, lr: 0.000036, per step time: 809.373 ms, fps per card: 1.24 img/s
[2024-02-23 02:30:46] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1700/4468], loss: 11.017653, lr: 0.000038, per step time: 801.426 ms, fps per card: 1.25 img/s
[2024-02-23 02:31:48] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1800/4468], loss: 19.253716, lr: 0.000040, per step time: 612.632 ms, fps per card: 1.63 img/s
[2024-02-23 02:32:53] mindocr.utils.callbacks INFO - epoch: [1/1] step: [1900/4468], loss: 11.826553, lr: 0.000043, per step time: 655.296 ms, fps per card: 1.53 img/s
[2024-02-23 02:33:56] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2000/4468], loss: 10.973602, lr: 0.000045, per step time: 618.851 ms, fps per card: 1.62 img/s
[2024-02-23 02:34:58] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2100/4468], loss: nan, lr: 0.000047, per step time: 629.009 ms, fps per card: 1.59 img/s
[2024-02-23 02:36:00] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2200/4468], loss: nan, lr: 0.000049, per step time: 619.918 ms, fps per card: 1.61 img/s
[2024-02-23 02:37:02] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2300/4468], loss: nan, lr: 0.000051, per step time: 612.436 ms, fps per card: 1.63 img/s
[2024-02-23 02:38:12] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2400/4468], loss: nan, lr: 0.000054, per step time: 703.533 ms, fps per card: 1.42 img/s
[2024-02-23 02:39:00] mindocr.data.transforms.rec_transforms WARNING - ... does not contain any valid character in the dictionary.
[2024-02-23 02:39:20] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2500/4468], loss: nan, lr: 0.000056, per step time: 679.225 ms, fps per card: 1.47 img/s
[2024-02-23 02:40:23] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2600/4468], loss: nan, lr: 0.000058, per step time: 631.269 ms, fps per card: 1.58 img/s
[2024-02-23 02:41:23] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2700/4468], loss: nan, lr: 0.000060, per step time: 603.256 ms, fps per card: 1.66 img/s
[2024-02-23 02:42:27] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2800/4468], loss: nan, lr: 0.000063, per step time: 635.580 ms, fps per card: 1.57 img/s
[2024-02-23 02:43:34] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2900/4468], loss: nan, lr: 0.000065, per step time: 665.460 ms, fps per card: 1.50 img/s
[2024-02-23 02:44:38] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3000/4468], loss: nan, lr: 0.000067, per step time: 642.669 ms, fps per card: 1.56 img/s
[2024-02-23 02:45:42] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3100/4468], loss: nan, lr: 0.000069, per step time: 643.852 ms, fps per card: 1.55 img/s
[2024-02-23 02:46:49] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3200/4468], loss: nan, lr: 0.000072, per step time: 666.926 ms, fps per card: 1.50 img/s
[2024-02-23 02:47:50] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3300/4468], loss: nan, lr: 0.000074, per step time: 609.449 ms, fps per card: 1.64 img/s
[2024-02-23 02:48:51] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3400/4468], loss: nan, lr: 0.000076, per step time: 605.783 ms, fps per card: 1.65 img/s
[2024-02-23 02:49:52] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3500/4468], loss: nan, lr: 0.000078, per step time: 609.124 ms, fps per card: 1.64 img/s
[2024-02-23 02:50:54] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3600/4468], loss: nan, lr: 0.000081, per step time: 622.407 ms, fps per card: 1.61 img/s
[2024-02-23 02:51:55] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3700/4468], loss: nan, lr: 0.000083, per step time: 613.393 ms, fps per card: 1.63 img/s
[2024-02-23 02:52:58] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3800/4468], loss: nan, lr: 0.000085, per step time: 633.255 ms, fps per card: 1.58 img/s
[2024-02-23 02:54:00] mindocr.utils.callbacks INFO - epoch: [1/1] step: [3900/4468], loss: nan, lr: 0.000087, per step time: 612.823 ms, fps per card: 1.63 img/s
[2024-02-23 02:55:01] mindocr.utils.callbacks INFO - epoch: [1/1] step: [4000/4468], loss: nan, lr: 0.000090, per step time: 609.343 ms, fps per card: 1.64 img/s
[2024-02-23 02:56:03] mindocr.utils.callbacks INFO - epoch: [1/1] step: [4100/4468], loss: nan, lr: 0.000092, per step time: 624.769 ms, fps per card: 1.60 img/s
[2024-02-23 02:57:06] mindocr.utils.callbacks INFO - epoch: [1/1] step: [4200/4468], loss: nan, lr: 0.000094, per step time: 628.695 ms, fps per card: 1.59 img/s
[2024-02-23 02:58:13] mindocr.utils.callbacks INFO - epoch: [1/1] step: [4300/4468], loss: nan, lr: 0.000096, per step time: 673.219 ms, fps per card: 1.49 img/s
[2024-02-23 02:59:20] mindocr.utils.callbacks INFO - epoch: [1/1] step: [4400/4468], loss: nan, lr: 0.000098, per step time: 670.237 ms, fps per card: 1.49 img/s
[2024-02-23 03:00:03] m
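
To pin down where the loss first turns into NaN in a log like the one above, a small parser can help. This is a minimal sketch that assumes the step-line format shown in this log; `STEP_RE` and `first_nan_step` are hypothetical names for illustration and are not part of MindOCR.

```python
import math
import re

# Matches the per-step lines printed by mindocr.utils.callbacks in the log above.
STEP_RE = re.compile(
    r"epoch: \[(\d+)/\d+\] step: \[(\d+)/\d+\], loss: (\S+), lr: (\S+),"
)


def first_nan_step(log_text):
    """Return (epoch, step, lr) of the first step line whose loss is NaN, else None."""
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match is None:
            continue  # skip warnings, eval lines, and truncated lines
        epoch, step, loss, lr = match.groups()
        if math.isnan(float(loss)):  # float("nan") parses the literal "nan"
            return int(epoch), int(step), float(lr)
    return None


# Two lines copied from the log above.
sample = """\
[2024-02-23 02:33:56] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2000/4468], loss: 10.973602, lr: 0.000045, per step time: 618.851 ms, fps per card: 1.62 img/s
[2024-02-23 02:34:58] mindocr.utils.callbacks INFO - epoch: [1/1] step: [2100/4468], loss: nan, lr: 0.000047, per step time: 629.009 ms, fps per card: 1.59 img/s
"""
print(first_nan_step(sample))
```

On the log above, the first NaN is reported at step 2100 while the learning rate is still ramping up, so the exact failing step lies somewhere between the step 2000 and step 2100 log lines.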

Hello, we have received your issue and are working on it. If we manage to reproduce the problem, we will notify you. In the meantime, you could try the following steps:

  1. Check that the MindSpore and CANN versions match. You can run the following code to check:
    import mindspore
    mindspore.set_context(device_target="Ascend")
    mindspore.run_check()
    exit()
    Please refer to https://www.mindspore.cn/install/en#configuring-environment-variables for more details. If the version check reports errors, please follow the installation guide and reinstall CANN and MindSpore.
  2. An amp_level of O2 might cause unstable training. You could try setting amp_level to O0. For example, if you are using configs/rec/svtr/svtr_tiny.yaml, you will find
    system:
      ...
      amp_level: O2
      amp_level_infer: O2 # running inference in O2 mode
      ...
    
    Replace O2 with O0, so that the yaml file looks like:
    system:
      ...
      amp_level: O0
      amp_level_infer: O0
      ...
    
    After this change, please relaunch your training. The instability might be alleviated.
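
The O2 to O0 edit described in step 2 can also be scripted. The following is a minimal sketch that works on the YAML text with a regular expression, so that comments and layout are preserved; `set_amp_level` is a hypothetical helper written for this example, not a MindOCR function.

```python
import re

# Matches "amp_level: O2" and "amp_level_infer: O2" at the start of a line,
# capturing the key and its indentation so that only the value is replaced.
AMP_RE = re.compile(r"^([ \t]*amp_level(?:_infer)?:[ \t]*)O[0-3]\b", flags=re.MULTILINE)


def set_amp_level(yaml_text, level="O0"):
    """Return yaml_text with amp_level and amp_level_infer set to `level`."""
    return AMP_RE.sub(lambda m: m.group(1) + level, yaml_text)


# Fragment taken from the configuration shown above.
sample = (
    "system:\n"
    "  amp_level: O2\n"
    "  amp_level_infer: O2 # running inference in O2 mode\n"
)
updated = set_amp_level(sample)
print(updated)
```

To apply it to a real file, read configs/rec/svtr/svtr_tiny.yaml into a string, pass it through `set_amp_level`, and write the result back. Trailing comments are left untouched, so a comment that mentions O2 will need updating by hand.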

Hello author, I am using a GPU rather than an NPU. CRNN trains normally, but SVTR runs into the issue above.

OK, would you mind sharing your GPU information, CUDA/cuDNN versions, and MindSpore version, so that we can try to reproduce the problem?

Hello author, the GPU and CUDA versions I am using are as follows:
NVIDIA-SMI 470.129.06
Driver Version: 470.129.06
CUDA Version: 11.4
Graphics card model: GeForce RTX 2080Ti

@TanateT
Hello, could you please tell us which version of MindSpore you are using, so that we can reproduce the problem on our server?
We recommend using the latest release version (r2.2.11) of MindSpore.
Besides, the officially supported CUDA versions for MindSpore are CUDA 10.1, 11.1, and 11.6. You may encounter unknown problems if you choose a CUDA version that has not been fully validated.

@panshaowu
Hello, I am using MindSpore r2.2.11. Are you saying that CUDA 11.4 cannot be used?

@TanateT
Hello, I'm not suggesting that CUDA 11.4 cannot be used.
I mean that the release version of MindSpore has not been fully tested on CUDA 11.4.
Our colleague is trying to reproduce this problem on a GPU server. Until you hear back from us, you can try changing the AMP level (O2 -> O0). In some cases, a NaN loss is caused by overflow of float16 variables.
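
As a generic illustration of how a float16 overflow can turn into NaN, here is a minimal NumPy sketch. It is not taken from the SVTR code and does not identify which operation overflows in this training run; it only shows the mechanism.

```python
import numpy as np

# Largest finite value representable in float16.
fp16_max = np.finfo(np.float16).max  # 65504.0

# Silence the overflow/invalid warnings NumPy would otherwise emit.
with np.errstate(over="ignore", invalid="ignore"):
    big = np.float16(60000) * np.float16(2)  # 120000 exceeds fp16_max, so the result is inf
    diff = big - big                         # inf - inf is undefined, so the result is nan
    same_in_fp32 = np.float32(60000) * np.float32(2)  # float32 holds 120000 without trouble

print(fp16_max, big, diff, same_in_fp32)
```

Once a single intermediate value becomes inf, later subtractions or divisions can produce NaN, which then propagates into the loss and the gradients. Running in float32 (amp_level O0) avoids this particular failure mode at the cost of speed and memory.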

When training svtr_tiny with CUDA 11.1 and MindSpore 2.2.10, O2 mode may cause NaN, while O0 mode might work. Please try O0 mode.

Thanks. I changed the AMP level (O2 -> O0), and the loss no longer becomes NaN, but it fluctuates around 20 and does not decrease. Meanwhile, acc stays at 0. The training log is as follows:
[2024-03-04 11:25:46] mindocr.utils.callbacks INFO - Performance: {'acc': 0.0, 'norm_edit_distance': 0.0}, eval time: 30.399275541305542
[2024-03-04 11:26:12] mindocr.utils.callbacks INFO - epoch: [10/20] step: [100/4468], loss: 12.484948, lr: 0.000883, per step time: 224.442 ms, fps per card: 4.46 img/s
[2024-03-04 11:26:34] mindocr.utils.callbacks INFO - epoch: [10/20] step: [200/4468], loss: 17.369392, lr: 0.000883, per step time: 218.513 ms, fps per card: 4.58 img/s
[2024-03-04 11:26:56] mindocr.utils.callbacks INFO - epoch: [10/20] step: [300/4468], loss: 22.006893, lr: 0.000882, per step time: 219.676 ms, fps per card: 4.55 img/s
[2024-03-04 11:27:18] mindocr.utils.callbacks INFO - epoch: [10/20] step: [400/4468], loss: 15.044069, lr: 0.000881, per step time: 219.185 ms, fps per card: 4.56 img/s
[2024-03-04 11:27:40] mindocr.utils.callbacks INFO - epoch: [10/20] step: [500/4468], loss: 14.776553, lr: 0.000880, per step time: 220.036 ms, fps per card: 4.54 img/s
[2024-03-04 11:28:03] mindocr.utils.callbacks INFO - epoch: [10/20] step: [600/4468], loss: 15.237057, lr: 0.000879, per step time: 229.845 ms, fps per card: 4.35 img/s
[2024-03-04 11:28:25] mindocr.utils.callbacks INFO - epoch: [10/20] step: [700/4468], loss: 12.387613, lr: 0.000878, per step time: 224.995 ms, fps per card: 4.44 img/s
[2024-03-04 11:28:47] mindocr.utils.callbacks INFO - epoch: [10/20] step: [800/4468], loss: 8.567094, lr: 0.000877, per step time: 221.388 ms, fps per card: 4.52 img/s
[2024-03-04 11:29:09] mindocr.utils.callbacks INFO - epoch: [10/20] step: [900/4468], loss: 13.427640, lr: 0.000877, per step time: 221.403 ms, fps per card: 4.52 img/s
[2024-03-04 11:29:32] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1000/4468], loss: 25.871349, lr: 0.000876, per step time: 223.456 ms, fps per card: 4.48 img/s
[2024-03-04 11:29:54] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1100/4468], loss: 20.394947, lr: 0.000875, per step time: 222.974 ms, fps per card: 4.48 img/s
[2024-03-04 11:30:17] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1200/4468], loss: 10.774661, lr: 0.000874, per step time: 228.551 ms, fps per card: 4.38 img/s
[2024-03-04 11:30:39] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1300/4468], loss: 18.086420, lr: 0.000873, per step time: 221.452 ms, fps per card: 4.52 img/s
[2024-03-04 11:31:01] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1400/4468], loss: 14.657832, lr: 0.000872, per step time: 219.678 ms, fps per card: 4.55 img/s
[2024-03-04 11:31:24] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1500/4468], loss: 10.770251, lr: 0.000871, per step time: 226.276 ms, fps per card: 4.42 img/s
[2024-03-04 11:31:46] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1600/4468], loss: 14.063519, lr: 0.000871, per step time: 221.062 ms, fps per card: 4.52 img/s
[2024-03-04 11:32:07] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1700/4468], loss: 26.693460, lr: 0.000870, per step time: 214.654 ms, fps per card: 4.66 img/s
[2024-03-04 11:32:30] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1800/4468], loss: 11.442185, lr: 0.000869, per step time: 227.773 ms, fps per card: 4.39 img/s
[2024-03-04 11:32:53] mindocr.utils.callbacks INFO - epoch: [10/20] step: [1900/4468], loss: 16.612226, lr: 0.000868, per step time: 225.437 ms, fps per card: 4.44 img/s
[2024-03-04 11:33:14] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2000/4468], loss: 31.547741, lr: 0.000867, per step time: 219.388 ms, fps per card: 4.56 img/s
[2024-03-04 11:33:36] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2100/4468], loss: 11.935366, lr: 0.000866, per step time: 219.342 ms, fps per card: 4.56 img/s
[2024-03-04 11:33:59] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2200/4468], loss: 10.912749, lr: 0.000865, per step time: 221.209 ms, fps per card: 4.52 img/s
[2024-03-04 11:34:21] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2300/4468], loss: 16.009239, lr: 0.000864, per step time: 220.266 ms, fps per card: 4.54 img/s
[2024-03-04 11:34:43] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2400/4468], loss: 16.984287, lr: 0.000864, per step time: 224.946 ms, fps per card: 4.45 img/s
[2024-03-04 11:35:05] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2500/4468], loss: 14.259339, lr: 0.000863, per step time: 222.307 ms, fps per card: 4.50 img/s
[2024-03-04 11:35:28] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2600/4468], loss: 15.472371, lr: 0.000862, per step time: 227.389 ms, fps per card: 4.40 img/s
[2024-03-04 11:35:50] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2700/4468], loss: 14.685236, lr: 0.000861, per step time: 220.341 ms, fps per card: 4.54 img/s
[2024-03-04 11:36:12] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2800/4468], loss: 9.835842, lr: 0.000860, per step time: 221.385 ms, fps per card: 4.52 img/s
[2024-03-04 11:36:34] mindocr.utils.callbacks INFO - epoch: [10/20] step: [2900/4468], loss: 13.051644, lr: 0.000859, per step time: 220.113 ms, fps per card: 4.54 img/s
[2024-03-04 11:36:57] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3000/4468], loss: 20.364574, lr: 0.000858, per step time: 224.627 ms, fps per card: 4.45 img/s
[2024-03-04 11:37:19] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3100/4468], loss: 22.395132, lr: 0.000857, per step time: 224.215 ms, fps per card: 4.46 img/s
[2024-03-04 11:37:41] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3200/4468], loss: 10.930746, lr: 0.000856, per step time: 216.354 ms, fps per card: 4.62 img/s
[2024-03-04 11:38:02] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3300/4468], loss: 32.271301, lr: 0.000855, per step time: 215.975 ms, fps per card: 4.63 img/s
[2024-03-04 11:38:25] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3400/4468], loss: 14.138934, lr: 0.000855, per step time: 226.921 ms, fps per card: 4.41 img/s
[2024-03-04 11:38:47] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3500/4468], loss: 15.153858, lr: 0.000854, per step time: 220.594 ms, fps per card: 4.53 img/s
[2024-03-04 11:39:09] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3600/4468], loss: 15.871033, lr: 0.000853, per step time: 221.792 ms, fps per card: 4.51 img/s
[2024-03-04 11:39:31] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3700/4468], loss: 32.273537, lr: 0.000852, per step time: 220.235 ms, fps per card: 4.54 img/s
[2024-03-04 11:39:54] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3800/4468], loss: 11.996705, lr: 0.000851, per step time: 222.643 ms, fps per card: 4.49 img/s
[2024-03-04 11:40:16] mindocr.utils.callbacks INFO - epoch: [10/20] step: [3900/4468], loss: 17.036886, lr: 0.000850, per step time: 219.431 ms, fps per card: 4.56 img/s
[2024-03-04 11:40:38] mindocr.utils.callbacks INFO - epoch: [10/20] step: [4000/4468], loss: 16.255884, lr: 0.000849, per step time: 223.401 ms, fps per card: 4.48 img/s
[2024-03-04 11:41:00] mindocr.utils.callbacks INFO - epoch: [10/20] step: [4100/4468], loss: 11.512460, lr: 0.000848, per step time: 222.068 ms, fps per card: 4.50 img/s
[2024-03-04 11:41:22] mindocr.utils.callbacks INFO - epoch: [10/20] step: [4200/4468], loss: 24.882137, lr: 0.000847, per step time: 223.730 ms, fps per card: 4.47 img/s
[2024-03-04 11:41:45] mindocr.utils.callbacks INFO - epoch: [10/20] step: [4300/4468], loss: 22.467775, lr: 0.000846, per step time: 226.801 ms, fps per card: 4.41 img/s
[2024-03-04 11:42:07] mindocr.utils.callbacks INFO - epoch: [10/20] step: [4400/4468], loss: 13.332610, lr: 0.000845, per step time: 221.484 ms, fps per card: 4.52 img/s
[2024-03-04 11:42:15] mindocr.data.transforms.rec_transforms WARNING - ... does not contain any valid character in the dictionary.
[2024-03-04 11:42:22] mindocr.utils.callbacks INFO - epoch: [10/20], loss: 18.189548, epoch time: 992.650 s, per step time: 222.169 ms, fps per card: 4.50 img/s
[2024-03-04 11:42:52] mindocr.metrics.rec_metrics INFO - correct num: 0, total num: 2077.0
[2024-03-04 11:42:52] mindocr.utils.callbacks INFO - Performance: {'acc': 0.0, 'norm_edit_distance': 0.0}, eval time: 29.747499465942383
[2024-03-04 11:43:16] mindocr.utils.callbacks INFO - epoch: [11/20] step: [100/4468], loss: 16.030214, lr: 0.000844, per step time: 220.158 ms, fps per card: 4.54 img/s
[2024-03-04 11:43:39] mindocr.utils.callbacks INFO - epoch: [11/20] step: [200/4468], loss: 26.730707, lr: 0.000843, per step time: 224.024 ms, fps per card: 4.46 img/s
[2024-03-04 11:44:01] mindocr.utils.callbacks INFO - epoch: [11/20] step: [300/4468], loss: 37.481255, lr: 0.000842, per step time: 222.887 ms, fps per card: 4.49 img/s
[2024-03-04 11:44:24] mindocr.utils.callbacks INFO - epoch: [11/20] step: [400/4468], loss: 19.619278, lr: 0.000841, per step time: 230.778 ms, fps per card: 4.33 img/s
[2024-03-04 11:44:46] mindocr.utils.callbacks INFO - epoch: [11/20] step: [500/4468], loss: 26.488934, lr: 0.000840, per step time: 221.857 ms, fps per card: 4.51 img/s
[2024-03-04 11:45:09] mindocr.utils.callbacks INFO - epoch: [11/20] step: [600/4468], loss: 12.448463, lr: 0.000839, per step time: 222.572 ms, fps per card: 4.49 img/s
[2024-03-04 11:45:32] mindocr.utils.callbacks INFO - epoch: [11/20] step: [700/4468], loss: 22.149632, lr: 0.000838, per step time: 231.466 ms, fps per card: 4.32 img/s
[2024-03-04 11:45:54] mindocr.utils.callbacks INFO - epoch: [11/20] step: [800/4468], loss: 11.687074, lr: 0.000837, per step time: 224.909 ms, fps per card: 4.45 img/s
[2024-03-04 11:46:17] mindocr.utils.callbacks INFO - epoch: [11/20] step: [900/4468], loss: 15.591729, lr: 0.000836, per step time: 225.182 ms, fps per card: 4.44 img/s
[2024-03-04 11:46:40] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1000/4468], loss: 14.098429, lr: 0.000835, per step time: 228.203 ms, fps per card: 4.38 img/s
[2024-03-04 11:47:02] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1100/4468], loss: 19.660013, lr: 0.000834, per step time: 222.255 ms, fps per card: 4.50 img/s
[2024-03-04 11:47:24] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1200/4468], loss: 14.676831, lr: 0.000833, per step time: 220.881 ms, fps per card: 4.53 img/s
[2024-03-04 11:47:46] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1300/4468], loss: 16.193998, lr: 0.000832, per step time: 223.041 ms, fps per card: 4.48 img/s
[2024-03-04 11:48:09] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1400/4468], loss: 16.284693, lr: 0.000831, per step time: 225.467 ms, fps per card: 4.44 img/s
[2024-03-04 11:48:31] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1500/4468], loss: 9.925570, lr: 0.000830, per step time: 223.921 ms, fps per card: 4.47 img/s
[2024-03-04 11:48:54] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1600/4468], loss: 23.293972, lr: 0.000829, per step time: 223.751 ms, fps per card: 4.47 img/s
[2024-03-04 11:49:15] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1700/4468], loss: 15.332523, lr: 0.000828, per step time: 218.251 ms, fps per card: 4.58 img/s
[2024-03-04 11:49:38] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1800/4468], loss: 15.200744, lr: 0.000827, per step time: 222.720 ms, fps per card: 4.49 img/s
[2024-03-04 11:49:59] mindocr.utils.callbacks INFO - epoch: [11/20] step: [1900/4468], loss: 9.825425, lr: 0.000826, per step time: 218.166 ms, fps per card: 4.58 img/s
[2024-03-04 11:50:22] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2000/4468], loss: 46.271423, lr: 0.000825, per step time: 227.095 ms, fps per card: 4.40 img/s
[2024-03-04 11:50:44] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2100/4468], loss: 15.076401, lr: 0.000825, per step time: 220.832 ms, fps per card: 4.53 img/s
[2024-03-04 11:51:07] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2200/4468], loss: 10.422530, lr: 0.000824, per step time: 228.983 ms, fps per card: 4.37 img/s
[2024-03-04 11:51:29] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2300/4468], loss: 61.731621, lr: 0.000823, per step time: 219.029 ms, fps per card: 4.57 img/s
[2024-03-04 11:51:51] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2400/4468], loss: 37.480408, lr: 0.000822, per step time: 223.333 ms, fps per card: 4.48 img/s
[2024-03-04 11:52:13] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2500/4468], loss: 16.583775, lr: 0.000821, per step time: 218.933 ms, fps per card: 4.57 img/s
[2024-03-04 11:52:35] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2600/4468], loss: 14.559378, lr: 0.000820, per step time: 219.845 ms, fps per card: 4.55 img/s
[2024-03-04 11:52:57] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2700/4468], loss: 21.232960, lr: 0.000819, per step time: 220.544 ms, fps per card: 4.53 img/s
[2024-03-04 11:53:20] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2800/4468], loss: 15.320299, lr: 0.000818, per step time: 222.263 ms, fps per card: 4.50 img/s
[2024-03-04 11:53:42] mindocr.utils.callbacks INFO - epoch: [11/20] step: [2900/4468], loss: 12.645747, lr: 0.000817, per step time: 224.474 ms, fps per card: 4.45 img/s
[2024-03-04 11:54:05] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3000/4468], loss: 10.648980, lr: 0.000816, per step time: 225.800 ms, fps per card: 4.43 img/s
[2024-03-04 11:54:27] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3100/4468], loss: 30.627602, lr: 0.000815, per step time: 224.385 ms, fps per card: 4.46 img/s
[2024-03-04 11:54:50] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3200/4468], loss: 14.376083, lr: 0.000814, per step time: 225.474 ms, fps per card: 4.44 img/s
[2024-03-04 11:55:12] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3300/4468], loss: 16.502594, lr: 0.000813, per step time: 225.438 ms, fps per card: 4.44 img/s
[2024-03-04 11:55:35] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3400/4468], loss: 20.607311, lr: 0.000812, per step time: 225.279 ms, fps per card: 4.44 img/s
[2024-03-04 11:55:57] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3500/4468], loss: 14.764703, lr: 0.000811, per step time: 220.270 ms, fps per card: 4.54 img/s
[2024-03-04 11:56:19] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3600/4468], loss: 13.159198, lr: 0.000810, per step time: 226.520 ms, fps per card: 4.41 img/s
[2024-03-04 11:56:42] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3700/4468], loss: 11.132410, lr: 0.000808, per step time: 223.691 ms, fps per card: 4.47 img/s
[2024-03-04 11:57:04] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3800/4468], loss: 11.510989, lr: 0.000807, per step time: 227.573 ms, fps per card: 4.39 img/s
[2024-03-04 11:57:26] mindocr.utils.callbacks INFO - epoch: [11/20] step: [3900/4468], loss: 10.965407, lr: 0.000806, per step time: 220.020 ms, fps per card: 4.55 img/s
[2024-03-04 11:57:49] mindocr.utils.callbacks INFO - epoch: [11/20] step: [4000/4468], loss: 15.892295, lr: 0.000805, per step time: 227.038 ms, fps per card: 4.40 img/s
[2024-03-04 11:58:11] mindocr.utils.callbacks INFO - epoch: [11/20] step: [4100/4468], loss: 12.919291, lr: 0.000804, per step time: 220.833 ms, fps per card: 4.53 img/s
[2024-03-04 11:58:33] mindocr.utils.callbacks INFO - epoch: [11/20] step: [4200/4468], loss: 40.800201, lr: 0.000803, per step time: 221.830 ms, fps per card: 4.51 img/s
[2024-03-04 11:58:56] mindocr.utils.callbacks INFO - epoch: [11/20] step: [4300/4468], loss: 18.750589, lr: 0.000802, per step time: 224.351 ms, fps per card: 4.46 img/s
[2024-03-04 11:59:19] mindocr.utils.callbacks INFO - epoch: [11/20] step: [4400/4468], loss: 18.808271, lr: 0.000801, per step time: 227.336 ms, fps per card: 4.40 img/s
[2024-03-04 11:59:34] mindocr.utils.callbacks INFO - epoch: [11/20], loss: 18.194092, epoch time: 999.623 s, per step time: 223.729 ms, fps per card: 4.47 img/s
[2024-03-04 12:00:05] mindocr.metrics.rec_metrics INFO - correct num: 0, total num: 2077.0
[2024-03-04 12:00:05] mindocr.utils.callbacks INFO - Performance: {'acc': 0.0, 'norm_edit_distance': 0.0}, eval time: 30.696720123291016
[2024-03-04 12:00:29] mindocr.utils.callbacks INFO - epoch: [12/20] step: [100/4468], loss: 15.119713, lr: 0.000800, per step time: 218.643 ms, fps per card: 4.57 img/s
[2024-03-04 12:00:51] mindocr.utils.callbacks INFO - epoch: [12/20] step: [200/4468], loss: 11.791698, lr: 0.000799, per step time: 219.862 ms, fps per card: 4.55 img/s
[2024-03-04 12:01:13] mindocr.utils.callbacks INFO - epoch: [12/20] step: [300/4468], loss: 19.764574, lr: 0.000797, per step time: 223.668 ms, fps per card: 4.47 img/s
[2024-03-04 12:01:35] mindocr.utils.callbacks INFO - epoch: [12/20] step: [400/4468], loss: 13.575973, lr: 0.000796, per step time: 217.223 ms, fps per card: 4.60 img/s
[2024-03-04 12:01:38] mindocr.data.transforms.rec_transforms WARNING - ... does not contain any valid character in the dictionary.
[2024-03-04 12:01:57] mindocr.utils.callbacks INFO - epoch: [12/20] step: [500/4468], loss: 12.828393, lr: 0.000795, per step time: 220.522 ms, fps per card: 4.53 img/s
[2024-03-04 12:02:20] mindocr.utils.callbacks INFO - epoch: [12/20] step: [600/4468], loss: 13.145671, lr: 0.000794, per step time: 227.272 ms, fps per card: 4.40 img/s
[2024-03-04 12:02:42] mindocr.utils.callbacks INFO - epoch: [12/20] step: [700/4468], loss: 19.327375, lr: 0.000793, per step time: 223.489 ms, fps per card: 4.47 img/s
[2024-03-04 12:03:04] mindocr.utils.callbacks INFO - epoch: [12/20] step: [800/4468], loss: 14.977672, lr: 0.000792, per step time: 215.858 ms, fps per card: 4.63 img/s
[2024-03-04 12:03:25] mindocr.utils.callbacks INFO - epoch: [12/20] step: [900/4468], loss: 22.353065, lr: 0.000791, per step time: 214.546 ms, fps per card: 4.66 img/s
[2024-03-04 12:03:47] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1000/4468], loss: 18.390974, lr: 0.000790, per step time: 220.665 ms, fps per card: 4.53 img/s
[2024-03-04 12:04:09] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1100/4468], loss: 11.455965, lr: 0.000789, per step time: 217.766 ms, fps per card: 4.59 img/s
[2024-03-04 12:04:31] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1200/4468], loss: 14.567901, lr: 0.000788, per step time: 216.182 ms, fps per card: 4.63 img/s
[2024-03-04 12:04:52] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1300/4468], loss: 27.029684, lr: 0.000787, per step time: 213.299 ms, fps per card: 4.69 img/s
[2024-03-04 12:05:14] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1400/4468], loss: 21.223015, lr: 0.000786, per step time: 222.366 ms, fps per card: 4.50 img/s
[2024-03-04 12:05:37] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1500/4468], loss: 13.668790, lr: 0.000785, per step time: 226.093 ms, fps per card: 4.42 img/s
[2024-03-04 12:05:58] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1600/4468], loss: 17.694632, lr: 0.000784, per step time: 217.165 ms, fps per card: 4.60 img/s
[2024-03-04 12:06:21] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1700/4468], loss: 23.257511, lr: 0.000783, per step time: 222.935 ms, fps per card: 4.49 img/s
[2024-03-04 12:06:43] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1800/4468], loss: 18.356245, lr: 0.000782, per step time: 218.880 ms, fps per card: 4.57 img/s
[2024-03-04 12:07:05] mindocr.utils.callbacks INFO - epoch: [12/20] step: [1900/4468], loss: 11.300078, lr: 0.000781, per step time: 227.359 ms, fps per card: 4.40 img/s
[2024-03-04 12:07:28] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2000/4468], loss: 14.937763, lr: 0.000780, per step time: 225.254 ms, fps per card: 4.44 img/s
[2024-03-04 12:07:50] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2100/4468], loss: 19.162342, lr: 0.000778, per step time: 216.393 ms, fps per card: 4.62 img/s
[2024-03-04 12:08:12] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2200/4468], loss: 14.251347, lr: 0.000777, per step time: 223.371 ms, fps per card: 4.48 img/s
[2024-03-04 12:08:34] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2300/4468], loss: 13.127800, lr: 0.000776, per step time: 220.196 ms, fps per card: 4.54 img/s
[2024-03-04 12:08:56] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2400/4468], loss: 10.320167, lr: 0.000775, per step time: 220.654 ms, fps per card: 4.53 img/s
[2024-03-04 12:09:18] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2500/4468], loss: 19.787119, lr: 0.000774, per step time: 215.162 ms, fps per card: 4.65 img/s
[2024-03-04 12:09:38] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2600/4468], loss: 26.924673, lr: 0.000773, per step time: 208.475 ms, fps per card: 4.80 img/s
[2024-03-04 12:10:00] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2700/4468], loss: 33.953568, lr: 0.000772, per step time: 221.263 ms, fps per card: 4.52 img/s
[2024-03-04 12:10:22] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2800/4468], loss: 20.984064, lr: 0.000771, per step time: 217.007 ms, fps per card: 4.61 img/s
[2024-03-04 12:10:44] mindocr.utils.callbacks INFO - epoch: [12/20] step: [2900/4468], loss: 22.761543, lr: 0.000770, per step time: 215.950 ms, fps per card: 4.63 img/s
[2024-03-04 12:11:06] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3000/4468], loss: 19.123419, lr: 0.000769, per step time: 223.888 ms, fps per card: 4.47 img/s
[2024-03-04 12:11:28] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3100/4468], loss: 11.884511, lr: 0.000768, per step time: 218.575 ms, fps per card: 4.58 img/s
[2024-03-04 12:11:51] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3200/4468], loss: 19.811079, lr: 0.000767, per step time: 227.719 ms, fps per card: 4.39 img/s
[2024-03-04 12:12:13] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3300/4468], loss: 10.613132, lr: 0.000765, per step time: 219.258 ms, fps per card: 4.56 img/s
[2024-03-04 12:12:35] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3400/4468], loss: 15.470774, lr: 0.000764, per step time: 221.877 ms, fps per card: 4.51 img/s
[2024-03-04 12:12:57] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3500/4468], loss: 16.807127, lr: 0.000763, per step time: 219.412 ms, fps per card: 4.56 img/s
[2024-03-04 12:13:19] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3600/4468], loss: 12.483809, lr: 0.000762, per step time: 220.050 ms, fps per card: 4.54 img/s
[2024-03-04 12:13:41] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3700/4468], loss: 22.583733, lr: 0.000761, per step time: 221.429 ms, fps per card: 4.52 img/s
[2024-03-04 12:14:04] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3800/4468], loss: 12.285607, lr: 0.000760, per step time: 224.991 ms, fps per card: 4.44 img/s
[2024-03-04 12:14:26] mindocr.utils.callbacks INFO - epoch: [12/20] step: [3900/4468], loss: 11.697602, lr: 0.000759, per step time: 224.615 ms, fps per card: 4.45 img/s
[2024-03-04 12:14:48] mindocr.utils.callbacks INFO - epoch: [12/20] step: [4000/4468], loss: 28.143490, lr: 0.000758, per step time: 224.418 ms, fps per card: 4.46 img/s
[2024-03-04 12:15:11] mindocr.utils.callbacks INFO - epoch: [12/20] step: [4100/4468], loss: 11.969640, lr: 0.000757, per step time: 225.497 ms, fps per card: 4.43 img/s
[2024-03-04 12:15:33] mindocr.utils.callbacks INFO - epoch: [12/20] step: [4200/4468], loss: 12.418633, lr: 0.000755, per step time: 217.643 ms, fps per card: 4.59 img/s
[2024-03-04 12:15:55] mindocr.utils.callbacks INFO - epoch: [12/20] step: [4300/4468], loss: 32.624569, lr: 0.000754, per step time: 221.953 ms, fps per card: 4.51 img/s
[2024-03-04 12:16:17] mindocr.utils.callbacks INFO - epoch: [12/20] step: [4400/4468], loss: 18.458952, lr: 0.000753, per step time: 217.663 ms, fps per card: 4.59 img/s
[2024-03-04 12:16:32] mindocr.utils.callbacks INFO - epoch: [12/20], loss: 18.180368, epoch time: 985.039 s, per step time: 220.465 ms, fps per card: 4.54 img/s
[2024-03-04 12:17:02] mindocr.metrics.rec_metrics INFO - correct num: 0, total num: 2077.0
[2024-03-04 12:17:02] mindocr.utils.callbacks INFO - Performance: {'acc': 0.0, 'norm_edit_distance': 0.0}, eval time: 29.979534149169922
[2024-03-04 12:17:26] mindocr.utils.callbacks INFO - epoch: [13/20] step: [100/4468], loss: 19.460188, lr: 0.000751, per step time: 218.662 ms, fps per card: 4.57 img/s
[2024-03-04 12:17:48] mindocr.utils.callbacks INFO - epoch: [13/20] step: [200/4468], loss: 18.364477, lr: 0.000750, per step time: 221.679 ms, fps per card: 4.51 img/s
[2024-03-04 12:18:10] mindocr.utils.callbacks INFO - epoch: [13/20] step: [300/4468], loss: 33.481869, lr: 0.000749, per step time: 217.873 ms, fps per card: 4.59 img/s
[2024-03-04 12:18:33] mindocr.utils.callbacks INFO - epoch: [13/20] step: [400/4468], loss: 10.340059, lr: 0.000748, per step time: 224.524 ms, fps per card: 4.45 img/s
[2024-03-04 12:18:55] mindocr.utils.callbacks INFO - epoch: [13/20] step: [500/4468], loss: 10.770986, lr: 0.000747, per step time: 223.765 ms, fps per card: 4.47 img/s
[2024-03-04 12:19:17] mindocr.utils.callbacks INFO - epoch: [13/20] step: [600/4468], loss: 23.117287, lr: 0.000746, per step time: 222.526 ms, fps per card: 4.49 img/s
[2024-03-04 12:19:40] mindocr.utils.callbacks INFO - epoch: [13/20] step: [700/4468], loss: 8.122672, lr: 0.000745, per step time: 225.589 ms, fps per card: 4.43 img/s
[2024-03-04 12:20:01] mindocr.utils.callbacks INFO - epoch: [13/20] step: [800/4468], loss: 23.849684, lr: 0.000744, per step time: 215.477 ms, fps per card: 4.64 img/s
[2024-03-04 12:20:23] mindocr.utils.callbacks INFO - epoch: [13/20] step: [900/4468], loss: 29.909067, lr: 0.000742, per step time: 218.545 ms, fps per card: 4.58 img/s
[2024-03-04 12:20:45] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1000/4468], loss: 14.661881, lr: 0.000741, per step time: 222.147 ms, fps per card: 4.50 img/s
[2024-03-04 12:20:59] mindocr.data.transforms.rec_transforms WARNING - ... does not contain any valid character in the dictionary.
[2024-03-04 12:21:07] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1100/4468], loss: 15.745966, lr: 0.000740, per step time: 219.763 ms, fps per card: 4.55 img/s
[2024-03-04 12:21:30] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1200/4468], loss: 14.151919, lr: 0.000739, per step time: 224.644 ms, fps per card: 4.45 img/s
[2024-03-04 12:21:52] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1300/4468], loss: 34.428429, lr: 0.000738, per step time: 223.545 ms, fps per card: 4.47 img/s
[2024-03-04 12:22:15] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1400/4468], loss: 16.289040, lr: 0.000737, per step time: 228.190 ms, fps per card: 4.38 img/s
[2024-03-04 12:22:38] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1500/4468], loss: 12.537392, lr: 0.000736, per step time: 225.212 ms, fps per card: 4.44 img/s
[2024-03-04 12:22:59] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1600/4468], loss: 16.505657, lr: 0.000734, per step time: 215.951 ms, fps per card: 4.63 img/s
[2024-03-04 12:23:22] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1700/4468], loss: 20.724134, lr: 0.000733, per step time: 226.152 ms, fps per card: 4.42 img/s
[2024-03-04 12:23:44] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1800/4468], loss: 25.622068, lr: 0.000732, per step time: 224.102 ms, fps per card: 4.46 img/s
[2024-03-04 12:24:06] mindocr.utils.callbacks INFO - epoch: [13/20] step: [1900/4468], loss: 22.135950, lr: 0.000731, per step time: 217.309 ms, fps per card: 4.60 img/s
[2024-03-04 12:24:28] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2000/4468], loss: 11.581539, lr: 0.000730, per step time: 223.159 ms, fps per card: 4.48 img/s
[2024-03-04 12:24:50] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2100/4468], loss: 19.023918, lr: 0.000729, per step time: 216.699 ms, fps per card: 4.61 img/s
[2024-03-04 12:25:12] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2200/4468], loss: 12.835676, lr: 0.000728, per step time: 219.905 ms, fps per card: 4.55 img/s
[2024-03-04 12:25:34] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2300/4468], loss: 16.867731, lr: 0.000726, per step time: 224.488 ms, fps per card: 4.45 img/s
[2024-03-04 12:25:57] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2400/4468], loss: 16.099321, lr: 0.000725, per step time: 225.679 ms, fps per card: 4.43 img/s
[2024-03-04 12:26:19] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2500/4468], loss: 24.325077, lr: 0.000724, per step time: 221.878 ms, fps per card: 4.51 img/s
[2024-03-04 12:26:42] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2600/4468], loss: 16.585316, lr: 0.000723, per step time: 228.631 ms, fps per card: 4.37 img/s
[2024-03-04 12:27:04] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2700/4468], loss: 14.728975, lr: 0.000722, per step time: 224.185 ms, fps per card: 4.46 img/s
[2024-03-04 12:27:27] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2800/4468], loss: 13.253246, lr: 0.000721, per step time: 221.407 ms, fps per card: 4.52 img/s
[2024-03-04 12:27:49] mindocr.utils.callbacks INFO - epoch: [13/20] step: [2900/4468], loss: 19.224873, lr: 0.000719, per step time: 223.921 ms, fps per card: 4.47 img/s
[2024-03-04 12:28:11] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3000/4468], loss: 13.644338, lr: 0.000718, per step time: 224.865 ms, fps per card: 4.45 img/s
[2024-03-04 12:28:34] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3100/4468], loss: 9.626924, lr: 0.000717, per step time: 224.088 ms, fps per card: 4.46 img/s
[2024-03-04 12:28:56] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3200/4468], loss: 24.057007, lr: 0.000716, per step time: 223.981 ms, fps per card: 4.46 img/s
[2024-03-04 12:29:18] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3300/4468], loss: 48.626820, lr: 0.000715, per step time: 218.554 ms, fps per card: 4.58 img/s
[2024-03-04 12:29:40] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3400/4468], loss: 32.099522, lr: 0.000714, per step time: 222.986 ms, fps per card: 4.48 img/s
[2024-03-04 12:30:03] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3500/4468], loss: 9.914026, lr: 0.000712, per step time: 224.564 ms, fps per card: 4.45 img/s
[2024-03-04 12:30:26] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3600/4468], loss: 16.248941, lr: 0.000711, per step time: 229.105 ms, fps per card: 4.36 img/s
[2024-03-04 12:30:49] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3700/4468], loss: 16.353361, lr: 0.000710, per step time: 227.059 ms, fps per card: 4.40 img/s
[2024-03-04 12:31:11] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3800/4468], loss: 12.071093, lr: 0.000709, per step time: 224.703 ms, fps per card: 4.45 img/s
[2024-03-04 12:31:34] mindocr.utils.callbacks INFO - epoch: [13/20] step: [3900/4468], loss: 15.630691, lr: 0.000708, per step time: 226.944 ms, fps per card: 4.41 img/s
[2024-03-04 12:31:57] mindocr.utils.callbacks INFO - epoch: [13/20] step: [4000/4468], loss: 29.918159, lr: 0.000707, per step time: 228.035 ms, fps per card: 4.39 img/s
[2024-03-04 12:32:19] mindocr.utils.callbacks INFO - epoch: [13/20] step: [4100/4468], loss: 15.904134, lr: 0.000705, per step time: 225.353 ms, fps per card: 4.44 img/s
[2024-03-04 12:32:41] mindocr.utils.callbacks INFO - epoch: [13/20] step: [4200/4468], loss: 16.739155, lr: 0.000704, per step time: 220.932 ms, fps per card: 4.53 img/s
[2024-03-04 12:33:03] mindocr.utils.callbacks INFO - epoch: [13/20] step: [4300/4468], loss: 13.423388, lr: 0.000703, per step time: 222.888 ms, fps per card: 4.49 img/s
[2024-03-04 12:33:26] mindocr.utils.callbacks INFO - epoch: [13/20] step: [4400/4468], loss: 38.993061, lr: 0.000702, per step time: 229.725 ms, fps per card: 4.35 img/s
[2024-03-04 12:33:42] mindocr.utils.callbacks INFO - epoch: [13/20], loss: 18.179216, epoch time: 997.331 s, per step time: 223.216 ms, fps per card: 4.48 img/s
[2024-03-04 12:34:12] mindocr.metrics.rec_metrics INFO - correct num: 0, total num: 2077.0
[2024-03-04 12:34:12] mindocr.utils.callbacks INFO - Performance: {'acc': 0.0, 'norm_edit_distance': 0.0}, eval time: 30.48786497116089
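
To locate where a run like the one above goes wrong, the per-step lines can be parsed and scanned for the first non-finite loss. This is a minimal sketch: the regular expression is written against the log format shown in this thread, and the helper names `parse_step_line` and `first_nan_step` are hypothetical, not part of MindOCR.

```python
import math
import re

# Matches the per-step lines printed in the logs above (format assumed from this thread).
STEP_RE = re.compile(
    r"epoch: \[(\d+)/(\d+)\] step: \[(\d+)/(\d+)\], "
    r"loss: ([^,]+), lr: ([^,]+), per step time: ([\d.]+) ms"
)


def parse_step_line(line):
    """Return a dict of fields for a per-step log line, or None for any other line."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return {
        "epoch": int(m.group(1)),
        "step": int(m.group(3)),
        "loss": float(m.group(5)),  # float("nan") parses a NaN loss as well
        "lr": float(m.group(6)),
        "step_ms": float(m.group(7)),
    }


def first_nan_step(lines):
    """Return (epoch, step) of the first line whose loss is NaN, or None if there is none."""
    for line in lines:
        rec = parse_step_line(line)
        if rec is not None and math.isnan(rec["loss"]):
            return rec["epoch"], rec["step"]
    return None
```

Epoch summary lines and evaluation lines do not contain a `step:` field, so the parser skips them.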

Which dataset did you use to train SVTR? We trained in O0 mode on an LMDB dataset, and training worked normally:

[2024-03-05 16:12:36] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6300/7051], loss: 1.515610, lr: 0.000631, per step time: 730.121 ms, fps per card: 701.25 img/s
[2024-03-05 16:13:49] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6400/7051], loss: 1.646477, lr: 0.000636, per step time: 728.797 ms, fps per card: 702.53 img/s
[2024-03-05 16:15:02] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6500/7051], loss: 1.364452, lr: 0.000641, per step time: 729.646 ms, fps per card: 701.71 img/s
[2024-03-05 16:16:14] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6600/7051], loss: 1.723172, lr: 0.000645, per step time: 727.210 ms, fps per card: 704.06 img/s
[2024-03-05 16:17:27] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6700/7051], loss: 1.556459, lr: 0.000650, per step time: 729.373 ms, fps per card: 701.97 img/s
[2024-03-05 16:18:40] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6800/7051], loss: 1.578286, lr: 0.000655, per step time: 724.420 ms, fps per card: 706.77 img/s
[2024-03-05 16:19:53] mindocr.utils.callbacks INFO - epoch: [2/30] step: [6900/7051], loss: 1.468486, lr: 0.000659, per step time: 730.197 ms, fps per card: 701.18 img/s
[2024-03-05 16:21:06] mindocr.utils.callbacks INFO - epoch: [2/30] step: [7000/7051], loss: 1.359129, lr: 0.000664, per step time: 734.849 ms, fps per card: 696.74 img/s
[2024-03-05 16:21:45] mindocr.utils.callbacks INFO - epoch: [2/30], loss: 1.755707, epoch time: 5153.177 s, per step time: 730.843 ms, fps per card: 700.56 img/s
100%|██████████| 4/4 [00:01<00:00,  2.83it/s]
100%|██████████| 4/4 [00:01<00:00,  2.68it/s]
100%|██████████| 4/4 [00:01<00:00,  2.78it/s]
[2024-03-05 16:21:46] mindocr.metrics.rec_metrics INFO - correct num: 1411, total num: 1748.0
100%|██████████| 4/4 [00:01<00:00,  2.67it/s]
[2024-03-05 16:21:46] mindocr.utils.callbacks INFO - Performance: {'acc': 0.8160755038261414, 'norm_edit_distance': 0.9285569190979004}, eval time: 1.7574305534362793
[2024-03-05 16:21:47] mindocr.utils.callbacks INFO - => Best acc: 0.8160755038261414, checkpoint saved.

Could you also show us your svtr_tiny.yaml configuration?

Hello, my configuration is as follows:

system:
  mode: 0 # 0 for graph mode, 1 for pynative mode in MindSpore
  distribute: False
  amp_level: O0
  amp_level_infer: O0 # running inference in O2 mode
  seed: 42
  log_interval: 100
  val_while_train: True
  drop_overflow_update: True
  ckpt_save_policy: latest_k
  ckpt_max_keep: 5

common:
  character_dict_path: &character_dict_path
  num_classes: &num_classes 37 # num_chars_in_dict + 1
  max_text_len: &max_text_len 24
  use_space_char: &use_space_char False
  batch_size: &batch_size 1
  num_workers: &num_workers 1
  num_epochs: &num_epochs 1
  dataset_root: &dataset_root ./dataset_ic15/rec
  ckpt_save_dir: &ckpt_save_dir ./tmp_rec
  ckpt_load_path: &ckpt_load_path ./tmp_rec/best.ckpt
  resume: &resume False

model:
  type: rec
  transform:
    name: STN_ON
    in_channels: 3
    tps_inputsize: [32, 64]
    tps_outputsize: [32, 100]
    num_control_points: 20
    tps_margins: [0.05, 0.05]
    stn_activation: none
  backbone:
    name: SVTRNet
    pretrained: False
    img_size: [32, 100]
    out_channels: 192
    patch_merging: Conv
    embed_dim: [64, 128, 256]
    depth: [3, 6, 3]
    num_heads: [2, 4, 8]
    mixer:
      [
        "Local",
        "Local",
        "Local",
        "Local",
        "Local",
        "Local",
        "Global",
        "Global",
        "Global",
        "Global",
        "Global",
        "Global",
      ]
    local_mixer: [[7, 11], [7, 11], [7, 11]]
    last_stage: True
    prenorm: False
  neck:
    name: Img2Seq
  head:
    name: CTCHead
    out_channels: *num_classes
  resume: *resume

postprocess:
  name: RecCTCLabelDecode
  character_dict_path: *character_dict_path
  use_space_char: *use_space_char

metric:
  name: RecMetric
  main_indicator: acc
  character_dict_path: *character_dict_path
  ignore_space: True
  print_flag: False

loss:
  name: CTCLoss
  pred_seq_len: 25 # 100 / 4
  max_label_len: *max_text_len # this value should be smaller than pre_seq_len
  batch_size: *batch_size

scheduler:
  scheduler: warmup_cosine_decay
  min_lr: 0.00001
  lr: 0.001
  num_epochs: *num_epochs
  warmup_epochs: 3
  decay_epochs: 27
optimizer:
  opt: adamw
  grouping_strategy: svtr
  filter_bias_and_bn: False
  weight_decay: 0.05

loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2.0
  scale_window: 1000

train:
  ckpt_save_dir: *ckpt_save_dir
  dataset_sink_mode: False
  ema: True
  ema_decay: 0.9999
  dataset:
    type: RecDataset
    dataset_root: *dataset_root
    data_dir: train
    label_file: train_rec_gt.txt
    sample_ratio: 1.0
    shuffle: True
    filter_max_len: True
    filter_zero_text_image: True
    extra_count_if_repeat: True
    max_text_len: *max_text_len
    character_dict_path: *character_dict_path
    label_standandize: True
    transform_pipeline:
      - DecodeImage:
          img_mode: BGR
          to_float32: False
      - SVTRRecAug:
          aug_type: 0
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: True
      - SVTRRecResizeImg:
          image_shape: [64, 256]
          padding: False
      - NormalizeImage:
          bgr_to_rgb: True
          is_hwc: True
          mean: [127.0, 127.0, 127.0]
          std: [127.0, 127.0, 127.0]
      - ToCHWImage:
    output_columns: ["image", "text_seq"]
    net_input_column_index: [0]
    label_column_index: [1]

  loader:
    shuffle: True
    batch_size: *batch_size
    drop_remainder: True
    max_rowsize: 12
    num_workers: 4

eval:
  ckpt_load_path: *ckpt_load_path
  dataset_sink_mode: False
  dataset:
    type: RecDataset
    dataset_root: *dataset_root
    data_dir: val
    label_file: val_rec_gt.txt
    sample_ratio: 1.0
    shuffle: False
    transform_pipeline:
      - DecodeImage:
          img_mode: BGR
          to_float32: False
      - RecCTCLabelEncode:
          max_text_len: *max_text_len
          character_dict_path: *character_dict_path
          use_space_char: *use_space_char
          lower: True
      - SVTRRecResizeImg:
          image_shape: [64, 256]
          padding: False
      - NormalizeImage:
          bgr_to_rgb: True
          is_hwc: True
          mean: [127.0, 127.0, 127.0]
          std: [127.0, 127.0, 127.0]
      - ToCHWImage:
    output_columns: ["image", "text_padded", "text_length"]
    net_input_column_index: [0]
    label_column_index: [1, 2]

  loader:
    shuffle: False
    batch_size: 1
    drop_remainder: False
    max_rowsize: 12
    num_workers: 1

Training with batch_size 1 can make training unstable. A larger batch_size, such as 16, might help.
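
The batch size in effect for each run can be cross-checked from the logs themselves. The sketch below assumes that "fps per card" is computed as batch size divided by per-step time; that relationship is an assumption about the callback, not something confirmed in this thread, and `implied_batch_size` is a hypothetical helper.

```python
def implied_batch_size(step_ms, fps_per_card):
    """Estimate the per-card batch size, assuming fps = batch_size / step_time."""
    return fps_per_card * step_ms / 1000.0


# Values copied from the two logs in this thread.
failing_run = implied_batch_size(step_ms=213.299, fps_per_card=4.69)
reference_run = implied_batch_size(step_ms=730.121, fps_per_card=701.25)

print(round(failing_run), round(reference_run))
```

Under that assumption the failing run processes about 1 image per step, which matches `batch_size: 1` in the posted config, while the reference run processes about 512.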

Sorry, when I set the batch size to 16, the following error occurs:
ValueError: For 'CTCLoss', the first dim of 'label_indices' and 'label_value' must be same, but got 'label_indices':24, 'label_value': 384.
Does the batch size have to stay fixed?

The svtr_tiny.yaml needs some modification for the ic15 dataset. Please try setting max_label_len to 384.
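
One way to read the numbers in the error message: 24 equals max_text_len, and 384 equals 16 × 24. The sketch below assumes, without confirmation from this thread, that the loss precomputes one label index per (sample, label position) pair from its own batch_size and max_label_len, while the labels arriving from the loader carry batch_size × max_text_len values. The function `label_index_rows` is hypothetical and only reproduces the arithmetic.

```python
def label_index_rows(batch_size, max_label_len):
    """Count the (sample, position) pairs in a dense label grid (assumed layout)."""
    return len([(i, j) for i in range(batch_size) for j in range(max_label_len)])


# Loss configured with batch_size 1 and max_label_len 24 (assumed state when the error occurred).
loss_side = label_index_rows(batch_size=1, max_label_len=24)
# Labels produced by a loader running with batch_size 16 and max_text_len 24.
loader_side = label_index_rows(batch_size=16, max_label_len=24)
# Suggested setting: max_label_len raised to 384 while the loss still sees batch_size 1.
suggested = label_index_rows(batch_size=1, max_label_len=384)

print(loss_side, loader_side, suggested)
```

If this reading is right, raising max_label_len to 384 makes the two counts agree. Checking that the batch_size under loss matches the loader's batch_size may also be worthwhile.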

@panshaowu @horcham Thanks. The problem has been solved. Also, if a pretrained model is used, similar problems do not occur.