breezedeus / CnOCR

CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. [A Chinese/English OCR Python package based on PyTorch/MXNet.]

Home Page: https://www.breezedeus.com/article/cnocr

Issue when training a CnOCR model on a custom dataset

sif-boudjellal opened this issue

Hello CnOCR community,

I got the following error when training a CnOCR model on an MRZ dataset:

! cnocr train -m densenet_lite_136-gru --index-dir /content --train-config-fp /content/train_config.json

The Error:

  return [vocab[letter] for letter in input_string]
KeyError: '6400008F9611206JDR<<<<<<<<<<<1'
Traceback (most recent call last):
  File "/usr/local/bin/cnocr", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/cli.py", line 180, in train
    trainer.fit(
  File "/usr/local/lib/python3.10/dist-packages/cnocr/trainer.py", line 338, in fit
    self.pl_trainer.fit(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self._run_sanity_check()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
    val_loop.run()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/trainer.py", line 218, in validation_step
    res = self.model.calculate_loss(
  File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 174, in calculate_loss
    return self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 232, in forward
    out['loss'] = self._compute_loss(logits, target, input_lengths)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 295, in _compute_loss
    gt, seq_len = self.compute_target(target)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 328, in compute_target
    encoded = encode_sequences(
  File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 106, in encode_sequences
    encoded_seq = encode_sequence(seq, vocab)
  File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 39, in encode_sequence
    return [vocab[letter] for letter in input_string]
  File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 39, in <listcomp>
    return [vocab[letter] for letter in input_string]
KeyError: '6400008F9611206JDR<<<<<<<<<<<1'

How can I solve it?

KeyError: '6400008F9611206JDR<<<<<<<<<<<1'
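There is a problem with the format of your data file. Refer to https://github.com/breezedeus/CnOCR/blob/master/data/test/train.tsv : the characters in each label must be separated by spaces, and the file name must be separated by a tab from the label that follows it. The key in the KeyError is the entire label string, which means the label was read as a single token instead of one token per character.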
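
As a rough illustration of that format, here is a minimal sketch in plain Python. It does not call CnOCR itself: `encode_sequence` only mirrors the list comprehension shown in the traceback, while `to_cnocr_line`, `parse_cnocr_line`, the file name `mrz_0001.jpg`, and the toy `vocab` are hypothetical names made up for this example. It assumes each index line has the shape `file name, tab, space-separated characters`, as described above.

```python
from typing import Dict, List, Tuple


def encode_sequence(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    # Same lookup as the line in the traceback: one vocab entry per token.
    return [vocab[token] for token in tokens]


def to_cnocr_line(image_name: str, label: str) -> str:
    # Hypothetical helper: file name, a tab, then the label with a space
    # between every character.
    return image_name + "\t" + " ".join(label)


def parse_cnocr_line(line: str) -> Tuple[str, List[str]]:
    # Hypothetical helper: split on the first tab, then on single spaces.
    image_name, chars = line.rstrip("\n").split("\t", 1)
    return image_name, chars.split(" ")


label = "6400008F9611206JDR<<<<<<<<<<<1"

# Toy vocabulary built from the label itself (an assumption for this demo;
# in real training every character, including '<', would need to be present
# in the vocabulary that the model is configured with).
vocab = {ch: i for i, ch in enumerate(sorted(set(label)))}

# Unsplit label: the whole string is looked up as one token and is missing.
try:
    encode_sequence([label], vocab)
    raised = False
except KeyError:
    raised = True

# Label written with a space between characters: every token is one character.
line = to_cnocr_line("mrz_0001.jpg", label)
image_name, tokens = parse_cnocr_line(line)
encoded = encode_sequence(tokens, vocab)
```

If the existing index file stores labels as one unbroken string, rewriting each line with something like `to_cnocr_line` before training should produce the expected layout. It is also worth checking that the `<` filler character used in MRZ lines is part of the vocabulary, since a missing character would raise a similar KeyError with a single-character key.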