Issue when training a CnOCR model on a custom dataset
sif-boudjellal opened this issue
Hello CnOCR community,
I got the following error when training a CnOCR model on an MRZ dataset:
! cnocr train -m densenet_lite_136-gru --index-dir /content --train-config-fp /content/train_config.json
The Error:
return [vocab[letter] for letter in input_string]
KeyError: '6400008F9611206JDR<<<<<<<<<<<1'
Traceback (most recent call last):
File "/usr/local/bin/cnocr", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/cnocr/cli.py", line 180, in train
trainer.fit(
File "/usr/local/lib/python3.10/dist-packages/cnocr/trainer.py", line 338, in fit
self.pl_trainer.fit(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
self._run_sanity_check()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
val_loop.run()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_args)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 403, in validation_step
return self.lightning_module.validation_step(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/cnocr/trainer.py", line 218, in validation_step
res = self.model.calculate_loss(
File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 174, in calculate_loss
return self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 232, in forward
out['loss'] = self._compute_loss(logits, target, input_lengths)
File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 295, in _compute_loss
gt, seq_len = self.compute_target(target)
File "/usr/local/lib/python3.10/dist-packages/cnocr/models/ocr_model.py", line 328, in compute_target
encoded = encode_sequences(
File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 106, in encode_sequences
encoded_seq = encode_sequence(seq, vocab)
File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 39, in encode_sequence
return [vocab[letter] for letter in input_string]
File "/usr/local/lib/python3.10/dist-packages/cnocr/data_utils/utils.py", line 39, in <listcomp>
return [vocab[letter] for letter in input_string]
KeyError: '6400008F9611206JDR<<<<<<<<<<<1'
How can I solve it?
KeyError: '6400008F9611206JDR<<<<<<<<<<<1'
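There is a problem with the format of your data file. See https://github.com/breezedeus/CnOCR/blob/master/data/test/train.tsv for reference: each character in a label is separated by a space, and the file name is separated from the label that follows it by a tab. Because your labels are not split into characters, the whole string is looked up in the vocabulary as a single key, which raises the KeyError.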
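As a rough illustration of the format described above, here is a minimal sketch that rewrites index lines into "file name, tab, space-separated characters". The helper names (`to_cnocr_line`, `convert_index`) and the sample file name are hypothetical and not part of CnOCR. It assumes each input line holds a file name followed by whitespace and the label, and that labels contain no meaningful space characters (which holds for MRZ strings, where `<` is the filler).

```python
def to_cnocr_line(image_name: str, text: str) -> str:
    """Build one index line: file name, a tab, then the label's characters separated by spaces."""
    # Drop any spaces already present so the conversion is safe to run twice.
    chars = text.replace(" ", "")
    return f"{image_name}\t{' '.join(chars)}"


def convert_index(lines):
    """Convert raw 'file_name<whitespace>label' lines into the tab/space-separated layout."""
    converted = []
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        image_name, text = parts
        converted.append(to_cnocr_line(image_name, text))
    return converted


if __name__ == "__main__":
    # Hypothetical file name; the label is the string from the KeyError above.
    print(to_cnocr_line("mrz_0001.jpg", "6400008F9611206JDR<<<<<<<<<<<1"))
```

After converting, it is worth checking that every character in the labels (including `<`) also appears in the vocabulary file used for training; otherwise the same KeyError can reappear for a single character.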