Calamari-OCR / calamari

Line based ATR Engine based on OCRopy

Argument "val.preload" documented but not known

wrznr opened this issue

According to https://calamari-ocr.readthedocs.io/en/latest/doc.command-line-usage.html#preloading-data-load-data-on-the-fly there is an argument val.preload which, when set to False, should keep validation images from being preloaded into RAM. However, passing this argument to calamari-train raises an UnknownArgumentError:

$ calamari-train --train.preload False --val.preload False --trainer.gen SplitTrain --trainer.gen.validation_split_ratio=0.2 --train PageXML --train.images *.tif
2022-03-24 14:16:12.989906: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-24 14:16:12.989929: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO     2022-03-24 14:16:14,206 calamari_ocr.ocr.training.pipe: Splitting training and validation files with ratio 0.2: 89/358 for validation/training.
CRITICAL 2022-03-24 14:16:14,231             tfaip.util.logging: Uncaught exception
Traceback (most recent call last):
  File "/home/kmw/Documents/Work/OCR-D/ocrd_all/venv/bin/calamari-train", line 8, in <module>
    sys.exit(run())
  File "/home/kmw/Documents/Work/OCR-D/ocrd_all/venv/lib/python3.8/site-packages/calamari_ocr/scripts/train.py", line 17, in run
    main(parse_args())
  File "/home/kmw/Documents/Work/OCR-D/ocrd_all/venv/lib/python3.8/site-packages/calamari_ocr/scripts/train.py", line 40, in parse_args
    params = parser.parse_args(args).trainer
  File "/home/kmw/Documents/Work/OCR-D/ocrd_all/venv/lib/python3.8/site-packages/paiargparse/main_parser.py", line 93, in parse_args
    raise UnknownArgumentError(f"Unknown Arguments {' '.join(argv)}. Possible alternatives:{''.join(help_str)}")
paiargparse.dataclass_parser.UnknownArgumentError: Unknown Arguments --val.preload False. Possible alternatives:
	--val.preload ==> --train.preload, --val.prefetch, --val.limit

You're right, that is confusing; the documentation needs some clarification. Judging from the code, I'm guessing that --val.preload only works when you use a separate validation dataset via --val, --val.images and so on. In your case, the validation dataset is derived directly from the --train dataset with SplitTrain, so the parameter could probably be --trainer.gen.val.preload (I would need to test whether this has any effect).
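
For anyone who wants to try that guess, here is the original command with the flag moved under the generator prefix. This is an untested sketch: --trainer.gen.val.preload is only the guess from the reply above, not a confirmed Calamari option, and print_train_cmd is a hypothetical helper that just prints the command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical helper (not part of Calamari): prints the training command
# from the issue with the rejected --val.preload replaced by the guessed
# --trainer.gen.val.preload. The guessed flag is an unverified assumption.
print_train_cmd() {
    # The string is quoted, so *.tif is printed literally and not expanded here.
    printf '%s\n' "calamari-train --train.preload False --trainer.gen SplitTrain --trainer.gen.validation_split_ratio=0.2 --trainer.gen.val.preload False --train PageXML --train.images *.tif"
}

# Show the command for review; nothing is executed.
print_train_cmd

# To actually try it (the glob is expanded at this point):
# eval "$(print_train_cmd)"
```

If the parser rejects this flag as well, the "Possible alternatives" list in the resulting UnknownArgumentError should show which names are accepted under --trainer.gen.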