Model very sensitive on PNG input

Question

junyizhao04 opened this issue a year ago

I tried multiple image sizes and sources (photos of a screen, photos of paper, screenshots, etc.) and ran them through Calamari OCR using the provided model. However, the model is very sensitive to its input: only around 5% of them work. What is the expected PNG input size?

Andreas Büttner · Answer 1 · Tue Mar 07 2023 00:29:22 GMT+0800 (China Standard Time)

I don't think the input size matters that much. Please note that Calamari is line-based; it does not include code or models for segmentation. As long as your image is cropped tightly around a single text line, or you provide the line coordinates via PAGE XML, it should work.
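Because Calamari does no segmentation, any cropping has to happen before recognition. A minimal sketch of tightening the crop of an image that already contains exactly one text line, assuming the image is 8-bit grayscale stored as a non-empty list of pixel rows (0 = black); `crop_to_ink` and its threshold and padding defaults are hypothetical and not part of Calamari:

```python
from typing import List, Sequence


def crop_to_ink(img: Sequence[Sequence[int]], ink_threshold: int = 128,
                padding: int = 2) -> List[List[int]]:
    """Hypothetical helper: return a tight crop around all pixels darker than
    ink_threshold, expanded by `padding` pixels per side (clamped to the image)."""
    height, width = len(img), len(img[0])
    # Rows that contain at least one "ink" pixel.
    rows = [y for y in range(height) if any(p < ink_threshold for p in img[y])]
    if not rows:
        # No ink found: return an unchanged copy.
        return [list(r) for r in img]
    # Columns that contain at least one "ink" pixel.
    cols = [x for x in range(width)
            if any(img[y][x] < ink_threshold for y in range(height))]
    top = max(rows[0] - padding, 0)
    bottom = min(rows[-1] + padding, height - 1)
    left = max(cols[0] - padding, 0)
    right = min(cols[-1] + padding, width - 1)
    return [list(img[y][left:right + 1]) for y in range(top, bottom + 1)]


if __name__ == "__main__":
    page = [[255] * 8 for _ in range(6)]
    page[2][3] = page[3][4] = 0  # two "ink" pixels
    line = crop_to_ink(page, padding=1)
    print(len(line), len(line[0]))  # prints: 4 4
```

This only trims the white margins around a single line; it does not split a multi-line page into lines, which still needs a separate segmentation step or PAGE XML coordinates.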

Junyi Zhao · Answer 2 · Tue Mar 07 2023 01:04:46 GMT+0800 (China Standard Time)

In that case, I would expect an image like this to work, but it does not.

Andreas Büttner · Answer 3 · Tue Mar 07 2023 01:33:17 GMT+0800 (China Standard Time)

The uw3 dataset that the uw3-modern-english model was trained on contains only binarised images, so the model struggles with colour or grayscale input. If you convert your image to monochrome, e.g. with ImageMagick's convert online.png -monochrome online.bin.png, it is recognised perfectly.
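The convert command above is the quickest route. For a pipeline written in Python, a minimal sketch of the same idea (binarising before recognition) is a single global threshold chosen with Otsu's method. This is a swapped-in technique, not necessarily what ImageMagick's -monochrome does; `otsu_threshold` and `binarize` are hypothetical helpers, and the image is assumed to be 8-bit grayscale stored as a list of pixel rows:

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return t such that splitting 8-bit values into <= t and > t
    maximises the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the lower class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Map every pixel to 0 (black) or 255 (white) using one global Otsu threshold."""
    t = otsu_threshold([p for row in img for p in row])
    return [[0 if p <= t else 255 for p in row] for row in img]


if __name__ == "__main__":
    img = [[30, 40, 200], [210, 35, 220]]
    print(binarize(img))  # prints: [[0, 0, 255], [255, 0, 255]]
```

A single global threshold assumes fairly even lighting, so photos of a screen or of paper may still need more careful preprocessing than clean screenshots.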