TIFF to PDF (text_only==false) recognition (or conversion) failed.

Question

NicolasFelix opened this issue 2 years ago

Hi,
first of all, thank you for this great project.

I am running into an issue when running recognition directly on a TIFF image
with PDF output (image + text, i.e. with the text-only attribute set to false):
the generated PDF is corrupted.

This issue can be reproduced with the tess4j unit tests by running the testResultRenderer method.

Note: if the third argument of TessAPI1.TessPDFRendererCreate(outputbase, dataPath, FALSE) is set to TRUE, the PDF is generated correctly (but, as expected, without the source image).
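
For anyone trying to reproduce this, a rough structural check on the generated file can help tell a truncated PDF from a viewer problem. The sketch below uses only the Java standard library and checks that the file starts with the `%PDF-` header and has an `%%EOF` marker near the end. The class name `PdfSanityCheck`, the method name, and the default path `outputbase.pdf` are hypothetical. The thread does not say how the file is corrupted, so this heuristic may not catch the specific failure described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j or Tesseract.
final class PdfSanityCheck {

    /**
     * Heuristic only: true if the data starts with "%PDF-" and the last
     * 1024 bytes contain "%%EOF". It does not validate the PDF contents.
     */
    static boolean looksLikeCompletePdf(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[i] != header[i]) {
                return false;
            }
        }
        // Look for the end-of-file marker in the tail of the file.
        int tailStart = Math.max(0, data.length - 1024);
        String tail = new String(data, tailStart, data.length - tailStart,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) throws IOException {
        // Assumed default path; pass the real output file as the first argument.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "outputbase.pdf");
        boolean ok = looksLikeCompletePdf(Files.readAllBytes(pdf));
        System.out.println(pdf + " looks structurally complete: " + ok);
    }
}
```

A file that passes this check can still be unreadable, so a failing result is more informative than a passing one.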

If you think this issue should be reported to the Tesseract project instead, let me know and I'll do my best to raise it there ;)

Thx, Nicolas

Quan Nguyen · Answer 1 · Sun Jan 29 2023 06:00:37 GMT+0800 (China Standard Time)

We confirm the bug and are investigating. We'll let you know the results.

Thanks.

Quan Nguyen · Answer 2 · Mon Jan 30 2023 02:34:11 GMT+0800 (China Standard Time)

It appears to be a bug in Leptonica 1.83.0. It has been fixed in 1.83.1. We'll soon make a release to incorporate the fix.

DanBloomberg/leptonica@544561a
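
Until a release with the fix is available, it may be useful to check which Leptonica version is actually loaded. The sketch below is a plain-Java helper that flags the one version named above (1.83.0). It assumes the version is available as a string such as `leptonica-1.83.0`; that format, the class name `LeptonicaVersionCheck`, and the method name are assumptions, and how to obtain the string from the native library is not shown. Only 1.83.0 is flagged, because the thread does not say whether other versions are affected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of tess4j or Leptonica.
final class LeptonicaVersionCheck {

    // Matches the first "major.minor.patch" triple in the string.
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    /** True only for version 1.83.0, the version reported as buggy in this thread. */
    static boolean isReportedBuggyVersion(String versionString) {
        Matcher m = VERSION.matcher(versionString);
        if (!m.find()) {
            throw new IllegalArgumentException(
                    "No major.minor.patch version found in: " + versionString);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = Integer.parseInt(m.group(3));
        return major == 1 && minor == 83 && patch == 0;
    }

    public static void main(String[] args) {
        // Assumed input format; pass the real version string as the first argument.
        String version = args.length > 0 ? args[0] : "leptonica-1.83.0";
        System.out.println(version + " is the reported buggy version: "
                + isReportedBuggyVersion(version));
    }
}
```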

NicolasFelix · Answer 3 · Mon Jan 30 2023 17:24:09 GMT+0800 (China Standard Time)

Thank you again, great work ;)