nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API

TIFF to PDF (text_only==false) recognition (or conversion) failed.

NicolasFelix opened this issue

Hi,
first of all, thank you for this great project.

I am facing an issue when running recognition directly on a TIFF image
with PDF output (image + text, i.e. with the text-only attribute set to false):
the generated PDF is corrupted.

This issue can be reproduced with the tess4j unit tests by running the testResultRenderer method.

Note: if the third argument of TessAPI1.TessPDFRendererCreate(outputbase, dataPath, FALSE) is set to TRUE, the PDF is then generated successfully (but, as expected, without the source image).
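
For anyone trying to reproduce this, a quick way to tell a grossly broken PDF from a plausible one is to look at the header and the end-of-file marker. The following is only a minimal sketch using the Java standard library: the class name PdfSanityCheck, the method looksLikePdf and the default file name are hypothetical, and passing this check does not prove the PDF is valid (a corrupted image stream inside the file would not be detected).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of tess4j: a coarse structural check only.
public class PdfSanityCheck {

    /** Returns true if the bytes start with "%PDF-" and "%%EOF" appears in the last 1024 bytes. */
    public static boolean looksLikePdf(byte[] data) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < header.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[i] != header[i]) {
                return false;
            }
        }
        // Decode only the tail; ISO-8859-1 maps every byte to one char, so binary data is safe here.
        int tailStart = Math.max(0, data.length - 1024);
        String tail = new String(data, tailStart, data.length - tailStart, StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    /** Convenience overload that reads the whole file into memory first. */
    public static boolean looksLikePdf(Path file) throws IOException {
        return looksLikePdf(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path of the generated PDF is passed as the first argument;
        // "outputbase1.pdf" is only a placeholder default.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "outputbase1.pdf");
        if (!Files.exists(pdf)) {
            System.out.println("File not found: " + pdf);
            return;
        }
        System.out.println(pdf + (looksLikePdf(pdf) ? " has a PDF header and EOF marker"
                                                    : " is missing the PDF header or EOF marker"));
    }
}
```

Running it against the output of both variants (third argument FALSE and TRUE) gives a first hint whether the file is truncated or whether the damage is inside the document body.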

If you think this issue should be reported to the Tesseract project instead, let me know and I'll do my best to raise it there ;)

Thx, Nicolas

We can confirm the bug and are investigating. We'll let you know the results.

Thanks.

It appears to be a bug in Leptonica 1.83.0. It has been fixed in 1.83.1. We'll soon make a release to incorporate the fix.

DanBloomberg/leptonica@544561a

Thank you again, great work ;)