sajari / docconv

Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text

The content is disordered when converting a PDF file.

tosone opened this issue

The code here should be fixed:
https://github.com/sajari/docconv/blob/master/pdf_ocr.go#L75C2-L92C3

We should also set a concurrency limit, because a PDF with many images can make the app run out of memory (OOM).