clovaai / donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Home Page: https://arxiv.org/abs/2111.15664

Integrate a customized internal OCR engine into Donut

Altimis opened this issue

Hello guys. Thank you so much for this brilliant model.
I'm aware that Donut is an OCR-free model that does not rely on OCR input. When I ran some tests (fine-tuning the model), I noticed that its internal OCR performance is not as good as Google Cloud Vision OCR. Is it possible to replace the OCR engine with that one? Thank you!

Donut is not made to compete with OCR engines. It is pre-trained to generate OCR-style output (reading the text in an image) so that it acquires a general understanding of characters and language, which can then be leveraged in fine-tuning tasks such as extracting specific information from an input image. If you need accurate OCR, I would recommend sticking with Tesseract or a cloud solution like the one you mentioned.
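To make the point about fine-tuning concrete: after pre-training on text reading, Donut is fine-tuned to emit a structured target sequence rather than a page of raw OCR text. The sketch below is a simplified illustration, not the official Donut code. It flattens a JSON annotation into a tag-delimited string of the kind the decoder learns to generate. The function name `json_to_token_sequence`, the `<s_key>`/`</s_key>` tag format and the `<sep/>` list separator are assumptions for illustration; the official implementation may order keys and name its special tokens differently.

```python
from typing import Any


def json_to_token_sequence(obj: Any) -> str:
    """Flatten a nested JSON-like value into a tag-delimited target string.

    Hypothetical sketch: dict keys become paired tags <s_key>...</s_key>,
    list items are joined with an assumed <sep/> token, and every other
    value is converted with str(). Keys keep their insertion order.
    """
    if isinstance(obj, dict):
        # Wrap each field's (recursively flattened) value in open/close tags.
        return "".join(
            f"<s_{key}>{json_to_token_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # Separate repeated items (e.g. line items on a receipt).
        return "<sep/>".join(json_to_token_sequence(item) for item in obj)
    # Leaf value: the literal text the model must read from the image.
    return str(obj)


if __name__ == "__main__":
    annotation = {
        "menu": [{"nm": "Latte", "price": "4.50"}, {"nm": "Bagel", "price": "2.00"}],
        "total": "6.50",
    }
    print(json_to_token_sequence(annotation))
```

The leaf values are still text the model has to read off the image. This is why the character-level knowledge from pre-training helps, even though Donut is not tuned to produce complete OCR transcripts.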