pd3f / pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

Home Page: https://pd3f.com

Add Russian language recognition

nashtash opened this issue

Tesseract supports Russian, but afaik there is no Russian language model for Flair. So I guess it may take a while until pd3f can support Russian. :/

Does Flair have to be used for this? Tesseract has Russian and Romanian (my case) language support. Could that be used to get the special characters?