tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)

Home Page: https://tesseract-ocr.github.io/

OCRs some numbers as text strings

allhavebrainimplantsandmore opened this issue

Current Behavior

I know there are settings to tweak and experiment with, but it is strange that, by default, only a tiny fraction of numbers get recognized as text strings. Is there a way to report and collect what Tesseract's default recognition misses, so the engine can be updated and improved? I am seeing patterns in what Tesseract misses, and perhaps the engine could be updated so that it works better out of the box.

Expected Behavior

No response

Suggested Fix

A way to report misses in Tesseract's default behavior so that its recognition engine can be updated for everyone's benefit.

tesseract -v

tesseract 5.3.2

Operating System

No response

Other Operating System

Fedora 39

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

Closing as the reporter does not provide anything that we can reproduce.
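
A reproducible report of the kind asked for here would usually pair an input image with the text it should contain and the text Tesseract actually produced. The following is only a minimal sketch of how such misses could be collected, assuming the OCR output has already been captured as a string (for example from `tesseract image.png stdout`). The function name `collect_misses` and the sample strings are hypothetical illustrations, not part of Tesseract and not taken from the reporter's data.

```python
from difflib import SequenceMatcher


def collect_misses(expected: str, actual: str) -> list[tuple[str, str]]:
    """Return (expected_tokens, ocr_tokens) pairs where the OCR output differs.

    Both strings are split on whitespace and aligned token by token, so each
    returned pair shows what the image should have said and what was read.
    """
    exp_tokens = expected.split()
    act_tokens = actual.split()
    matcher = SequenceMatcher(a=exp_tokens, b=act_tokens, autojunk=False)

    misses = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete, or insert
            misses.append((" ".join(exp_tokens[i1:i2]),
                           " ".join(act_tokens[j1:j2])))
    return misses


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output, for illustration only.
    expected = "Invoice 10482 total 3750"
    actual = "Invoice 1O482 total 3750"
    for want, got in collect_misses(expected, actual):
        print(f"expected {want!r}, got {got!r}")
```

Attaching the image itself, the exact command line, and a list of mismatches like this one gives maintainers something concrete to reproduce.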