OCRs some numbers as text strings

Question

allhavebrainimplantsandmore opened this issue 5 months ago · comments

allhavebrainimplantsandmore commented 5 months ago

Current Behavior

I know there are settings to tweak and experiment with, but it seems strange that, by default, only a tiny fraction of numbers get recognized as text strings. Is there any way to report and collect what Tesseract's default recognition misses, so that the engine can be updated and improved? I'm seeing some patterns in what Tesseract misses, and perhaps these could be used to make it work better out of the box.

Expected Behavior

No response

Suggested Fix

A way to report misses in Tesseract's default behavior, so that its recognition engine can be updated for everyone's benefit.

tesseract -v

tesseract 5.3.2

Operating System

No response

Other Operating System

Fedora 39

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

zdenop · Answer 1 · Sat Jan 13 2024 17:57:20 GMT+0800 (China Standard Time)

Closing, as the reporter did not provide anything that we can reproduce.
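For anyone landing here with the same problem: a reproducible report usually pairs a sample image with the text you expected and the text Tesseract actually returned. The following is only a minimal sketch of how such misses could be collected, not an official reporting mechanism. The function names `run_tesseract` and `find_misses` and the file name `sample.png` are hypothetical; the only Tesseract feature assumed is the command-line form `tesseract <image> stdout`, which prints the recognized text.

```python
import difflib
import subprocess


def run_tesseract(image_path):
    """Run the tesseract CLI with default settings and return the recognized text.

    Assumes the `tesseract` executable is on PATH. Passing "stdout" as the
    output base makes it print the text instead of writing a file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_misses(expected, recognized):
    """Return (expected, recognized) pairs of token runs that differ.

    Both inputs are split on whitespace, then aligned with difflib so that
    a single wrong token does not shift every later comparison.
    """
    exp_tokens = expected.split()
    rec_tokens = recognized.split()
    matcher = difflib.SequenceMatcher(a=exp_tokens, b=rec_tokens, autojunk=False)
    misses = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            misses.append((" ".join(exp_tokens[i1:i2]), " ".join(rec_tokens[j1:j2])))
    return misses


if __name__ == "__main__":
    # Hypothetical usage: compare a hand-typed ground truth with the OCR output.
    # text = run_tesseract("sample.png")
    # print(find_misses("Total 1050 items", text))
    pass
```

Attaching the image, the exact command line, the output of `tesseract -v`, and the list of mismatches gives maintainers something they can rerun.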