Use TextRecognitionDataGenerator
DesBw opened this issue
Your Feature Request
The images generated by https://github.com/Belval/TextRecognitionDataGenerator look much closer to most real-world images than the images generated by the text2image script.
It would be nice if Tesseract could use or support TextRecognitionDataGenerator.
Also, Belval/TextRecognitionDataGenerator#153 shows that the tool supports the Tesseract format.
Or, if someone has come up with a way to use the two together, that would be nice. TextRecognitionDataGenerator supports advanced distortions and a choice of backgrounds, so a model trained on those images could be more accurate.
Not sure what exactly the problem is: AFAIK text2image is only one of the tools that can be used for Tesseract training. It is up to the trainer how the image and ground-truth files are created or generated.
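For example, images from any generator can be turned into line-level training data by pairing each image with a `.gt.txt` transcription, the file-naming convention used by tesstrain (`foo.png` next to `foo.gt.txt`). The sketch below assumes a labels file where each line is an image filename, a space, then the transcription; that layout and the `write_ground_truth` helper are hypothetical, so check them against what your generator actually writes.

```python
from pathlib import Path


def write_ground_truth(labels_file: Path, image_dir: Path) -> list[Path]:
    """Write a <image-stem>.gt.txt file next to each image listed in labels_file.

    Hypothetical input layout: each non-empty line is
    "<image filename> <transcription>", split at the first space.
    """
    written = []
    for raw in labels_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        image_name, sep, text = line.partition(" ")
        if not sep:
            raise ValueError(f"no transcription on line: {raw!r}")
        image_path = image_dir / image_name
        if not image_path.exists():
            raise FileNotFoundError(image_path)
        # foo.png -> foo.gt.txt, holding the single line of ground-truth text
        gt_path = image_path.with_suffix(".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

The resulting image/`.gt.txt` pairs can then go into tesstrain's ground-truth directory (e.g. `data/MODEL_NAME-ground-truth/`), regardless of whether the images came from text2image, TextRecognitionDataGenerator, or real scans.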