tesseract-ocr and Tess4J results are different

Question

zymgg opened this issue 7 months ago

System: Windows 10 + JDK 17 + IntelliJ IDEA
tesseract-ocr: v5.3.3.20231005
tess4j: 5.10.0
I successfully trained a new language (num) using LSTM on Windows.
Windows command line:
tesseract image27a.jpg output_2 -l num
result: JT5246870293852
Java:

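        import java.io.File;
        import net.sourceforge.tess4j.ITesseract;
        import net.sourceforge.tess4j.Tesseract;
        // Run OCR through Tess4J with the custom "num" traineddata
        ITesseract instance = new Tesseract();
        instance.setDatapath("D:\\Tesseract-OCR\\tessdata");
        instance.setLanguage("num");
        String logistics = instance.doOCR(new File("F:\\2024-01\\18090241\\image2\\image27a.jpg"));
        System.out.println(logistics);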

result:
eee al
18H AAR** = 4 8}

System.setProperty("jna.library.path", "D:\\Tesseract-OCR\\");
I also added this line. Stepping through in the debugger I can see it being applied, but the result is still different.

Quan Nguyen · Answer 1 · Thu Jan 25 2024 13:09:34 GMT+0800 (China Standard Time)

@zymgg We confirm your findings. When we loaded your image in VietOCR3, which uses Tess4J library, we got good results using either eng or num pack. You may want to step through VietOCR3's code execution for your investigation.

zymgg · Answer 2 · Thu Jan 25 2024 14:13:46 GMT+0800 (China Standard Time)

Thanks! Looking through the source code, I found setPageSegMode with a default of 3. Thank you for your help.
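
For anyone comparing the two paths later: the thread suggests the mismatch came from the page segmentation mode, so one way to rule it out is to pin the mode explicitly on both sides rather than relying on defaults. The sketch below is only an illustration, not the fix confirmed by the maintainer. It builds the command line from the question with an explicit `--psm 3` (mode 3 is the documented default of the tesseract CLI). The class name `PsmCommand` and the helper `buildCommand` are made up for this example. On the Tess4J side, the matching call is assumed to be `instance.setPageSegMode(3)` before `doOCR`.

```java
import java.util.List;

// Hypothetical helper, not part of Tess4J or Tesseract.
class PsmCommand {

    // Builds the tesseract CLI invocation with the page segmentation mode
    // spelled out, so the CLI run and the Java run use the same setting.
    static List<String> buildCommand(String image, String outputBase, String lang, int psm) {
        return List.of("tesseract", image, outputBase,
                "-l", lang,
                "--psm", Integer.toString(psm));
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("image27a.jpg", "output_2", "num", 3);
        // Prints: tesseract image27a.jpg output_2 -l num --psm 3
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires tesseract on the PATH):
        //   new ProcessBuilder(cmd).inheritIO().start().waitFor();
        // Assumed Tess4J counterpart, set before calling doOCR:
        //   instance.setPageSegMode(3);
    }
}
```

If the two outputs still differ with the same mode on both sides, the next things to compare would be the tessdata directory and the traineddata file each path actually loads.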