nguyenq / tess4j

Java JNA wrapper for Tesseract OCR API

tesseract-ocr and Tess4J results are different

zymgg opened this issue

lanpic.zip

System: Windows 10 + JDK 17 + IDEA
tesseract-ocr: v5.3.3.20231005
tess4j: 5.10.0
I successfully trained a new language (num) using LSTM on Windows.
Windows command line:
tesseract image27a.jpg output_2 -l num
Result: JT5246870293852
Java:

        ITesseract instance = new Tesseract();
        instance.setDatapath("D:\\Tesseract-OCR\\tessdata");
        instance.setLanguage("num");
        String logistics = instance.doOCR(new File("F:\\2024-01\\18090241\\image2\\image27a.jpg"));
        System.out.println(logistics);

result:
eee al
18H AAR** = 4 8}

System.setProperty("jna.library.path", "D:\\Tesseract-OCR\\");
I also set this property and can see it in the debugger, but the result is still different.

@zymgg We confirm your findings. When we loaded your image in VietOCR3, which uses Tess4J library, we got good results using either eng or num pack. You may want to step through VietOCR3's code execution for your investigation.

Thanks! I went through the source code and found that the default for setPageSegMode is 3. Thank you for your help.
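For anyone comparing the two runs, it may help to make the page segmentation mode explicit on both sides so the command line and the Java code are known to use the same value. The sketch below only builds the command-line invocation with the `--psm` option. The class and method names (`TesseractCommand`, `build`) are made up for illustration, and the value 3 is the one mentioned above. On the Java side, the matching step would presumably be to call `instance.setPageSegMode(3)` before `doOCR`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a tesseract command line with an explicit
// page segmentation mode, so both runs can be compared at the same setting.
class TesseractCommand {

    static List<String> build(String image, String outputBase, String language, int psm) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-l");
        cmd.add(language);
        cmd.add("--psm");                 // page segmentation mode option of the CLI
        cmd.add(Integer.toString(psm));
        return cmd;
    }

    public static void main(String[] args) {
        // Same file, output base, and language as in the original post.
        List<String> cmd = build("image27a.jpg", "output_2", "num", 3);
        System.out.println(String.join(" ", cmd));
        // To actually run it (assumes tesseract is on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

If both sides give the same text at the same mode, the earlier mismatch was most likely down to differing defaults rather than the trained data.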