PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)

Japanese image OCR errors

sasumay opened this issue

Please provide the following information to quickly locate the problem

  • System Environment: Ubuntu 20.04
    - Version: Paddle 2.3.0, PaddleOCR 2.5.0.3

  • Command Code:
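    import cv2
    from paddleocr import PaddleOCR

    img_path = "jpn_temp.tif"  # local copy of the TIFF file linked below
    lang = "japan"             # PaddleOCR's language code for Japanese

    image = cv2.imread(img_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu binarization; _INV turns dark text on a light page into white text on black
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    ocr = PaddleOCR(lang=lang, use_angle_cls=False)
    result = ocr.ocr(thresh, det=True, cls=False)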
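To compare the output with the image line by line, it can help to pull the recognized strings and their confidence scores out of `result`. The sketch below assumes the PaddleOCR 2.5.x single-image layout, a list of `[box, (text, score)]` entries; other releases may wrap this in an extra per-image list, so check what your version returns. `flatten_result` is a hypothetical helper, not part of PaddleOCR, and the sample data is made up.

```python
from typing import Any, List, Sequence, Tuple


def flatten_result(result: Sequence[Any]) -> List[Tuple[str, float]]:
    """Return (text, score) pairs from an OCR result.

    Assumes each entry is [box, (text, score)], as in the PaddleOCR 2.5.x
    single-image layout; the box coordinates are ignored here.
    """
    pairs = []
    for _box, (text, score) in result:
        pairs.append((text, float(score)))
    return pairs


# Illustrative input only: made-up boxes, text and scores in the assumed layout.
sample = [
    [[[10, 10], [120, 10], [120, 40], [10, 40]], ("ABC 100", 0.91)],
    [[[10, 50], [90, 50], [90, 80], [10, 80]], ("[2022]", 0.88)],
]

for text, score in flatten_result(sample):
    print(f"{score:.2f}  {text}")
```

Low scores on the affected lines would suggest the recognizer itself is unsure, rather than the detector having cut the text badly.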

  • Complete Error Message:
    Japanese OCR shows the following issues:

    1. Square brackets are not recognized.
    2. The digit 0 is recognized as the letter O.
    3. Letter case differs from the original text.
    The original TIFF file is here: https://github.com/sasumay/paddle-test/blob/main/jpn_temp.tif?raw=true
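Errors like these can be listed mechanically by diffing a hand-typed transcription against the OCR output character by character. This is a minimal sketch using Python's standard `difflib`, not a PaddleOCR feature; `char_diff` is a hypothetical helper and the example strings are invented, not taken from the attached TIFF.

```python
import difflib
from typing import List, Tuple


def char_diff(expected: str, recognized: str) -> List[Tuple[str, str, str]]:
    """List (operation, expected_chars, recognized_chars) for every mismatch."""
    # autojunk=False so no characters are silently treated as "junk".
    matcher = difflib.SequenceMatcher(a=expected, b=recognized, autojunk=False)
    diffs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            diffs.append((op, expected[i1:i2], recognized[j1:j2]))
    return diffs


# Hypothetical line: brackets dropped, "A" lowercased, "00" read as "OO".
print(char_diff("品番 [A] 100", "品番 a 1OO"))
# [('replace', '[A]', 'a'), ('replace', '00', 'OO')]
```

A single `replace` can bundle several error types (here, the lost brackets and the case change), so the output is a starting point for tallying errors rather than a classification.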

Please see the screenshot below comparing the image with the output:
[Screenshot: jpn_err]

Thanks for the feedback. The Japanese recognition model is still being optimized, and we hope the next PaddleOCR version will solve your problem.

@tink2123 Hello, could you please share some information about:

  • Which dataset was the Japanese model trained on?
  • How can the model be trained?

Thank you so much.
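For anyone preparing their own recognition training data in the meantime, here is a minimal sketch assuming the tab-separated `image path<TAB>transcription` label layout that, as we understand it, PaddleOCR's recognition training docs describe, plus a plain-text character dictionary with one character per line. Because a recognizer can only output characters present in its dictionary, the sketch also reports label characters the dictionary lacks. The helper names, file names and toy dictionary are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Set, Tuple


def write_rec_labels(samples: Iterable[Tuple[str, str]], label_path: Path) -> None:
    """Write one '<image path>\t<transcription>' line per sample (assumed layout)."""
    with label_path.open("w", encoding="utf-8") as f:
        for image_path, text in samples:
            f.write(f"{image_path}\t{text}\n")


def missing_chars(samples: Iterable[Tuple[str, str]], dict_path: Path) -> Set[str]:
    """Return label characters absent from a one-character-per-line dictionary.

    Spaces are skipped, assuming they are handled by a separate training option.
    """
    known = set(dict_path.read_text(encoding="utf-8").splitlines())
    used = {ch for _, text in samples for ch in text if ch != " "}
    return used - known


# Hypothetical samples and a toy dictionary that lacks square brackets.
samples = [("train/img_0001.png", "品番[A]100"), ("train/img_0002.png", "型式A10")]
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    write_rec_labels(samples, tmp_dir / "rec_gt_train.txt")
    dict_path = tmp_dir / "toy_dict.txt"
    dict_path.write_text("\n".join("品番型式A01"), encoding="utf-8")
    print(sorted(missing_chars(samples, dict_path)))  # ['[', ']']
```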