Japanese image ocr errors

Question

sasumay opened this issue 2 years ago · comments

sasumay commented 2 years ago

Please provide the following information to quickly locate the problem

System Environment: Ubuntu 20.04
Version: Paddle 2.3.0, PaddleOCR 2.5.0.3
Command Code:
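import cv2
from paddleocr import PaddleOCR

# img_path and lang are set elsewhere by the reporter (values not shown in the issue).
image = cv2.imread(img_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverted Otsu binarization: pixels darker than the threshold become white, the rest black.
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Run detection and recognition; the angle classifier is disabled.
ocr = PaddleOCR(lang=lang, use_angle_cls=False)
result = ocr.ocr(thresh, det=True, cls=False)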
Complete Error Message:
No error is raised, but the Japanese OCR output has the following issues:
1. Square brackets are not recognized.
2. Zeros (0) are recognized as the letter O.
3. Letter case differs from the original.
  The original TIFF file is here - https://github.com/sasumay/paddle-test/blob/main/jpn_temp.tif?raw=true

Please see the screenshot below, which compares the image with the OCR output.

xiaoting · Answer 1 · Thu Jun 30 2022 17:30:49 GMT+0800 (China Standard Time)

Thanks for the feedback. The Japanese recognition model is still being optimized.
We hope the next version of PaddleOCR will solve your problem.
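
Until an improved model is available, the 0/O confusion reported above can sometimes be reduced in post-processing. The following is a minimal sketch, not a PaddleOCR feature: `fix_zero_confusion` is a hypothetical helper that operates on the recognized text strings only, and it assumes that a letter O directly next to a digit was meant to be a zero. That assumption can produce false positives on mixed alphanumeric codes, and the helper does nothing for the bracket or letter-case issues.

```python
import re

# Hypothetical helper, not part of PaddleOCR.
# Assumption: an "O"/"o" that touches a digit was meant to be the digit 0.
_O_AFTER_DIGIT = re.compile(r"(?<=\d)[Oo]")
_O_BEFORE_DIGIT = re.compile(r"[Oo](?=\d)")


def fix_zero_confusion(text: str) -> str:
    """Replace O/o with 0 wherever it is adjacent to a digit."""
    previous = None
    # Repeat until stable so that runs such as "1OO5" are fully converted.
    while previous != text:
        previous = text
        text = _O_AFTER_DIGIT.sub("0", text)
        text = _O_BEFORE_DIGIT.sub("0", text)
    return text
```

It would be applied to each recognized string taken from the OCR result, for example `fix_zero_confusion("2O22-O6-30")`. Text with no digit next to the letter, such as `"OCR"`, is left unchanged.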

Le Viet Hung (Jack Ryan) · Answer 2 · Wed Aug 30 2023 18:05:11 GMT+0800 (China Standard Time)

@tink2123 Hello, could you please share some information about the following:

Which dataset was the Japanese model trained on?
How can I train the model myself?
Thank you so much.