Japanese image OCR errors
sasumay opened this issue
Please provide the following information to quickly locate the problem
- System Environment: Ubuntu 20.04
- Version: Paddle 2.3.0, PaddleOCR 2.5.0.3
- Command Code:
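import cv2
from paddleocr import PaddleOCR

# img_path and lang were not defined in the original report; the values below are assumptions.
img_path = "jpn_temp.tif"  # assumed: local copy of the TIFF linked below
lang = "japan"  # assumed: PaddleOCR's language code for Japanese

image = cv2.imread(img_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu binarization, inverted: pixels darker than the threshold become white
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
ocr = PaddleOCR(lang=lang, use_angle_cls=False)
result = ocr.ocr(thresh, det=True, cls=False)
- Complete Error Message: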
The Japanese OCR output shows the following issues:
- Square brackets are not recognized.
- 0s (zeros) are recognized as Os (the letter O).
- Letter case differs from the image.
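To make the three issues above easier to quantify, here is a minimal sketch that compares a hand-typed expected string with the recognized string and counts mismatches per category. The function name `classify_ocr_errors` and the sample strings are hypothetical and are not taken from the attached image. It uses only the standard-library `difflib` module and is not part of PaddleOCR.

```python
from difflib import SequenceMatcher


def classify_ocr_errors(expected: str, recognized: str) -> dict:
    """Count character mismatches in the three categories reported above."""
    counts = {"missing_bracket": 0, "zero_as_o": 0, "case": 0, "other": 0}
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        exp, rec = expected[i1:i2], recognized[j1:j2]
        if tag == "replace" and len(exp) == len(rec):
            # Same-length substitution: compare character by character.
            for e, r in zip(exp, rec):
                if e == "0" and r in "Oo":
                    counts["zero_as_o"] += 1
                elif e != r and e.lower() == r.lower():
                    counts["case"] += 1
                else:
                    counts["other"] += 1
        elif tag == "delete":
            # Characters present in the expected text but absent from the output.
            for e in exp:
                counts["missing_bracket" if e in "[]" else "other"] += 1
        else:
            # Insertions and uneven replacements are not broken down further.
            counts["other"] += max(len(exp), len(rec))
    return counts


# Hypothetical example strings, for illustration only.
sample = classify_ocr_errors("[ABC] 100 Test", "ABC 1OO test")
print(sample)
```

Running this per text line on the recognized strings from `result` would give a rough count of each error type, which may help when comparing model versions.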
Original tiff file is here - https://github.com/sasumay/paddle-test/blob/main/jpn_temp.tif?raw=true
Please see the screenshot below, which compares the image with the OCR output.
Thanks for the feedback. The Japanese recognition model is still being optimized.
We hope the next version of PaddleOCR will solve your problem.
@tink2123 Hello, could you please share some information about:
- Which dataset was used to train the Japanese model?
- How can the model be trained?
Thank you so much.