Calamari-OCR / calamari

Line based ATR Engine based on OCRopy

Average sentence confidence 0.00%

sanjeebsarkar opened this issue · comments

Hi,

Thank you for building this application. I am not sure what mistake I am making here. I have installed CUDA 11.4 and cuDNN 8.2.4 (both 64-bit), and I am using TensorFlow 2.7.0 with Python 3.

This is the command I am running, along with its output:

cmd:
calamari-predict --checkpoint D:\maventic_files\calamari_ocr\calamari_models-1.0\antiqua_modern\2.ckpt --files ginger.png
output:
Found 1 files in the dataset
Checkpoint version 2 is up-to-date.
2021-11-14 01:43:36.274191: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-14 01:43:40.860281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2147 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                          Output Shape           Param #   Connected to
==================================================================================================
 input_data (InputLayer)               [(None, None, 48, 1)]  0         []
 conv2d_0 (Conv2D)                     (None, None, 48, 40)   400       ['input_data[0][0]']
 pool2d_1 (MaxPooling2D)               (None, None, 24, 40)   0         ['conv2d_0[0][0]']
 conv2d_1 (Conv2D)                     (None, None, 24, 60)   21660     ['pool2d_1[0][0]']
 pool2d_3 (MaxPooling2D)               (None, None, 12, 60)   0         ['conv2d_1[0][0]']
 reshape (Reshape)                     (None, None, 720)      0         ['pool2d_3[0][0]']
 bidirectional (Bidirectional)         (None, None, 400)      1473600   ['reshape[0][0]']
 input_sequence_length (InputLayer)    [(None, 1)]            0         []
 dropout (Dropout)                     (None, None, 400)      0         ['bidirectional[0][0]']
 tf.compat.v1.floor_div (TFOpLambda)   (None, 1)              0         ['input_sequence_length[0][0]']
 logits (Dense)                        (None, None, 88)       35288     ['dropout[0][0]']
 tf.compat.v1.floor_div_1 (TFOpLambda) (None, 1)              0         ['tf.compat.v1.floor_div[0][0]']
 softmax (Softmax)                     (None, None, 88)       0         ['logits[0][0]']
 input_data_params (InputLayer)        [(None, 1)]            0         []
 tf.cast (TFOpLambda)                  (None, 1)              0         ['tf.compat.v1.floor_div_1[0][0]']
==================================================================================================
Total params: 1,530,948
Trainable params: 1,530,948
Non-trainable params: 0


None
Prediction: 0%| | 0/1 [00:00<?, ?it/s]2021-11-14 01:43:53.279325: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8204
Prediction: 100%|████████████████████████████████████████████████████████████████████████| 1/1 [00:40<00:00, 40.17s/it]
Prediction of 1 models took 40.511317014694214s
Average sentence confidence: 0.00%
All files written

So the above run generates the ginger.pred text file with zero size.
Can you please tell me what mistake I am making?

Could you provide ginger.png as an attachment? Without an example it's quite hard to guess where the problem is.

Hi,

Sorry for the delay; here is the file:

img20210924_11235179

Since calamari is a "Line based ATR Engine", you have to tell it where to look for the lines. Use some layout analysis software like https://github.com/qurator-spk/eynollah to produce a PAGE XML file containing information on regions and lines, then give the XML to calamari-predict.

Ok, understood.
I will try that and update it here.
I will close the ticket, since I am a little occupied and it may take some time to try your suggestion.
I will update here with the result.
Thank you.


Hi,
I have used eynollah to generate the XML file. Can you please provide me with a command line for using calamari-ocr with the XML file?
I am unable to find the proper command in the calamari docs.
Thank you.

Most of the models in calamari_models are trained on binarized data, so we need to do that with your image as well. The most basic approach would be to just convert ginger.png -threshold 75% ginger.bin.png, but if you want something fancy, you could try qurator-spk/sbb_binarization.
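To illustrate what the 75% threshold actually does per pixel, here is a minimal global-threshold sketch in pure Python (no dependencies). In practice you would use ImageMagick's convert or sbb_binarization as suggested above; this only shows the underlying operation.

```python
# Minimal global-threshold binarization sketch.
# Mirrors ImageMagick's `-threshold 75%`: pixels at or above 75% of the
# maximum intensity become white, everything below becomes black.

def binarize(pixels, threshold=0.75, maxval=255):
    """Map each grayscale pixel value to 0 (black) or maxval (white)."""
    cutoff = maxval * threshold  # 191.25 for 8-bit images
    return [maxval if p >= cutoff else 0 for p in pixels]

# One row of grayscale values from a hypothetical scan:
row = [10, 100, 200, 250]
print(binarize(row))  # -> [0, 0, 255, 255]
```

Plain global thresholding works well for clean scans; for uneven lighting or degraded documents, an adaptive method (as in sbb_binarization) usually gives better line images.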

Now we can predict the text lines: calamari-predict --checkpoint "../calamari_models/uw3-modern-english/*.ckpt" --data "PageXML" --data.images ginger.bin.png (with ginger.xml in the same folder as ginger.bin.png).
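For reference, a PAGE XML file of the kind eynollah produces looks roughly like the skeleton below. The element names follow the PAGE 2013 content schema; the region/line ids and the coordinate points here are placeholder values, not taken from an actual ginger.xml.

```xml
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="ginger.bin.png" imageWidth="2480" imageHeight="3508">
    <TextRegion id="region_0">
      <Coords points="100,100 2300,100 2300,400 100,400"/>
      <TextLine id="region_0_line_0">
        <Coords points="110,110 2290,110 2290,180 110,180"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
```

calamari-predict reads the line coordinates from this file, crops each line out of the image, and runs recognition on the crops.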

This produces the prediction ginger.pred.xml.txt with the results written in the TextEquiv/Unicode Elements.
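If you want the recognized text as plain strings, you can pull the TextEquiv/Unicode elements out of the prediction file with the standard library. The XML below is an inline stand-in for a real ginger.pred.xml; the namespace URI is the 2013 PAGE schema commonly used by these tools (check the xmlns attribute of your file and adjust if it differs).

```python
import xml.etree.ElementTree as ET

PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"

# Inline stand-in for ET.parse("ginger.pred.xml").getroot() on a real file.
sample = f"""<PcGts xmlns="{PAGE_NS}">
  <Page imageFilename="ginger.bin.png">
    <TextRegion id="r0">
      <TextLine id="r0l0">
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

root = ET.fromstring(sample)
# Collect the text of every namespaced Unicode element, in document order.
lines = [u.text for u in root.iter(f"{{{PAGE_NS}}}Unicode")]
print(lines)  # -> ['Hello world']
```

Each TextLine carries its own TextEquiv, so this yields one string per recognized line.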

Since table recognition is quite hard, I guess you would have to experiment with tools like https://github.com/Layout-Parser/layout-parser or custom implementations for your data. Also, since most of the existing models (some more can be found in calamari_models_experimental or poke1024/origami_models) are trained on historical documents, you'd probably need to train your own models for your data to improve the results.

Ok, got it. Thank you.