OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D

Home Page:https://ocr-d.de/core/

Can we properly solve the reason for `tf1_disable_interactive_logs` existence?

kba opened this issue

In ocrd_network/utils we have:

from os import environ

def tf_disable_interactive_logs():
    try:
        # This env variable must be set before importing from Keras
        environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
        # from tensorflow.keras.utils import disable_interactive_logging
        # Enabled interactive logging throws an exception
        # due to a call of sys.stdout.flush()
        disable_interactive_logging()
    except Exception:
        # Nothing should be handled here if TF is not available
        pass

Why did we do that, and how can we get rid of it? Importing tensorflow is expensive, and the cost is felt most strongly with the bashlib processors and tests, because they start a new Python session (paying the full tensorflow import penalty each time) many times during a single run.

There are other bottlenecks, such as parsing YAML and globally importing modules that are only needed in a single if-else branch, but this is the lowest-hanging fruit.
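The point about modules that are imported globally but needed in only one branch can be illustrated with a deferred import. The following is a minimal sketch, not code from core: `is_installed` and `serialize` are hypothetical names, and the standard-library `json` module merely stands in for an expensive dependency such as tensorflow.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def serialize(data: dict, fmt: str = "repr") -> str:
    """Hypothetical helper; `json` stands in for a costly dependency."""
    if fmt == "json":
        # Deferred import: only callers that take this branch pay for it
        import json
        return json.dumps(data, sort_keys=True)
    return repr(data)
```

With this layout, a session that never takes the `json` branch never triggers that import, which is the saving that matters when many short-lived Python sessions are started in one run.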

Keras thinks the shell is interactive, but in the case of the Processing Worker it is not. Check here as well. Potentially this should be resolved at the processor level, so that we do not have to do it manually in ocrd_network.

2023-02-17 15:11:54,788 - ocrd.network.processing_worker - DEBUG - Starting to process the received message: <ocrd.network.rabbitmq_utils.ocrd_messages.OcrdProcessingMessage object at 0x7f6db9a54050>
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the pythonic processor: ocrd-calamari-recognize
2023-02-17 15:11:54,789 - ocrd.network.processing_worker - DEBUG - Invoking the processor_class: <class 'ocrd_calamari.recognize.CalamariRecognize'>
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - [Errno 5] Input/output error
Traceback (most recent call last):
  File "/home/mm/Desktop/core/ocrd/ocrd/network/processing_worker.py", line 234, in run_processor_from_worker
    instance_caching=False
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 95, in run_processor
    instance_caching=instance_caching
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 332, in get_processor
    parameter=parameter
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 44, in __init__
    self.setup()
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/ocrd_calamari/recognize.py", line 52, in setup
    self.predictor = MultiPredictor(checkpoints=checkpoints)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 116, in __init__
    graph_type="predict", batch_size=batch_size)
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_backend.py", line 17, in create_net
    processes=self.processes,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/calamari_ocr/ocr/backends/tensorflow_backend/tensorflow_model.py", line 59, in __init__
    print(self.model.summary())
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/engine/training.py", line 3304, in summary
    layer_range=layer_range,
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/layer_utils.py", line 319, in print_summary
    print_fn(f'Model: "{model.name}"')
  File "/home/mm/venv37-ocrd-new/lib/python3.7/site-packages/keras/utils/io_utils.py", line 80, in print_msg
    sys.stdout.flush()
OSError: [Errno 5] Input/output error
2023-02-17 15:11:55,233 - ocrd.network.processing_worker - ERROR - <class 'ocrd_calamari.recognize.CalamariRecognize'> failed with an exception.
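The traceback ends in `sys.stdout.flush()` raising `OSError: [Errno 5]`, which suggests that the worker's stdout is not a usable stream at that point. As a rough illustration only, and not what core or ocrd_calamari actually do, a process could probe its stdout and fall back to a sink. The helper names `stdout_is_usable` and `ensure_usable_stdout` are hypothetical.

```python
import os
import sys


def stdout_is_usable(stream=None) -> bool:
    """Return True if the stream accepts a write and a flush."""
    stream = sys.stdout if stream is None else stream
    if stream is None:
        # sys.stdout can be None when no console is attached
        return False
    try:
        stream.write("")
        stream.flush()
        return True
    except (OSError, ValueError):
        # OSError: e.g. Errno 5 as in the log; ValueError: closed stream
        return False


def ensure_usable_stdout() -> bool:
    """Redirect sys.stdout to os.devnull if it cannot be flushed.

    Returns True if a replacement was made.
    """
    if stdout_is_usable():
        return False
    sys.stdout = open(os.devnull, "w")
    return True
```

This hides the symptom for any library that prints, whereas disabling Keras interactive logging addresses only the Keras code path shown in the traceback.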

We can start by fixing this in ocrd_calamari. I'll drop the actual calls to the method from core and add them to ocrd_calamari.
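Moving the call to the processor level could look roughly like the sketch below. It is an assumption-laden illustration, not the actual ocrd_calamari change: `ExampleRecognizer` and `disable_keras_interactive_logs` are hypothetical names, and only `TF_CPP_MIN_LOG_LEVEL` and `tensorflow.keras.utils.disable_interactive_logging` are taken from the snippet quoted at the top of this issue.

```python
import os


def disable_keras_interactive_logs() -> bool:
    """Silence Keras interactive logging; return False if TF is unavailable."""
    # Must be set before TensorFlow is first imported to take effect
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
    try:
        from tensorflow.keras.utils import disable_interactive_logging
    except ImportError:
        # TensorFlow (or this helper) is not installed: nothing to do
        return False
    disable_interactive_logging()
    return True


class ExampleRecognizer:
    """Hypothetical processor, not the real ocrd_calamari class."""

    def setup(self) -> None:
        # Run once, before any model is built or summarized
        self.logs_silenced = disable_keras_interactive_logs()
        # ... model loading would follow here ...
```

Because the import now happens only inside a processor that needs TensorFlow anyway, processors that do not use it, including the bashlib ones, would no longer pay the import cost.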

@MehmedGIT Can you check whether #1091 combined with OCR-D/ocrd_calamari#90 solves the issue? Then I can check which other processors need this.

@kba, I have just tested and I see no problems.