OCR-D / core

Collection of OCR-related Python tools and wrappers from @OCR-D

Home Page: https://ocr-d.de/core/

Processor not failing when the required page id is not found

MehmedGIT opened this issue

The Processor class should raise an exception when a page ID is not found, but it does not. More precisely, this happens when the whole input file group is missing from the mets.xml.

From the ocrd_network module workflow endpoint in #1083:

DEBUG:ocrd_network.processing_worker:Consumer tag: ctag1.9aaac84eeb3d41e7aaef5742a39a3075, message delivery tag: 1, redelivered: False
DEBUG:ocrd_network.processing_worker:Message headers: {'OCR-D WebApi Header': 'OCR-D WebApi Value'}
DEBUG:ocrd_network.processing_worker:Trying to decode processing message with tag: 1
INFO:ocrd_network.processing_worker:Starting to process the received message: {'job_id': '2eabd3b7-e177-4953-b095-c4d5aa6b44b0', 'processor_name': 'ocrd-cis-ocropy-binarize', 'created_time': 1695708719, 'input_file_grps': ['OCR-D-IMG'], 'output_file_grps': ['OCR-D-BIN'], 'path_to_mets': '/home/mm/Desktop/ocrd_network_files/example_ws3/data/mets.xml', 'page_id': 'PHYS_0001', 'internal_callback_url': 'http://localhost:8080/result_callback', 'parameters': {}}
DEBUG:ocrd_network.processing_worker:Invoking processor: ocrd-cis-ocropy-binarize
10:11:59.267 WARNING ocrd.processor.base - Could not find any files for --page-id PHYS_0001 - compare 'PHYS_0001' with the output of 'orcd workspace list-page'.
WARNING:ocrd.processor.base:Could not find any files for --page-id PHYS_0001 - compare 'PHYS_0001' with the output of 'orcd workspace list-page'.
10:11:59.267 INFO ocrd.process.profile - Executing processor 'ocrd-cis-ocropy-binarize' took 0.003558s (wall) 0.000931s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-BIN' --parameter='{"method": "ocropy", "threshold": 0.5, "grayscale": false, "maxskew": 0.0, "noise_maxsize": 0, "dpi": 0, "level-of-operation": "page"}' --page-id='PHYS_0001']
INFO:ocrd.process.profile:Executing processor 'ocrd-cis-ocropy-binarize' took 0.003558s (wall) 0.000931s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-BIN' --parameter='{"method": "ocropy", "threshold": 0.5, "grayscale": false, "maxskew": 0.0, "noise_maxsize": 0, "dpi": 0, "level-of-operation": "page"}' --page-id='PHYS_0001']

The warning is logged, but because no exception is raised, the processing worker has nothing to catch. The processing step is therefore reported as successful although it produced no output. As a result, every job in a workflow can succeed while the workflow produces no output at all.

This seems to have been introduced by #1089: the code there does not raise an exception when the on_error parameter is set to abort.
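
To illustrate the difference between logging a warning and raising, here is a minimal sketch. It is not the actual ocrd code: `select_input_files`, `run_job`, `MissingInputError`, the dict standing in for the mets.xml, and the `"warn-only"` value are all hypothetical names chosen for this example. Only the `on_error` parameter and its `abort` value come from the discussion above.

```python
import logging

log = logging.getLogger("sketch.processor")


class MissingInputError(Exception):
    """Hypothetical exception for a page ID that matches no input files."""


def select_input_files(files_by_group, file_grp, page_id, on_error="abort"):
    """Return the files of `file_grp` for `page_id`.

    `files_by_group` is a stand-in for a METS lookup:
    {file_grp: {page_id: [paths]}}.
    """
    files = files_by_group.get(file_grp, {}).get(page_id, [])
    if not files:
        msg = f"Could not find any files for --page-id {page_id} in {file_grp}"
        if on_error == "abort":
            # Raising makes the failure visible to the caller.
            raise MissingInputError(msg)
        # Any other (hypothetical) setting only logs; a caller cannot catch this.
        log.warning(msg)
    return files


def run_job(files_by_group, file_grp, page_id, on_error="abort"):
    """Hypothetical worker step: the job fails only if an exception propagates."""
    try:
        select_input_files(files_by_group, file_grp, page_id, on_error)
    except MissingInputError:
        return "FAILED"
    return "SUCCESS"


# The requested input file group OCR-D-IMG is missing entirely.
mets = {"OCR-D-BIN": {"PHYS_0001": ["bin_0001.png"]}}
status_abort = run_job(mets, "OCR-D-IMG", "PHYS_0001", on_error="abort")
status_warn = run_job(mets, "OCR-D-IMG", "PHYS_0001", on_error="warn-only")
```

In this sketch, `status_abort` is `"FAILED"` while `status_warn` is `"SUCCESS"` even though no input was found, which mirrors the behaviour reported in the log above.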

@MehmedGIT fixed this in #1104; the fix will be in master once #1083 is merged.