Processor not failing when the required page id is not found
MehmedGIT opened this issue
The Processor class should raise an exception when a page id is not found, but it does not. To be precise, this happens when the whole input file group is missing from the mets.xml.
From the ocrd_network module workflow endpoint in #1083:
DEBUG:ocrd_network.processing_worker:Consumer tag: ctag1.9aaac84eeb3d41e7aaef5742a39a3075, message delivery tag: 1, redelivered: False
DEBUG:ocrd_network.processing_worker:Message headers: {'OCR-D WebApi Header': 'OCR-D WebApi Value'}
DEBUG:ocrd_network.processing_worker:Trying to decode processing message with tag: 1
INFO:ocrd_network.processing_worker:Starting to process the received message: {'job_id': '2eabd3b7-e177-4953-b095-c4d5aa6b44b0', 'processor_name': 'ocrd-cis-ocropy-binarize', 'created_time': 1695708719, 'input_file_grps': ['OCR-D-IMG'], 'output_file_grps': ['OCR-D-BIN'], 'path_to_mets': '/home/mm/Desktop/ocrd_network_files/example_ws3/data/mets.xml', 'page_id': 'PHYS_0001', 'internal_callback_url': 'http://localhost:8080/result_callback', 'parameters': {}}
DEBUG:ocrd_network.processing_worker:Invoking processor: ocrd-cis-ocropy-binarize
10:11:59.267 WARNING ocrd.processor.base - Could not find any files for --page-id PHYS_0001 - compare 'PHYS_0001' with the output of 'orcd workspace list-page'.
WARNING:ocrd.processor.base:Could not find any files for --page-id PHYS_0001 - compare 'PHYS_0001' with the output of 'orcd workspace list-page'.
10:11:59.267 INFO ocrd.process.profile - Executing processor 'ocrd-cis-ocropy-binarize' took 0.003558s (wall) 0.000931s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-BIN' --parameter='{"method": "ocropy", "threshold": 0.5, "grayscale": false, "maxskew": 0.0, "noise_maxsize": 0, "dpi": 0, "level-of-operation": "page"}' --page-id='PHYS_0001']
INFO:ocrd.process.profile:Executing processor 'ocrd-cis-ocropy-binarize' took 0.003558s (wall) 0.000931s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-BIN' --parameter='{"method": "ocropy", "threshold": 0.5, "grayscale": false, "maxskew": 0.0, "noise_maxsize": 0, "dpi": 0, "level-of-operation": "page"}' --page-id='PHYS_0001']
The warning is logged, but since no exception is raised, the processing worker has nothing to catch. The processing step is therefore reported as successful although no output is produced. As a result, all jobs in a workflow can succeed without producing any output at all.
This seems to have been introduced by #1089. The problem there is that no exception is raised when the on_error parameter is set to abort.
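To illustrate the expected behavior, here is a minimal Python sketch, not the actual ocrd code. The names select_input_files and run_job, the files_by_page mapping, the job state strings, and the "skip" value are all hypothetical; only the on_error parameter and its abort value come from this issue. It shows how raising on abort lets a worker mark the job as failed instead of reporting success with no output.

```python
import logging
from typing import Dict, List

log = logging.getLogger("sketch.processor")


def select_input_files(files_by_page: Dict[str, List[str]],
                       page_id: str,
                       on_error: str = "abort") -> List[str]:
    """Return the files for page_id (hypothetical helper, not the ocrd API)."""
    files = files_by_page.get(page_id, [])
    if files:
        return files
    message = f"Could not find any files for --page-id {page_id}"
    if on_error == "abort":
        # Raising here is what allows the caller to notice the failure.
        raise ValueError(message)
    if on_error == "skip":
        # Assumed alternative mode: only warn and continue with no files.
        log.warning(message)
        return []
    raise ValueError(f"Unknown on_error value: {on_error!r}")


def run_job(files_by_page: Dict[str, List[str]], page_id: str) -> str:
    """Hypothetical worker wrapper that maps an exception to a failed job."""
    try:
        select_input_files(files_by_page, page_id, on_error="abort")
    except ValueError as error:
        log.error("Job failed: %s", error)
        return "FAILED"
    return "SUCCESS"
```

With this shape, a job for a page id that has no input files ends as failed, so a workflow cannot finish with all jobs succeeding and no output.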
@MehmedGIT fixed this in #1104; the fix will be in master once #1083 is merged.