AstraZeneca / KAZU

Fast, world class biomedical NER

Home Page: https://AstraZeneca.github.io/KAZU/

Better Exception handling

RichJackson opened this issue

Document.metadata[PROCESSING_EXCEPTION] gets overwritten if a document fails in more than one step, so only the most recent failure is kept.

We probably want a dictionary from the step namespace to the exception instead. Note that we need to communicate to BIKG that this will change in a future release.
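As a rough illustration of the dictionary idea, here is a minimal sketch that uses a plain dict in place of Document.metadata. The helper name record_failure, the step names, and the string value given to PROCESSING_EXCEPTION are all hypothetical and are not taken from the KAZU codebase.

```python
import traceback
from typing import Any, Dict

# Stand-in value: the real constant's value is not shown in this issue.
PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"


def record_failure(metadata: Dict[str, Any], step_namespace: str, exc: BaseException) -> None:
    """Store one step's formatted traceback without touching other steps' entries.

    Hypothetical helper, not part of KAZU.
    """
    failures = metadata.setdefault(PROCESSING_EXCEPTION, {})
    failures[step_namespace] = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )


# Simulate the same document failing in two different steps.
metadata: Dict[str, Any] = {}
for namespace, error in [("StepA", ValueError("bad input")), ("StepB", KeyError("missing"))]:
    try:
        raise error
    except Exception as exc:
        record_failure(metadata, namespace, exc)

# Both failures survive, keyed by step namespace.
print(sorted(metadata[PROCESSING_EXCEPTION]))
```

Because each step writes under its own key, a later failure no longer replaces an earlier one. Anything that currently reads the value as a single string would need to change, which is the reason for the heads-up to BIKG.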

It would also be nice if we could choose at the pipeline level whether exceptions actually get (re-)raised. It seems like we could do this in one of the following ways:

  1. Always (re-)raise in the step, and have an 'except' clause at the pipeline level that either logs the exception or (re-)raises it, depending on how the Pipeline object is configured.
  2. Put the actual exception object in Document.metadata[PROCESSING_EXCEPTION] rather than the result of traceback.format_exc().
    • Then, in the Pipeline, iterate over the documents and raise any exceptions.
    • Make the Document serialization format the exception into a string.
  3. Maybe a combination of 1 and 2? I think raising in the Step might be better: if there's a problem that affects all documents, we'll get an exception right away rather than only once the whole run has finished. At the same time, I think having the real exception objects available would be useful.
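The combined approach could look roughly like the sketch below. It is only an illustration under several assumptions: documents are modelled as plain dicts, steps as callables that raise on failure, and SketchPipeline, raise_on_error and serialise_failures are hypothetical names rather than KAZU APIs.

```python
import logging
import traceback
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

# Stand-in value: the real constant's value is not shown in this issue.
PROCESSING_EXCEPTION = "PROCESSING_EXCEPTION"

Doc = Dict[str, Any]          # hypothetical stand-in for a Document's metadata
Step = Callable[[Doc], None]  # hypothetical: a step mutates the doc or raises


class SketchPipeline:
    """Steps always raise; the pipeline decides whether to re-raise or just log."""

    def __init__(self, steps: Dict[str, Step], raise_on_error: bool = False) -> None:
        self.steps = steps
        self.raise_on_error = raise_on_error

    def __call__(self, docs: List[Doc]) -> None:
        for namespace, step in self.steps.items():
            for doc in docs:
                try:
                    step(doc)
                except Exception as exc:
                    # Keep the real exception object, keyed by step namespace.
                    doc.setdefault(PROCESSING_EXCEPTION, {})[namespace] = exc
                    if self.raise_on_error:
                        raise
                    logger.warning("step %s failed: %r", namespace, exc)


def serialise_failures(doc: Doc) -> Dict[str, str]:
    """Format stored exceptions into strings at serialization time."""
    return {
        namespace: "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        for namespace, exc in doc.get(PROCESSING_EXCEPTION, {}).items()
    }


def always_fails(doc: Doc) -> None:
    raise ValueError("boom")


def mark_seen(doc: Doc) -> None:
    doc["seen"] = True


# Lenient mode: the failure is logged and later steps still run.
docs: List[Doc] = [{}]
SketchPipeline({"failing": always_fails, "ok": mark_seen})(docs)
serialised = serialise_failures(docs[0])
```

With raise_on_error=True the same pipeline stops at the first failure, which gives the fail-fast behaviour described in option 1, while the stored exception objects remain available for inspection as in option 2.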