asyml / ForteHealth

The project is in the incubation stage and still under development. ForteHealth is a flexible and powerful ML workflow builder for biomedical and clinical scenarios. This is part of the CASL project: http://casl-project.ai/

Add ICD coding support to pipeline

Piyush13y opened this issue · comments

Is your feature request related to a problem? Please describe.
ICD coding is the process of assigning International Classification of Diseases (ICD) diagnosis codes to clinical or medical notes written by health professionals (e.g. clinicians).
We do not currently support automatic detection of ICD codes from a clinical excerpt. This issue describes what is expected from the to-be-developed ICDCodingProcessor for the pipeline.

Describe the solution you'd like
Hugging Face hosts a few models that fit this use case, and our processor will build on them. In addition, new ontologies will have to be defined so that the processor and data packs can process and store the ICD codes for any clinical excerpt.

To begin with, we should define the ontologies this processor requires. A new parent ontology has to be defined under forte.data.ontology.top.Annotation: MedicalArticle, which represents the whole text of a document such as a discharge note. ICDCode is the child ontology, with two attributes: code (a string) and version (an int). Together they store the coding information like this:
Example input: "Patient has been diagnosed with lung tuberculosis"
Example output ontology:

- MedicalArticle
  - ICDCode
    - code "A15.0"
    - version 10

(Building and Generating Ontologies documentation)
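The ontology described above could be written down roughly as follows. This is a minimal sketch that builds the specification as a Python dict in the JSON layout used by Forte ontology specs (`name`, `definitions`, `entry_name`, `parent_entry`, `attributes`). The `ftx.medical.clinical` namespace and the spec name are assumptions made for illustration, and making ICDCode a subclass of MedicalArticle is only one reading of "child ontology".

```python
import json
from typing import Any, Dict

# Hypothetical entry names: the namespace is an assumption, not from the issue.
ARTICLE = "ftx.medical.clinical.MedicalArticle"
ICD_CODE = "ftx.medical.clinical.ICDCode"

ontology_spec: Dict[str, Any] = {
    "name": "icd_coding_ontology",  # hypothetical spec name
    "definitions": [
        {
            "entry_name": ARTICLE,
            "parent_entry": "forte.data.ontology.top.Annotation",
            "description": "Whole text of a clinical document, e.g. a discharge note.",
        },
        {
            "entry_name": ICD_CODE,
            # Assumption: "child" is read here as inheriting from MedicalArticle.
            "parent_entry": ARTICLE,
            "description": "An ICD code assigned to the text.",
            "attributes": [
                {"name": "code", "type": "str"},
                {"name": "version", "type": "int"},
            ],
        },
    ],
}


def attribute_types(spec: Dict[str, Any], entry_name: str) -> Dict[str, str]:
    """Return {attribute name: type} for one entry of the spec."""
    for definition in spec["definitions"]:
        if definition["entry_name"] == entry_name:
            return {a["name"]: a["type"] for a in definition.get("attributes", [])}
    raise KeyError(entry_name)


# Serialise the dict to the JSON text that would be saved as the spec file.
spec_json = json.dumps(ontology_spec, indent=2)
```

The JSON string in `spec_json` is what would be saved to a file and handed to the ontology generation step covered by the documentation referenced above.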

Now, on to the processor and its implementation. We want it to be configurable in the same way as our NER processors: the model the processor uses is passed in through the config rather than hardcoded, which keeps our processors modular and configurable.

pl.add(
    ICDCodingProcessor(),
    {
        "model": "AkshatSurolia/ICD-10-Code-Prediction",
    },
)
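To illustrate what "passed through the config and not hardcoded" could look like inside the processor, here is a minimal standalone sketch. It does not subclass any Forte base class. `ICDCodingProcessorSketch`, `fake_loader`, and the model name `some-org/another-icd-model` are hypothetical names used only for this example, and the loader is injected so the sketch runs without downloading a model.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical types: a classifier maps text to ICD code strings,
# and a loader turns a model name into such a classifier.
Classifier = Callable[[str], List[str]]
ModelLoader = Callable[[str], Classifier]


class ICDCodingProcessorSketch:
    """Stand-in for the proposed processor; not the actual Forte API."""

    def __init__(self, load_model: ModelLoader) -> None:
        self._load_model = load_model
        self._classifier: Optional[Classifier] = None
        self.configs: Dict[str, Any] = {}

    @staticmethod
    def default_configs() -> Dict[str, Any]:
        # Only the default lives here; any model name can override it.
        return {"model": "AkshatSurolia/ICD-10-Code-Prediction"}

    def initialize(self, configs: Optional[Dict[str, Any]] = None) -> None:
        # User-supplied values take precedence over the defaults.
        self.configs = {**self.default_configs(), **(configs or {})}
        self._classifier = self._load_model(self.configs["model"])

    def process(self, text: str) -> List[str]:
        if self._classifier is None:
            raise RuntimeError("call initialize() before process()")
        return self._classifier(text)


def fake_loader(model_name: str) -> Classifier:
    """Test double: returns a fixed code instead of running a real model."""
    return lambda text: ["A15.0"] if "tuberculosis" in text else []


processor = ICDCodingProcessorSketch(fake_loader)
processor.initialize({"model": "some-org/another-icd-model"})
codes = processor.process("Patient has been diagnosed with lung tuberculosis")
```

Swapping models then means changing only the config value, and a real loader could replace `fake_loader` without touching the processor class.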

This pretrained ICD coding model can serve as one of the models for ICD coding. Refer to its model page to see how results are fetched from the model for a given input. What matters here is that the processor can work with different models, in case we extend support to multiple models going forward.
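As a rough illustration of fetching results, the sketch below assumes the model is a sequence classifier that returns one logit per label together with an index-to-label mapping whose labels are ICD codes. That is an assumption about the model, not something stated in this issue. `top_k_codes` is a hypothetical helper, and the logits and labels are made-up values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_codes(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Convert raw logits to probabilities and return the k most likely codes."""
    if not logits:
        return []
    # Softmax with the maximum subtracted for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank label indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up values for illustration only.
example_logits = [0.1, 2.0, -1.0]
example_labels = {0: "A15.0", 1: "J18.9", 2: "I10"}
best = top_k_codes(example_logits, example_labels, k=2)
```

Keeping this step independent of any one model is one way to let the processor support several models, as long as each exposes logits and a label mapping.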

P.S. You can follow the NegationContextAnalyzer processor for structure and code design; it can serve as the template when implementing a new processor.

Describe alternatives you've considered
A few other models and research papers were considered. This approach seems the best choice given how simple it is to implement.