Abhinav43 / MT-RAM

Multitask Recalibrated Aggregation Network for Medical Code Prediction

Multitask Recalibrated Aggregation Network for Medical Code Prediction (MT-RAM)

This repository contains the code to reproduce the results of the paper "Multitask Recalibrated Aggregation Network for Medical Code Prediction".

Highlights

  • Code Associations: A multi-task learning scheme captures the relationships between different medical codes.
  • Recalibrated Features: The Recalibrated Attention Module (RAM) reduces the effect of noise in clinical documents. RAM can also alleviate the long-document problem through iterative convolution.
  • Extensible: The multi-task learning framework can be extended to multiple (>= 3) medical coding tasks, such as HCC coding and CPT coding.

Package Dependencies

  • allennlp == 0.9.0
  • ax-platform == 0.1.12
  • gensim == 3.8.3
  • plotly == 4.7.1
  • pytorch == 1.5.1
  • spacy == 2.1.9
  • tensorboardx == 2.0
  • tokenizers == 0.7.0
  • numpy == 1.15.1
  • nltk == 3.5
  • python == 3.6.12
  • pytorch-pretrained-bert == 0.6.2
  • transformers == 2.9.1

Install the dependencies with the following command (recommended):

pip install -r requirements.txt

Preprocessing

Clinical Document

We follow the preprocessing setup of MultiResCNN. The data files should be organized as follows:

data
|   D_ICD_DIAGNOSES.csv
|   D_ICD_PROCEDURES.csv
|   ICD9_descriptions.txt (for DR_CAML)
└───mimic3/
|   |   NOTEEVENTS.csv
|   |   DIAGNOSES_ICD.csv
|   |   PROCEDURES_ICD.csv
|   |   *_hadm_ids.csv (get from CAML)

Run python preprocess_mimic3.py to obtain the corresponding ICD code files.
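
Before running the preprocessing script, it can help to confirm that the input files from the tree above are in place. The following is a minimal sketch and not part of the repository; the helper name missing_data_files and the default root "data" are assumptions made for illustration. ICD9_descriptions.txt is left out because the tree marks it as needed only for DR_CAML.

```python
from pathlib import Path

# File names are taken from the directory tree above.
# The helper itself is hypothetical and not part of the repository.
TOP_LEVEL_FILES = ["D_ICD_DIAGNOSES.csv", "D_ICD_PROCEDURES.csv"]
MIMIC3_FILES = ["NOTEEVENTS.csv", "DIAGNOSES_ICD.csv", "PROCEDURES_ICD.csv"]


def missing_data_files(root="data"):
    """Return the expected input files that are absent under `root`."""
    root = Path(root)
    expected = [root / name for name in TOP_LEVEL_FILES]
    expected += [root / "mimic3" / name for name in MIMIC3_FILES]
    missing = [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
    # The *_hadm_ids.csv split files come from CAML; require at least one.
    if not list((root / "mimic3").glob("*_hadm_ids.csv")):
        missing.append("mimic3/*_hadm_ids.csv")
    return missing
```

Calling missing_data_files() from the repository root returns an empty list when every listed file is present, and otherwise the relative paths that still need to be added.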

Obtain CCS dataset

Clinical Classifications Software (CCS) for ICD-9-CM is a tool from HCUP. Download dx2015.csv and pr2015.csv from the web and place both files under data, following the structure shown below:

data
|   D_ICD_DIAGNOSES.csv
|   D_ICD_PROCEDURES.csv
|   ICD9_descriptions.txt (for DR_CAML)
└───mimic3/
|   |   dev_50.csv
|   |   train_50.csv
|   |   test_50.csv
|   |   dx2015.csv
|   |   pr2015.csv
|   |   NOTEEVENTS.csv
|   |   DIAGNOSES_ICD.csv
|   |   PROCEDURES_ICD.csv
|   |   *_hadm_ids.csv (get from CAML)

Run python ICD2CCS.py to obtain the CCS labels and attach them to the corresponding CSV files.
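
Conceptually, attaching CCS labels means mapping each ICD-9 code of an admission to its CCS category and removing duplicates, since several ICD-9 codes can share one category. The sketch below illustrates that idea on an in-memory mapping. It is not the repository's ICD2CCS.py: the function name, the semicolon-separated label format, and the toy mapping values are all assumptions.

```python
def icd_to_ccs(icd_labels, icd2ccs, sep=";"):
    """Map a separated string of ICD-9 codes to a string of unique CCS codes.

    Codes without an entry in `icd2ccs` are skipped. The output keeps the
    order of first appearance so that it is deterministic.
    """
    ccs_codes = []
    for code in icd_labels.split(sep):
        ccs = icd2ccs.get(code.strip())
        if ccs is not None and ccs not in ccs_codes:
            ccs_codes.append(ccs)
    return sep.join(ccs_codes)


# Toy mapping for illustration only; real values would be read from
# dx2015.csv and pr2015.csv.
toy_map = {"401.9": "98", "428.0": "108", "427.31": "106", "427.32": "106"}
print(icd_to_ccs("401.9;427.31;427.32;999.99", toy_map))  # prints 98;106
```

In this example two ICD-9 codes collapse into one CCS category and the unmapped code is dropped, which is why the CCS label space is much smaller than the ICD one.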

Training

MT-RAM

python main.py --MAX_LENGTH 2500 --n_epochs 50 --batch_size 16 --model GRU --lr 8e-3 --MTL Yes --loss_weight_CCS 0.3

CAML + MTL + RAM

python main.py --MAX_LENGTH 2500 --n_epochs 50 --batch_size 16 --model caml --lr 8e-3 --MTL Yes --loss_weight_CCS 0.3

MultiResCNN + MTL + RAM

python main.py --MAX_LENGTH 2500 --n_epochs 50 --batch_size 16 --model MultiResCNN --lr 8e-3 --MTL Yes --loss_weight_CCS 0.3
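
All three commands enable multi-task learning with --MTL Yes and pass --loss_weight_CCS 0.3. One plausible reading, stated here as an assumption rather than the repository's confirmed behavior, is that the CCS loss is scaled by this weight before being added to the ICD loss. A minimal sketch of that combination in plain Python, with hypothetical function names:

```python
import math


def bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over a list of per-label probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(icd_probs, icd_targets, ccs_probs, ccs_targets, ccs_weight=0.3):
    """Assumed combination: ICD loss plus the CCS loss scaled by `ccs_weight`."""
    icd_loss = bce(icd_probs, icd_targets)
    ccs_loss = bce(ccs_probs, ccs_targets)
    return icd_loss + ccs_weight * ccs_loss
```

Under this reading, a weight below 1 keeps ICD prediction as the dominant objective while the CCS task acts as an auxiliary signal. Other combinations, such as a convex mix of the two losses, are equally possible; check main.py for the actual formula.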

Main Results (all evaluation results are presented in %)

MIMIC-III (ICD)

Models                  | Macro AUC-ROC | Micro AUC-ROC | Macro F1 | Micro F1 | Precision at 5 | Model
CAML + MTL + RAM        | 91.4          | 93.8          | 62.5     | 68.7     | 65.3           | CAML
MultiResCNN + MTL + RAM | 91.7          | 93.9          | 64.1     | 69.0     | 65.0           | MultiResCNN
MT-RAM                  | 92.1          | 94.3          | 65.1     | 70.6     | 66.4           | MT-RAM

MIMIC-III (CCS)

Models                  | Macro AUC-ROC | Micro AUC-ROC | Macro F1 | Micro F1 | Precision at 5 | Model
CAML + MTL + RAM        | 91.5          | 94.2          | 66.9     | 72.8     | 67.5           | CAML
MultiResCNN + MTL + RAM | 91.7          | 94.3          | 67.8     | 72.7     | 67.3           | MultiResCNN
MT-RAM                  | 92.2          | 94.6          | 69.3     | 74.3     | 68.3           | MT-RAM
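
For readers unfamiliar with the reported metrics, the sketch below shows one common way to define micro-averaged F1, macro-averaged F1, and precision at k for multi-label prediction. It is an illustration in plain Python with hypothetical function names, not the evaluation code that produced the tables above. Implementations differ in detail (for example, in how macro F1 is averaged), so numbers from this sketch may not match the repository's output exactly.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Both arguments are lists of equal-length 0/1 rows, one row per document."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so rare labels weigh as much as frequent ones.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels first, then compute a single F1.
    micro = f1(*(sum(col) for col in zip(*counts)))
    return micro, macro


def precision_at_k(true_row, scores, k=5):
    """Fraction of the k highest-scored labels that are true for one document."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sum(true_row[j] for j in top) / k
```

Because macro averaging gives every label equal weight, macro F1 is typically lower than micro F1 when rare codes are predicted poorly, which matches the gap between the two columns in the tables.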

Citation

If you find our code helpful, please cite it using the BibTeX entry below.

@article{sun2021multitask,
title={Multitask Recalibrated Aggregation Network for Medical Code Prediction},
author={Sun, Wei and Ji, Shaoxiong and Cambria, Erik and Marttinen, Pekka},
journal={arXiv preprint arXiv:2104.00952},
year={2021}
}

Acknowledgement

We thank all code providers, especially the authors of MultiResCNN, CAML, and CCS.

License:GNU General Public License v3.0
