Medical-document-retrieval-using-knowledge-based-Transformer-models

Overview

Vocabulary mismatch is a significant problem for query-based document retrieval in the medical domain. Because medical documents are typically written by professionals, they often contain specialized terms that are not widely understood or used. Traditional lexical models such as the vector space model (VSM) and BM25 depend on term overlap between query and document, so they perform poorly in this setting. Much prior work applies neural networks to the problem, an approach commonly known as neural learning to rank (NLTR). A more recent line of work, particularly in the medical domain, uses knowledge bases (KBs) that map words to concepts, so that several different words can be linked to the same concept. Transformers have also been highly successful in NLP in recent years. This work experiments with several Siamese-structured, transformer- and knowledge-based retrieval models to address retrieval in the medical domain. In our experiments, the proposed retrieval model combined with the UMLSBert_ENG transformer gives the best results on almost all metrics.
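The vocabulary mismatch problem and the idea of mapping words to concepts can be illustrated with a small sketch. This is not the repository's model: the `TOY_CONCEPTS` table and its concept IDs are hypothetical stand-ins for a real knowledge base such as UMLS, and the phrase matching is deliberately naive.

```python
# Toy illustration only. TOY_CONCEPTS is a hypothetical, hand-written table,
# not UMLS, and the concept IDs are made up for this example.
TOY_CONCEPTS = {
    "heart attack": "C_MI",
    "myocardial infarction": "C_MI",
    "high blood pressure": "C_HTN",
    "hypertension": "C_HTN",
}


def term_overlap(query, document):
    """Count distinct words shared by query and document (exact lexical match)."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def concepts_in(text):
    """Return the toy concept IDs whose surface phrase occurs in the text."""
    lowered = text.lower()
    return {cid for phrase, cid in TOY_CONCEPTS.items() if phrase in lowered}


def concept_overlap(query, document):
    """Count concepts shared by query and document after mapping phrases to concepts."""
    return len(concepts_in(query) & concepts_in(document))


query = "heart attack symptoms"
document = "Clinical presentation of acute myocardial infarction"

lexical = term_overlap(query, document)      # no word in common
conceptual = concept_overlap(query, document)  # both mention the same toy concept
print(lexical, conceptual)
```

The query and the document share no words, so a purely lexical score sees no match, while the concept mapping links both to the same entry.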

Repository Link

Link to GitHub Repository:
https://github.com/aprameya2001/Medical-document-retrieval-using-knowledge-based-Transformer-models

File Structure

  • nfcorpus/ : NFCorpus dataset
  • training_scripts/ : Python training notebooks; each notebook trains the retrieval model with a different core encoder model
  • retrieval_evaluation.ipynb : Notebook for displaying evaluation results of each model
  • iap.jpg : Plot of 11-point Interpolated Average Precision (IAP)
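For readers unfamiliar with the metric plotted in iap.jpg, the following is a minimal sketch of the standard 11-point interpolated precision for a single query. It is an assumption that the evaluation notebook follows this textbook definition; the function name and the document IDs are illustrative and not taken from the repository.

```python
def eleven_point_interpolated_precision(ranked_doc_ids, relevant_ids):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query.

    The interpolated precision at recall level r is the highest precision
    observed at any rank whose recall is at least r (0.0 if none exists).
    """
    relevant = set(relevant_ids)
    if not relevant:
        return [0.0] * 11

    hits = 0
    points = []  # (recall, precision) after each rank position
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))

    levels = [i / 10 for i in range(11)]
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]


# Hypothetical ranking for one query: d1 and d3 are the relevant documents.
curve = eleven_point_interpolated_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
mean_iap = sum(curve) / len(curve)
print(curve)
print(mean_iap)
```

To obtain a curve averaged over a query set, the per-query values at each recall level would be averaged across queries.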

Setup

  • Clone the git repository:

git clone https://github.com/aprameya2001/Medical-document-retrieval-using-knowledge-based-Transformer-models.git

  • Install all requirements (run from inside the cloned repository):

pip install -r requirements.txt

  • To train a variant of the retrieval model (each uses a different core encoder model), run the corresponding notebook in the training_scripts/ folder

  • To evaluate a trained model, follow the process shown in retrieval_evaluation.ipynb

Contributors

  • Aprameya Dash
  • Alimurtaza Mustafa Merchant
  • Suyash Chintawar
