XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

Omkar Thawakar*, Abdelrahman Shaker*, Sahal Shaji Mullappilly*, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, and Fahad Shahbaz Khan.

*Equal Contribution

Mohamed bin Zayed University of Artificial Intelligence, UAE

🚀 News

May-19 : Our code, models, and pre-processed report summaries are released.

Online Demo

You can try our demo using the provided examples or by uploading your own X-ray here: Link-1 | Link-2 | Link-3.

About XrayGPT

XrayGPT aims to stimulate research on the automated analysis of chest radiographs from a given X-ray.
The LLM (Vicuna) is fine-tuned on medical data (100k real conversations between patients and doctors) and ~30k radiology conversations to acquire domain-specific and relevant features.
We generate interactive and clean summaries (~217k) from the free-text radiology reports of two datasets (MIMIC-CXR and OpenI). These summaries serve to enhance the performance of LLMs by fine-tuning the linear transformation layer on high-quality data. For more details regarding our high-quality summaries, please check Dataset Creation.
We align a frozen medical visual encoder (MedCLIP) with a fine-tuned LLM (Vicuna) using a simple linear transformation.
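
The alignment idea in the last point can be illustrated with a minimal, dependency-free sketch. This is not the repository's implementation: the toy dimensions, the weight values, and the names `linear_project`, `visual_features`, `weight`, and `bias` are all hypothetical. It only shows a linear transformation y = Wx + b that maps a visual feature vector to the width of the LLM's embedding space, while the encoder producing the features stays frozen.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_project(features: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Apply y = W x + b, mapping encoder features to the LLM embedding width."""
    if len(weight) != len(bias):
        raise ValueError("weight rows must match bias length")
    projected = []
    for row, b in zip(weight, bias):
        if len(row) != len(features):
            raise ValueError("weight columns must match feature length")
        projected.append(sum(w * x for w, x in zip(row, features)) + b)
    return projected


# Hypothetical toy sizes: a 3-dim visual feature projected to a 2-dim embedding.
visual_features = [1.0, 2.0, 3.0]  # stand-in for a frozen encoder output
weight = [[0.5, 0.0, 0.0], [0.0, 1.0, 1.0]]  # the projection matrix
bias = [0.0, -1.0]

llm_input = linear_project(visual_features, weight, bias)
print(llm_input)  # [0.5, 4.0]
```

In a real model the projection would be a learned layer operating on batches of feature tokens; the sketch keeps a single vector so the arithmetic is easy to follow.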

Getting Started

Installation

1. Prepare the code and the environment

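Clone the repository and create an Anaconda environment, using either the provided env.yml (first option) or a manual setup with xgpt_requirements.txt (second option):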

git clone https://github.com/mbzuai-oryx/XrayGPT.git
cd XrayGPT
conda env create -f env.yml
conda activate xraygpt

git clone https://github.com/mbzuai-oryx/XrayGPT.git
cd XrayGPT
conda create -n xraygpt python=3.9
conda activate xraygpt
pip install -r xgpt_requirements.txt

Setup

2. Prepare the datasets for training

Refer to dataset_creation for more details.

Download the preprocessed annotations for mimic and openi. The respective image folders contain the images from each dataset.

The final dataset folder structure will be as follows:

dataset
├── mimic
|    ├── image
|    |   ├──abea5eb9-b7c32823-3a14c5ca-77868030-69c83139.jpg
|    |   ├──427446c1-881f5cce-85191ce1-91a58ba9-0a57d3f5.jpg
|    |   .....
|    ├──filter_cap.json
├── openi
|    ├── image
|    |   ├──1.jpg
|    |   ├──2.jpg
|    |   .....
|    ├──filter_cap.json
...
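
Before training, it may help to confirm that the folders match the layout above. The following is a small sketch under stated assumptions: the helper name `check_dataset_layout` and the accepted image suffixes are hypothetical, and the script only checks that each `filter_cap.json` parses as JSON without assuming anything about its schema.

```python
import json
from pathlib import Path
from typing import Dict

# Assumption: images use one of these extensions (the tree above shows .jpg).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_dataset_layout(root: str, subsets=("mimic", "openi")) -> Dict[str, int]:
    """Return the number of images per subset, raising if the layout is incomplete."""
    counts = {}
    for name in subsets:
        subset_dir = Path(root) / name
        image_dir = subset_dir / "image"
        annotation_file = subset_dir / "filter_cap.json"
        if not image_dir.is_dir():
            raise FileNotFoundError(f"missing image folder: {image_dir}")
        if not annotation_file.is_file():
            raise FileNotFoundError(f"missing annotation file: {annotation_file}")
        # Only confirm the file parses as JSON; its schema is not assumed here.
        with annotation_file.open() as handle:
            json.load(handle)
        counts[name] = sum(
            1 for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `check_dataset_layout("dataset")` from the directory that contains the `dataset` folder would return a dictionary with one image count per subset, or raise an error naming the first missing path.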

3. Prepare the pretrained Vicuna weights

We built XrayGPT on the v1 version of Vicuna-7B. We finetuned Vicuna using curated radiology report samples. Download the Vicuna weights from vicuna_weights. The final weights should be in a single folder with a structure similar to the following:

vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...

Then, set the path to the Vicuna weights in the model config file "xraygpt/configs/models/xraygpt.yaml" at Line 16.
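
Before pointing the config at the weights folder, it can be useful to verify that every shard was downloaded. The sketch below assumes the folder follows the Hugging Face sharded checkpoint convention, in which `pytorch_model.bin.index.json` contains a `weight_map` from parameter names to shard file names. The helper name `missing_weight_shards` is hypothetical.

```python
import json
from pathlib import Path
from typing import List


def missing_weight_shards(weights_dir: str) -> List[str]:
    """List shard files named in the index that are absent from weights_dir."""
    folder = Path(weights_dir)
    index_path = folder / "pytorch_model.bin.index.json"
    with index_path.open() as handle:
        index = json.load(handle)
    # "weight_map" maps each parameter name to the shard file that stores it.
    shard_names = sorted(set(index["weight_map"].values()))
    return [name for name in shard_names if not (folder / name).is_file()]
```

An empty list from `missing_weight_shards("vicuna_weights")` would indicate that all shards referenced by the index are present in the folder.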

To finetune Vicuna on radiology samples, please download our curated radiology and medical_healthcare conversational samples and refer to the original Vicuna repo for finetuning: Vicuna_Finetune

4. Download the pretrained Minigpt-4 checkpoint

Download the pretrained MiniGPT-4 checkpoint: ckpt

5. Training of XrayGPT

A. First mimic pretraining stage

In the first pretraining stage, the model is trained using image-text pairs from the preprocessed mimic dataset.

To launch the first stage training, run the following command. In our experiments, we use 4 AMD MI250X GPUs.

torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/xraygpt_mimic_pretrain.yaml

B. Second openi finetuning stage

In the second stage, we use a small, high-quality dataset of openi image-text pairs that we preprocessed.

Run the following command. In our experiments, we use AMD MI250X GPU.

torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/xraygpt_openi_finetune.yaml
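
In both training commands, `NUM_GPU` is a placeholder for the number of GPUs on the machine. As a hedged illustration, the sketch below assembles the two commands from Python with the placeholder filled in. The helper name `build_train_command` and the value 4 are hypothetical; the flags and config paths are copied from the commands above.

```python
import shlex
from typing import List

STAGE_CONFIGS = [
    "train_configs/xraygpt_mimic_pretrain.yaml",  # first stage: mimic pretraining
    "train_configs/xraygpt_openi_finetune.yaml",  # second stage: openi finetuning
]


def build_train_command(num_gpus: int, cfg_path: str) -> List[str]:
    """Return the torchrun argument list for one training stage."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun", "--nproc-per-node", str(num_gpus),
        "train.py", "--cfg-path", cfg_path,
    ]


# Print the command for each stage, here with a hypothetical 4 GPUs.
for cfg in STAGE_CONFIGS:
    print(shlex.join(build_train_command(4, cfg)))
```

Each returned list could be passed to `subprocess.run(command, check=True)` from the repository root to launch the corresponding stage.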

Launching Demo on local machine

Download the pretrained XrayGPT checkpoint: link

Set the path to this checkpoint in "eval_configs/xraygpt_eval.yaml".

Try the Gradio demo (demo.py) on your local machine with the following command:

python demo.py --cfg-path eval_configs/xraygpt_eval.yaml  --gpu-id 0

Examples

Acknowledgement

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models. We built our model on top of MiniGPT-4.
MedCLIP: Contrastive Learning from Unpaired Medical Images and Texts. We used the medical-aware image encoder from MedCLIP.
BLIP2: The model architecture of XrayGPT follows BLIP-2.
Lavis: This repository is built upon Lavis!
Vicuna: The fantastic language ability of Vicuna is just amazing. And it is open-source!

License

This repository is licensed under CC BY-NC-SA. Please refer to the license terms here.
