AtlasAnalyticsLab / Vim4Path

Self-Supervised Vision Mamba for Histopathology Images [CVPR2024]


VIM4Path

Vim4Path: Self-Supervised Vision Mamba for Histopathology Images, CVPR 2024.

Abstract: Representation learning from Gigapixel Whole Slide Images (WSI) poses a significant challenge in computational pathology due to the complicated nature of tissue structures and the scarcity of labeled data. Multi-Instance Learning (MIL) methods have addressed this challenge by leveraging image patches to classify slides, utilizing models pretrained with Self-Supervised Learning (SSL) approaches. The performance of both SSL and MIL methods relies on the architecture of the feature encoder. This paper proposes leveraging the Vision Mamba (Vim) architecture, inspired by state space models, within the DINO framework for representation learning in computational pathology. We evaluate the performance of Vim against Vision Transformers (ViT) on the Camelyon16 dataset for both patch-level and slide-level classification. Our findings highlight Vim's enhanced performance compared to ViT, particularly at smaller scales, where Vim achieves an 8.21 increase in ROC AUC for models of similar size. An explainability analysis further highlights Vim's capabilities, revealing that, unlike ViT, Vim emulates the pathologist's workflow. This alignment with human expert analysis underscores Vim's potential in practical diagnostic settings and contributes significantly to developing effective representation-learning algorithms in computational pathology.

[arXiv] | [Cite]


Installation

Follow the installation guide in the Vision Mamba repo. You also need to install packages such as shapely, openslide, opencv, h5py, and lxml for data processing; an example install command is given below.
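A typical setup, assuming the standard PyPI package names (openslide-python for the OpenSlide bindings, opencv-python for OpenCV), would be:

pip install shapely openslide-python opencv-python h5py lxml

Note that openslide-python also requires the native OpenSlide library to be installed on your system.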

Dataset

Dataset Source

Camelyon16 WSI images can be downloaded from the following FTP site: CAMELYON16 Dataset FTP

Data processing

You should use the preprocess folder, where the CLAM preprocessing code is integrated.

To create patches at the 10x zoom level, use the following commands:

python create_patches_fp.py --source path_to_Camelyon16/testing/images/ --save_dir ../dataset/Camelyon16/testing/224_10x/h5/ --patch_size 224 --step_size 224 --patch_level 2 --seg --patch --stitch
python create_patches_fp.py --source path_to_Camelyon16/training/normal/ --save_dir ../dataset/Camelyon16/training/224_10x/h5/normal/ --patch_size 224 --step_size 224 --patch_level 2 --seg --patch --stitch
python create_patches_fp.py --source path_to_Camelyon16/training/tumor/ --save_dir ../dataset/Camelyon16/training/224_10x/h5/tumor/ --patch_size 224 --step_size 224 --patch_level 2 --seg --patch --stitch
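To sanity-check the output, you can inspect one of the generated .h5 coordinate files. The sketch below assumes the standard CLAM layout, where patch coordinates live in a coords dataset and the patching parameters are stored as its attributes; the slide name in the path is a hypothetical example.

import h5py

# One of the coordinate files written by create_patches_fp.py (hypothetical slide name)
h5_path = "../dataset/Camelyon16/training/224_10x/h5/normal/patches/normal_001.h5"

with h5py.File(h5_path, "r") as f:
    coords = f["coords"][:]  # (N, 2) top-left corner of each patch
    print("number of patches:", coords.shape[0])
    print("first coordinates:\n", coords[:5])
    # CLAM stores the patching parameters as attributes on the coords dataset
    for key, value in f["coords"].attrs.items():
        print(key, "=", value)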

Use the extract_patches.py script for pretraining image extraction with the following command:

python extract_patches.py --raw_data_folder path_to_raw_WSIs --wsi_extension tif --input_folder path_to_h5_files --output_folder path_to_save_patches

To extract patches for patch-level classification, use the camelyon16_extraction.ipynb notebook. A quick sanity check of the extracted folders is sketched below.
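Before pretraining, it can help to count the extracted image files per folder. This is a generic sketch; it assumes the patches are written as .png or .jpg files, which may differ from the actual output format of the extraction scripts.

from pathlib import Path

# Hypothetical output folder passed to extract_patches.py via --output_folder
patch_root = Path("path_to_save_patches")

for subdir in sorted(p for p in patch_root.iterdir() if p.is_dir()):
    n_images = sum(1 for f in subdir.rglob("*")
                   if f.suffix.lower() in {".png", ".jpg", ".jpeg"})
    print(f"{subdir.name}: {n_images} images")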

Pretraining

You should use the dino folder for pretraining.

For pretraining, you can use the following command. Make sure the total batch size across all GPUs is 512, as in the paper (e.g., 4 GPUs x 128 per GPU). Remove --disable_wand if you want to use W&B to track your experiments.

python -m torch.distributed.launch --nproc_per_node=4 main.py --data_path patch_to_pretraining_images --output_dir checkpoints/camelyon16_224_10x/vim-s/ --image_size 224 --image_size_down 96 --batch_size_per_gpu 128 --arch vim-s --disable_wand
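Pretraining periodically writes checkpoint.pth to --output_dir. A minimal sketch for inspecting it, assuming the standard DINO-style checkpoint layout (entries such as student, teacher, optimizer, and epoch):

import torch

# Checkpoint written by main.py to --output_dir during pretraining
ckpt = torch.load("checkpoints/camelyon16_224_10x/vim-s/checkpoint.pth", map_location="cpu")

# Typical DINO-style checkpoints store these entries (assumed layout)
print("checkpoint entries:", list(ckpt.keys()))
print("last completed epoch:", ckpt.get("epoch"))

# The teacher weights are commonly used for downstream feature extraction
print("teacher tensors:", len(ckpt["teacher"]))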

Patch-Level Evaluation

You can use the following command to evaluate each model's performance on the extracted patch-level images (produced by camelyon16_extraction.ipynb). We use a batch size of 64 and train for 20 epochs, since all methods tend to overfit beyond that point.

python -m torch.distributed.launch --nproc_per_node=1 eval_linear.py --output_dir checkpoints/camelyon16_224_10x/vim-s/eval_linear --train_data_path path_to_balanced_pcam10x_data --val_data_path path_to_balanced_cam16_test_data --pretrained_weights checkpoints/camelyon16_224_10x/vim-s/checkpoint.pth --arch vim-s --image_size 224 --epochs 20 --batch_size 64 --disable_wand
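The patch-level results in the table below are reported as ROC AUC. If you want to compute the same metric from your own saved predictions, a generic scikit-learn sketch (not the repository's evaluation script) is:

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-patch labels (0 = normal, 1 = tumor) and predicted tumor scores
labels = np.array([0, 0, 1, 1, 1, 0])
scores = np.array([0.10, 0.35, 0.80, 0.65, 0.92, 0.20])

print("patch-level ROC AUC:", roc_auc_score(labels, scores))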

Slide-Level Evaluation

For slide-level classification, you can use the following command to extract the slide features at 10x using the model pretrained at 10x.

python mil_data_creation.py --image_size 224 --arch vim-s --pretrained_weights dino/checkpoints/camelyon16_224_10x/vim-s_224-96/checkpoint.pth --source_level 10 --target_level 10
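The extracted features feed into the CLAM-style MIL pipeline below. The following sketch assumes mil_data_creation.py saves one feature bag per slide as a .pt tensor of shape (num_patches, embedding_dim), in line with CLAM's convention; the folder name is hypothetical, so adjust it to the script's actual output.

import torch
from pathlib import Path

# Hypothetical folder of per-slide feature bags written by mil_data_creation.py
feature_dir = Path("path_to_mil_features/pt_files")

for pt_file in sorted(feature_dir.glob("*.pt"))[:3]:
    features = torch.load(pt_file, map_location="cpu")
    # Expected shape: (number of patches in the slide, embedding dimension)
    print(pt_file.name, tuple(features.shape))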

We modified the CLAM code to work with our dataset, so you can use the following command in the MIL folder to obtain slide-level performance.

python main_cam.py  --image_size 224 --arch vim-s --source_level 10 --target_level 10 --exp_code vim-s-224-10at10-clam_sb --model_type clam_sb --drop_out --early_stopping --lr 2e-4 --k 1 --label_frac 1  --weighted_sample --bag_loss ce --inst_loss svm --task task_1_tumor_vs_normal --log_data

Weights

The pretrained weights (trained without labels) and the self-supervised training logs are provided below.

arch          ROC AUC (Cam16)   download
ViT-ti        87.60             checkpoints | pretraining log
ViT-s         96.76             checkpoints | pretraining log
Vim-ti        95.81             checkpoints | pretraining log
Vim-ti-plus   97.39             checkpoints | pretraining log
Vim-s         98.85             checkpoints | pretraining log
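To reuse a downloaded checkpoint as a feature extractor, the DINO wrapper prefixes typically need to be stripped before loading the weights into a bare backbone. This is a minimal sketch assuming the standard DINO checkpoint layout; the local file name is hypothetical, and the backbone constructor (from the Vision Mamba repo) is left as a comment rather than an exact API.

import torch

# A downloaded checkpoint, e.g. Vim-s pretrained at 10x (hypothetical local path)
ckpt = torch.load("vim-s_checkpoint.pth", map_location="cpu")

# Standard DINO checkpoints wrap the networks as student/teacher; the teacher
# is commonly used for downstream feature extraction (assumed layout).
state_dict = ckpt["teacher"]

# Strip the DataParallel/DINO wrapper prefixes so keys match a bare backbone
state_dict = {k.replace("module.", "").replace("backbone.", ""): v
              for k, v in state_dict.items()}

# `backbone` would be a Vim-s model built via the Vision Mamba repo; strict=False
# skips the DINO projection-head weights that the bare backbone does not have.
# missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
print("tensors after stripping prefixes:", len(state_dict))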

Citation

If you find this repository useful, please consider giving it a star and citing the arXiv preprint:

@article{nasiri2024vim4path,
  title={Vim4Path: Self-Supervised Vision Mamba for Histopathology Images},
  author={Nasiri-Sarvi, Ali and Trinh, Vincent Quoc-Huy and Rivaz, Hassan and Hosseini, Mahdi S},
  journal={arXiv preprint arXiv:2404.13222},
  year={2024}
}


License: MIT License

