- Authors: Chengyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang
SimMAT aims to transfer the ability of large RGB-based models to other modalities (e.g., Depth, Thermal, Polarization) that suffer from limited training data. For example, SimMAT enables the Segment Anything Model (SAM) to handle modalities beyond RGB images.
First, clone the project and create the environment.
git clone https://github.com/mt-cly/SimMAT
cd SimMAT
conda create -n simmat python=3.10
conda activate simmat
pip install -r requirements.txt
# download the pretrained SAM ViT-B checkpoint and place it in checkpoint/sam
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
mkdir -p checkpoint/sam
mv sam_vit_b_01ec64.pth checkpoint/sam
We provide a segmentation benchmark for studying segmentation performance across modalities.
| Dataset | Supported Modalities | Link |
|---|---|---|
| IVRG_RGBNIR | NIR, NIR+RGB | download (1.0G) |
| RGB-Thermal-Glass | Thermal, Thermal+RGB | download (3.0G) |
| NYUDepthv2 | Depth, HHA, Depth+RGB, HHA+RGB | download (1.6G) |
| pgsnet | AOLP+DOLP, AOLP+DOLP+RGB | download (15.5G) |
| zju-rgbp | AOLP+DOLP, AOLP+DOLP+RGB | download (0.3G) |
Download one or all of the benchmarks from the links above, unzip them, and move them into the data folder. The file structure should be as follows.
--SimMAT
  |--data
    |--IVRG_RGBNIR
    |--NYUDepthv2
    |--pgsnet
    |--RGB-Thermal-Glass
    |--zju-rgbp
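Before training, it can help to confirm which benchmarks actually ended up under data. The following is a minimal sketch, not part of the repository: the function name `missing_datasets` is hypothetical, and it only assumes the folder names shown in the tree above.

```python
from pathlib import Path

# Dataset folder names as listed in the file structure above.
EXPECTED_DATASETS = [
    "IVRG_RGBNIR",
    "NYUDepthv2",
    "pgsnet",
    "RGB-Thermal-Glass",
    "zju-rgbp",
]


def missing_datasets(repo_root="."):
    """Return the expected dataset folders that are absent under <repo_root>/data.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    data_dir = Path(repo_root) / "data"
    return [name for name in EXPECTED_DATASETS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Run from the SimMAT project root.
    missing = missing_datasets(".")
    if missing:
        print("Not found under data/:", ", ".join(missing))
    else:
        print("All benchmark folders are in place.")
```

Since downloading a single benchmark is enough to train on it, a non-empty result is not an error in itself; it only tells you which modalities are unavailable locally.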
To train, run python train.py followed by optional arguments:
-net # tuning method. Options: {sam_full_finetune, sam_linear_probing, sam_mlp_adapter, sam_lora, sam_prompt}
-modality # modality name. Options: {pgsnet_rgbp, pgsnet_p, rgbd, d, rgbhha, hha, nir, rgbnir, rgbt, t, zju-rgbp}
-proj_type # pre-projection applied before the foundation model. Options: {simmat, baseline_a, baseline_b}
-exp_name # experiment name
-val_freq # number of epochs between validations. Default: 5
-b # batch size. Default: 4
-lr # learning rate. Suggested: 3e-4 for PEFT, 3e-5 for full fine-tuning
-weights # path to the trained weights to resume from
To use DDP, add -ddp to the command.
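To illustrate how the arguments combine, here is a small sketch that assembles a train.py command line and rejects values outside the documented option sets. It is not code from the repository: `build_train_command` and the experiment name `demo` are hypothetical, and treating every method other than sam_full_finetune as PEFT when picking the suggested learning rate is an assumption.

```python
import shlex

# Option values copied from the argument list above.
NETS = {"sam_full_finetune", "sam_linear_probing", "sam_mlp_adapter",
        "sam_lora", "sam_prompt"}
MODALITIES = {"pgsnet_rgbp", "pgsnet_p", "rgbd", "d", "rgbhha", "hha",
              "nir", "rgbnir", "rgbt", "t", "zju-rgbp"}
PROJ_TYPES = {"simmat", "baseline_a", "baseline_b"}


def build_train_command(net, modality, proj_type, exp_name,
                        lr=None, batch_size=4, val_freq=5, ddp=False):
    """Return a shell-ready train.py command string (hypothetical helper)."""
    for flag, value, allowed in (("-net", net, NETS),
                                 ("-modality", modality, MODALITIES),
                                 ("-proj_type", proj_type, PROJ_TYPES)):
        if value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")

    if lr is None:
        # Assumption: only sam_full_finetune counts as full fine-tuning;
        # all other methods get the PEFT suggestion.
        lr = "3e-5" if net == "sam_full_finetune" else "3e-4"

    args = ["python", "train.py",
            "-net", net,
            "-modality", modality,
            "-proj_type", proj_type,
            "-exp_name", exp_name,
            "-val_freq", str(val_freq),
            "-b", str(batch_size),
            "-lr", str(lr)]
    if ddp:
        args.append("-ddp")
    return shlex.join(args)


if __name__ == "__main__":
    # Illustrative combination only; not necessarily what train.sh runs.
    print(build_train_command("sam_lora", "nir", "simmat", "demo"))
```

The learning rate is passed through as a string so the command shows the value exactly as written in the suggestion; whether a given combination of -net, -modality, and -proj_type trains well is outside what this sketch checks.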
We provide an example command for adapting SAM to the NIR modality in train.sh.
sh train.sh
@article{lei2024simmat,
title={SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality},
author={Lei, Chengyang and Chen, Liyi and Cen, Jun and Chen, Xiao and Lei, Zhen and Heide, Felix and Liu, Ziwei and Chen, Qifeng and Zhang, Zhaoxiang},
journal={arXiv preprint arXiv:2409.08083},
year={2024}
}
The code is based on Medical-SAM-Adapter.