[FEATURE] Add ViT weights: RADIO

Question

seefun opened this issue 22 days ago

The code and model weights for the paper [CVPR 2024] AM-RADIO: Agglomerative Vision Foundation Model - Reduce All Domains Into One have been released by Nvidia.

RADIO, a new vision foundation model (in practice, a new set of pretrained ViT weights), excels across visual domains and serves as a superior replacement for vision backbones. It integrates CLIP variants, DINOv2, and SAM through distillation while preserving their unique features, such as text grounding and segmentation correspondence.
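
For readers unfamiliar with the idea, here is a rough sketch of what distilling several teachers into one student can look like. It is a generic illustration only, not the loss or architecture from the RADIO paper: the function names, the linear per-teacher adapter heads, the cosine-distance objective, and the toy numbers are all assumptions made for this example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def project(head, features):
    """Apply a linear adapter head (a list of weight rows) to the student features."""
    return [sum(w * f for w, f in zip(row, features)) for row in head]

def multi_teacher_loss(student_features, heads, teacher_targets):
    """Sum, over teachers, the distance between the projected student
    features and that teacher's output (hypothetical objective)."""
    total = 0.0
    for name, target in teacher_targets.items():
        total += cosine_distance(project(heads[name], student_features), target)
    return total

# Toy values, not real model outputs.
student = [1.0, 0.0]
heads = {
    "clip": [[1.0, 0.0], [0.0, 1.0]],    # identity adapter
    "dinov2": [[0.0, 1.0], [1.0, 0.0]],  # adapter that swaps the two dims
}
targets = {"clip": [2.0, 0.0], "dinov2": [0.0, 3.0]}
loss = multi_teacher_loss(student, heads, targets)
```

In this toy setup each adapter maps the shared student features onto the direction of its teacher's target, so the summed loss is zero. The point is only that one backbone can serve several teachers through separate lightweight heads.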

Feraidoon Mehri · Answer 1 · Wed May 22 2024 10:09:41 GMT+0800 (China Standard Time)

Does RADIO have ImageNet-1k heads?