Comparison with Grounding DINO
Yebulabula opened this issue
Dear author,
Thank you for your significant contribution to the community; your model's performance is truly impressive. Have you considered replacing the text-guided SAM with Grounded-SAM (i.e., Grounding DINO + SAM)? Specifically, I mean fine-tuning Grounding DINO's image-text feature fusion module instead of SAM's mask decoder. I understand that adding Grounding DINO would increase computational cost, but I am curious whether this two-stage promptable segmentation pipeline could improve segmentation results.
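For concreteness, here is a rough sketch of the two-stage data flow I have in mind: the text prompt goes to a detector, and each kept box then becomes a box prompt for a frozen segmenter. This is only an illustration under assumptions. `detect` and `segment_with_box` are hypothetical stand-ins for Grounding DINO and SAM's box-prompted decoder. The normalized (cx, cy, w, h) box format and the 0.35 box threshold are modeled on Grounding DINO's public demo code and do not come from this repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box_cxcywh: Box  # assumed normalized to [0, 1], center format
    score: float     # detector confidence for this box
    phrase: str      # text span the box was grounded to


def cxcywh_norm_to_xyxy_pixels(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    cx, w = cx * width, w * width
    cy, h = cy * height, h * height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def two_stage_segment(
    image_size: Tuple[int, int],
    detect: Callable[[str], Sequence[Detection]],  # hypothetical stage 1 (text -> boxes)
    segment_with_box: Callable[[Box], object],     # hypothetical stage 2 (box -> mask)
    prompt: str,
    box_threshold: float = 0.35,                   # assumed default, not from the paper
) -> List[Tuple[str, Box, object]]:
    """Detect boxes from a text prompt, then segment inside each confident box."""
    width, height = image_size
    results = []
    for det in detect(prompt):
        if det.score < box_threshold:
            continue  # drop low-confidence groundings before prompting the segmenter
        box = cxcywh_norm_to_xyxy_pixels(det.box_cxcywh, width, height)
        results.append((det.phrase, box, segment_with_box(box)))
    return results


# Usage with toy stand-ins (no real models involved).
def fake_detector(prompt: str) -> List[Detection]:
    return [
        Detection((0.5, 0.5, 0.5, 0.25), 0.9, prompt),  # confident box, kept
        Detection((0.1, 0.1, 0.1, 0.1), 0.2, prompt),   # below threshold, dropped
    ]


def fake_segmenter(box: Box) -> float:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)  # stand-in "mask": just the box area


out = two_stage_segment((200, 100), fake_detector, fake_segmenter, "cat")
print(out)  # [('cat', (50.0, 37.5, 150.0, 62.5), 2500.0)]
```

In this setup only the detector's fusion module would be trained, and the segmenter would stay frozen. This is the trade-off against fine-tuning SAM's mask decoder directly.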
Best wishes,
Ye