Comparison with GroundedDINO

Question

Yebulabula opened this issue 3 months ago · comments

Dear author,

Thank you for your significant contribution to the community; your model's performance is truly impressive. Have you considered replacing the text-guided SAM with GroundedSAM (i.e., GroundedDINO + SAM)? Specifically, I mean fine-tuning GroundedDINO's image-text feature fusion module instead of fine-tuning SAM's mask decoder. I understand that GroundedDINO may increase the computational cost, but I am interested in whether this two-stage promptable segmentation pipeline could improve segmentation results.
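
To make the proposal concrete, here is a minimal sketch of the two-stage control flow I have in mind. It is only a toy: `toy_detector` and `toy_segmenter` are hypothetical stand-ins that I made up for illustration, not the real GroundedDINO or SAM APIs, and the "image" is just a grid of labels. The point is only the structure: the text prompt goes to the detector alone, and the segmenter is prompted with boxes.

```python
from typing import Callable, List, Tuple

Image = List[List[str]]          # toy "image": a grid of per-pixel labels
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Mask = List[List[bool]]


def toy_detector(image: Image, prompt: str) -> List[Box]:
    """Hypothetical stage 1 (stand-in for a text-conditioned detector).

    Returns one tight box around all pixels whose label equals the prompt,
    or no boxes if the prompt does not occur in the image.
    """
    hits = [(x, y) for y, row in enumerate(image)
            for x, label in enumerate(row) if label == prompt]
    if not hits:
        return []
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_segmenter(image: Image, box: Box) -> Mask:
    """Hypothetical stage 2 (stand-in for a box-prompted segmenter).

    Marks the pixels inside the box that share the label found at the
    box centre. It never sees the text prompt, only the box.
    """
    x0, y0, x1, y1 = box
    seed = image[(y0 + y1) // 2][(x0 + x1) // 2]
    return [[(x0 <= x < x1) and (y0 <= y < y1) and label == seed
             for x, label in enumerate(row)]
            for y, row in enumerate(image)]


def grounded_segment(
    image: Image,
    prompt: str,
    detector: Callable[[Image, str], List[Box]],
    segmenter: Callable[[Image, Box], Mask],
) -> List[Mask]:
    """Two-stage pipeline: text -> boxes -> masks.

    In the setup I am asking about, only the detector's image-text fusion
    module would be fine-tuned; the segmenter would be left unchanged.
    """
    boxes = detector(image, prompt)
    return [segmenter(image, box) for box in boxes]


image = [
    ["bg", "bg", "bg", "bg"],
    ["bg", "cat", "cat", "bg"],
    ["bg", "cat", "cat", "bg"],
    ["bg", "bg", "bg", "bg"],
]
masks = grounded_segment(image, "cat", toy_detector, toy_segmenter)
```

With real models, the two callables would wrap the actual detector and segmenter inference code; I have left that out because the exact interfaces depend on the implementation used.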

Best wishes,
Ye