Why can't the SAM encoder be used to extract features?

Question

ruizhaoz opened this issue a year ago

Have you tried using the SAM encoder directly to extract features instead of using another pretrained model?

Yang Liu · Answer 1 · Fri Aug 11 2023 15:30:54 GMT+0800 (China Standard Time)

Features extracted with SAM achieve only around 20 mIoU on fold 0 of COCO-20i. Because the SAM encoder's features carry weak semantics, it performs poorly in complex scenes, for two reasons:

Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.
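A toy sketch of how weak features can hurt both steps. It assumes patch features are plain lists of floats, uses cosine similarity with nearest-neighbor matching, and scores a mask proposal with a hypothetical rule: the fraction of matched target patches that the proposal covers. The names `match_patches` and `score_proposal` and all feature values are made up for illustration; this is not the actual ILM implementation.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_patches(ref_fg: List[Vector], target: List[Vector]) -> List[int]:
    """For each foreground reference patch, return the index of the most similar target patch."""
    return [max(range(len(target)), key=lambda j: cosine(r, target[j])) for r in ref_fg]


def score_proposal(proposal: Set[int], matched: List[int]) -> float:
    """Hypothetical score: fraction of matched target patches that fall inside the proposal."""
    if not matched:
        return 0.0
    return sum(1 for j in matched if j in proposal) / len(matched)


# Toy setup: target patches 0-1 belong to the object, patches 2-3 to the background.
ref_fg = [[1.0, 0.1], [0.9, 0.2]]
discriminative = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.1, 0.9]]  # object vs background clearly separated
weak = [[0.5, 0.5], [0.5, 0.5], [0.6, 0.5], [0.6, 0.5]]  # object and background look alike

object_proposal = {0, 1}
print(score_proposal(object_proposal, match_patches(ref_fg, discriminative)))  # 1.0
print(score_proposal(object_proposal, match_patches(ref_fg, weak)))  # 0.0
```

With features that separate object from background, both matches land on the object and the correct proposal scores 1.0. With features that barely separate them, the matches drift to background patches, and the same proposal scores 0.0, so it would not be selected.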

Jia-Chang Feng · Answer 2 · Tue Feb 20 2024 18:12:22 GMT+0800 (China Standard Time)

DINOv2 is very strong at instance retrieval and dense matching. SAM's backbone, by contrast, is pretrained with MAE, and its features are not as discriminative.
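One simple way to make "discriminative" concrete for dense matching is the margin between a patch's best and second-best match similarity: a large margin means the nearest neighbor is unambiguous. The sketch below assumes cosine similarity on plain float lists; `best_margin` and the toy vectors are hypothetical and are not outputs of DINOv2 or SAM.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_margin(query: Sequence[float], candidates: List[Sequence[float]]) -> float:
    """Best minus second-best cosine similarity; larger means a less ambiguous match."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    sims = sorted((cosine(query, c) for c in candidates), reverse=True)
    return sims[0] - sims[1]


query = [1.0, 0.0, 0.0]
sharp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # distinct feature per patch
blurry = [[0.6, 0.5, 0.5], [0.5, 0.6, 0.5], [0.5, 0.5, 0.6]]  # similar features everywhere

print(round(best_margin(query, sharp), 3))  # 1.0
print(round(best_margin(query, blurry), 3))  # 0.108
```

In both cases the correct patch is still ranked first, but with the blurry features it wins by a small margin, so noise or a second similar instance can easily flip the match.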