aim-uofa / Matcher

[ICLR'24] Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

Home Page: https://arxiv.org/abs/2305.13310

Why can't the SAM encoder be used to extract features?

ruizhaoz opened this issue

Have you tried using the SAM encoder directly to extract features, instead of using another pretrained model?

Features extracted with the SAM encoder reach only around 20 mIoU on fold 0 of COCO-20i. SAM's encoder features carry weak semantics, so it performs poorly in complex scenes. There are two reasons for this:

  1. Poor feature matching: SAM's features fail to match multiple instances with similar semantics in complex scenes.
  2. Poor semantic guidance: SAM cannot provide effective semantic guidance for ILM (Instance-Level Matching) to select high-quality mask proposals.

DINOv2 is very strong at instance retrieval and dense matching. SAM's backbone is pretrained with MAE, and its features are not as discriminative.
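
For anyone who wants to try this comparison themselves, here is a minimal sketch of what "dense matching" means at the patch level: each reference patch inside the mask is matched to its nearest target patch by cosine similarity. This is not Matcher's actual code. The function name `match_patches` and the array shapes are assumptions for illustration, and the feature arrays stand in for whatever encoder you plug in (DINOv2, SAM, or anything else).

```python
import numpy as np

def match_patches(ref_feats, tgt_feats, ref_mask):
    """Match masked reference patches to target patches (hypothetical helper).

    ref_feats: (N_ref, C) patch features of the reference image
    tgt_feats: (N_tgt, C) patch features of the target image
    ref_mask:  (N_ref,) boolean, True for patches inside the reference mask
    """
    # L2-normalise so that the dot product equals cosine similarity.
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)

    # Full similarity matrix between every reference and target patch.
    sim = ref @ tgt.T  # (N_ref, N_tgt)

    # Keep only reference patches inside the mask and take their best match.
    ref_idx = np.flatnonzero(ref_mask)
    tgt_idx = sim[ref_idx].argmax(axis=1)
    scores = sim[ref_idx, tgt_idx]
    return ref_idx, tgt_idx, scores
```

The matched target patch locations could then be used as point prompts. If the encoder's features are not discriminative, patches from different objects get similar scores and the matches scatter across the image, which is the failure described above.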