End-to-End Object Detection with Transformers

Question

subinium opened this issue 3 years ago · comments

Subin An · Answer 1 · Sun Jan 31 2021 18:36:19 GMT+0800 (China Standard Time)

Concept

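Applies end-to-end (E2E) training, previously used only in other tasks, to object detection
- Prior methods rely on heuristic steps such as anchor boxes and NMS, which made E2E training difficult (non-differentiable)
- The paper resolves this by using a Transformer (Encoder-Decoder)
  - Part of the trend of Transformers steadily making their way into CV!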
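To make the heuristic concrete, here is a minimal sketch of greedy NMS, the kind of post-processing step the notes above call non-differentiable. The function names, the corner box format `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions chosen for illustration, not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold. The hard keep/drop decision is the part that gradients
    cannot flow through.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

The threshold is a hand-tuned value, which is part of why the notes describe this step as a heuristic.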
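The hard part of object detection is matching predicted objects to the ground truth
- Predictions form a fixed-size set of N (100 in the paper; the COCO maximum is 63), which is matched against the ground truth
- When the counts do not match, pad with (no object),
- which turns this into a bipartite matching problem
- The Hungarian Algorithm finds the matching that minimizes the loss
- Because the matching is one-to-one, the duplicate-prediction problem of anchor-based methods does not arise
- First time I have seen the Hungarian algorithm in two years... (I used it in Problem Solving.)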
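The matching step above can be sketched as follows. This is a minimal pure-Python version of the Hungarian algorithm under stated assumptions: rows are ground-truth objects, columns are the N prediction slots, and `cost[i][j]` is a made-up matching cost between ground truth `i` and prediction `j`. The function name `hungarian_match` is hypothetical, and this is not the paper's code.

```python
def hungarian_match(cost):
    """Minimum-cost one-to-one assignment of rows to columns.

    cost is an n x m list of lists with n <= m (ground truths x predictions).
    Returns a list where entry i is the prediction index matched to ground
    truth i. Predictions left unmatched are the ones treated as (no object).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augmenting path found
                break
        while j0 != 0:       # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match
```

With 2 ground-truth objects and 4 prediction slots, two predictions get a match and the other two are supervised as (no object), which is the padding described above.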
In the Hungarian loss, the no-object term of the cross entropy is down-weighted to 1/10
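A minimal sketch of that down-weighting, assuming each prediction slot already has a target class after matching and that the last class index stands for (no object). The function name and the choice to return an unnormalized sum are assumptions for illustration.

```python
import math


def weighted_class_loss(probs, targets, no_object_index, no_object_weight=0.1):
    """Sum of negative log-likelihoods over prediction slots.

    probs: one list of class probabilities per slot.
    targets: the target class index per slot (after matching).
    Slots whose target is no_object_index contribute with weight 0.1,
    so the many empty slots do not dominate the loss.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        weight = no_object_weight if t == no_object_index else 1.0
        total += -weight * math.log(p[t])
    return total
```

With N = 100 slots and only a few objects per image, most targets are (no object), which is why this class imbalance needs correcting.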
Bounding box loss: L1 loss + generalized IoU loss
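The box loss can be sketched as below. Boxes are assumed to be in corner format `(x1, y1, x2, y2)`, and the weights `l1_weight` and `giou_weight` are illustrative defaults rather than values stated in these notes. The function names are hypothetical.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    GIoU = IoU - (area of smallest enclosing box not covered by the union)
                 / (area of smallest enclosing box).
    Unlike plain IoU it stays informative when the boxes do not overlap.
    """
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def box_loss(pred, target, l1_weight=5.0, giou_weight=2.0):
    """Weighted sum of L1 distance and (1 - GIoU) for one matched pair."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return l1_weight * l1 + giou_weight * (1.0 - giou(pred, target))
```

The L1 term scales with box size, while the GIoU term is scale-invariant, so the two complement each other.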
Positional encoding + CNN-based features are supplied as a sequence (Encoder)
Decoder output + FFN produce the final predictions
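As a rough illustration of the encoder input, the sketch below flattens an H x W x d feature map into a sequence of H*W vectors and adds a 2D sinusoidal positional encoding. The function names, the sinusoidal form, and adding the encoding once at the input are all assumptions made for this sketch. The actual model may inject positions differently.

```python
import math


def sine_position_2d(h, w, d_model):
    """2D sinusoidal encoding: half the channels encode y, half encode x."""
    assert d_model % 4 == 0, "d_model must be divisible by 4 in this sketch"
    half = d_model // 2

    def encode(pos):
        out = []
        for k in range(0, half, 2):
            freq = 1.0 / (10000 ** (k / half))
            out += [math.sin(pos * freq), math.cos(pos * freq)]
        return out

    return [[encode(y) + encode(x) for x in range(w)] for y in range(h)]


def to_sequence(feature_map):
    """Flatten an H x W x d feature map to (H*W) x d and add positions."""
    h, w, d = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    pos = sine_position_2d(h, w, d)
    return [
        [f + p for f, p in zip(feature_map[y][x], pos[y][x])]
        for y in range(h)
        for x in range(w)
    ]
```

Without the positional term, flattening would discard where each feature came from in the image, since attention itself is order-agnostic.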
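Experimental results
- Comparing results with and without NMS across the number of decoder layers shows that the decoder replaces NMS
- The decoder learns to put much of its attention on object edges
- Positional encoding is confirmed to help
- Performs well on OOD (out-of-distribution) inputs too
- Can also be applied to segmentation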

Subin An · Answer 2 · Sun Jan 31 2021 18:36:38 GMT+0800 (China Standard Time)

Subin An · Answer 3 · Sun Jan 31 2021 18:37:32 GMT+0800 (China Standard Time)