lartpang / papers-for-reference

Some papers for reference.

Papers for Reference

VLM

Transferring

  • CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
    • Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
    • Links: arXiv:2310.01403 | GitHub
    • Keypoints: Enhance the local region representation of CLIP for downstream open-vocabulary dense prediction tasks.

Unified Architecture

Multi-Modal

  • TCSVT 2021 | SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection
    • Authors: Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao
    • Links: arXiv:2204.05585 | GitHub
    • Keypoints: Unified architecture and separate parameters for RGB-Depth/Thermal SOD.
  • TIP 2023 | CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
    • Authors: Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
    • Links: arXiv:2112.02363 | GitHub
    • Keypoints: Unified architecture and separate parameters for RGB-Depth/Thermal SOD.
  • ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer
    • Authors: Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
    • Links: arXiv:2307.12349 | GitHub
    • Keypoints: Unified architecture and separate parameters for RGB-RGB Remote Sensing Change Detection, RGB-Thermal Crowd Counting, RGB-Depth/Thermal SOD, and RGB-Depth Semantic Segmentation.
  • All in One: RGB, RGB-D, and RGB-T Salient Object Detection
    • Authors: Xingzhao Jia, Zhongqiu Zhao, Changlei Dongye, Zhao Zhang
    • Links: arXiv:2311.14746
    • Keypoints: Unified architecture and separate parameters for RGB-RGB/Depth/Thermal SOD.
  • Unified-modal Salient Object Detection via Adaptive Prompt Learning
    • Authors: Kunpeng Wang, Chenglong Li, Zhengzheng Tu, Bin Luo
    • Links: arXiv:2311.16835
    • Keypoints: ???
  • VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
    • Authors: Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
    • Links: arXiv:2311.15011
    • Keypoints: Unified architecture and separate prompts for joint learning from RGB-RGB/Depth/Thermal/Flow SOD and RGB-RGB/Depth/Flow COD based on domain-specific and task-specific parameters (prompts).

Multi-Pipeline

  • ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
    • Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
    • Links: arXiv:2310.20208 | GitHub
    • Keypoints: Unified architecture and separate parameters for RGB image/sequence COD, based on difference-based conditional computation.

About

License: GNU General Public License v3.0