lartpang / papers-for-reference

Some papers for reference.

Papers for Reference

VLM

Transferring

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
- Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy
- Links: arXiv:2310.01403 | GitHub
- Keypoints: Enhance the local region representation of CLIP for downstream open-vocabulary dense prediction tasks.

Unified Architecture

Multi-Modal

TCSVT 2021 | SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection
- Authors: Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao
- Links: arXiv:2204.05585 | GitHub
- Keypoints: Unified architecture and separate parameters for RGB-Depth/Thermal SOD.
TIP 2023 | CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection
- Authors: Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
- Links: arXiv:2112.02363 | GitHub
- Keypoints: Unified architecture and separate parameters for RGB-Depth/Thermal SOD.
ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer
- Authors: Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
- Links: arXiv:2307.12349 | GitHub
- Keypoints: Unified architecture and separate parameters for RGB-RGB Remote Sensing Change Detection, RGB-Thermal Crowd Counting, RGB-Depth/Thermal SOD, and RGB-Depth Semantic Segmentation.
All in One: RGB, RGB-D, and RGB-T Salient Object Detection
- Authors: Xingzhao Jia, Zhongqiu Zhao, Changlei Dongye, Zhao Zhang
- Links: arXiv:2311.14746
- Keypoints: Unified architecture and separate parameters for RGB-RGB/Depth/Thermal SOD.
Unified-modal Salient Object Detection via Adaptive Prompt Learning
- Authors: Kunpeng Wang, Chenglong Li, Zhengzheng Tu, Bin Luo
- Links: arXiv:2311.16835
- Keypoints: ???
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
- Authors: Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
- Links: arXiv:2311.15011
- Keypoints: Unified architecture with separate domain-specific and task-specific prompts for joint learning across RGB-RGB/Depth/Thermal/Flow SOD and RGB-RGB/Depth/Flow COD.

Multi-Pipeline

ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection
- Authors: Youwei Pang, Xiaoqi Zhao, Tian-Zhu Xiang, Lihe Zhang, Huchuan Lu
- Links: arXiv:2310.20208 | GitHub
- Keypoints: Unified architecture and separate parameters for RGB image/sequence COD, based on difference-based conditional computation.

GNU General Public License v3.0