A Comprehensive Survey on Segment Anything Model for Vision and Beyond

The First Comprehensive SAM Survey: A Comprehensive Survey on Segment Anything Model for Vision and Beyond. Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu. [paper] [homepage] [Chinese interpretation]

Abstract: Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence similar to that of a human being. This is in contrast to narrow or specialized AI, which is designed to perform specific tasks with a high degree of efficiency. Therefore, it is urgent to design a general class of models, which we term foundation models, trained on broad data that can be adapted to various downstream tasks. The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation, greatly promoting the development of foundation models for computer vision. To fully comprehend SAM, we conduct a survey study. As the first to comprehensively review the progress of the segment anything task for vision and beyond based on the foundation model of SAM, this work focuses on its applications to various tasks and data types by discussing its historical development, recent progress, and profound impact on broad applications. We first introduce the background and terminology for foundation models including SAM, as well as state-of-the-art methods contemporaneous with SAM that are significant for the segment anything task. Then, we analyze and summarize the advantages and limitations of SAM across various image processing applications, including software scenes, real-world scenes, and complex scenes. Importantly, many insights are drawn to guide future research to develop more versatile foundation models and improve the architecture of SAM. We also summarize many other applications of SAM in vision and beyond. Finally, we maintain a continuously updated paper list and an open-source project summary for the foundation model SAM here.

Awesome Segment Anything Models: A curated list of awesome segment anything models in computer vision and beyond. This repository supplements our survey paper. We intend to continuously update it.

We strongly encourage authors of relevant works to make a pull request and add their paper's information [here].

News

- 2024.01.31: Latest update of this paper list.
- 2023.07.14: "Segment Anything" was accepted by ICCV 2023.
- 2023.05.16: An initial version of recent papers and projects.
- 2023.04.05: The "Segment Anything" paper was released online.

Paper List
- Seminal Papers
- Follow-up Papers
Open Source Projects
Awesome Repositories for SAM

Citation

If you find our work useful in your research, please consider citing:

@article{chunhui2023samsurvey,
  title={A Comprehensive Survey on Segment Anything Model for Vision and Beyond},
  author={Zhang, Chunhui and Liu, Li and Cui, Yawen and Huang, Guanjie and Lin, Weilin and Yang, Yiqian and Hu, Yuehong},
  journal={arXiv:2305.08196},
  year={2023}
}

Paper List

Seminal Papers

SAM: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick.
"Segment Anything." ICCV (2023). [paper] [homepage] [code] [Zhihu] [Reddit] [2023.04]
GPT-4V: OpenAI.
"GPT-4V(ision) System Card." ArXiv (2023). [paper] [homepage] [2023.09]
Gemini: Gemini Team, Google.
"Gemini: A Family of Highly Capable Multimodal Models." ArXiv (2023). [paper] [homepage] [blog] [2023.12]
SEEM: Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Gao, Yong Jae Lee.
"Segment Everything Everywhere All at Once." NeurIPS (2023). [paper] [code] [2023.04]
SegGPT: Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang.
"SegGPT: Segmenting Everything In Context." ICCV (2023). [paper] [code] [2023.04]
Grounding DINO: Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
"Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection." ArXiv (2023). [paper] [code] [2023.04]
ImageBind: Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra.
"ImageBind: One Embedding Space To Bind Them All." CVPR (2023). [paper] [homepage] [code] [2023.05]
LanguageBind: Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, HongFa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan.
"LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment." ArXiv (2023). [paper] [code]
Meta-Transformer: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue.
"Meta-Transformer: A Unified Framework for Multimodal Learning." ArXiv (2023). [paper] [homepage] [code] [Chinese interpretation] [2023.07]
OpenSeeD: Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang.
"A Simple Framework for Open-Vocabulary Segmentation and Detection." ICCV (2023). [paper] [code] [2023.03]
RAM: Youcai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang.
"Recognize Anything: A Strong Image Tagging Model." ArXiv (2023). [paper] [homepage] [code] [2023.06]
PACGen: Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee.
"Generate Anything Anywhere in Any Scene." ArXiv (2023). [paper] [homepage] [code] [2023.06]
ASM: Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao.
"The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World." ArXiv (2023). [paper] [homepage] [demo] [2023.08]
OneFormer: Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, Humphrey Shi.
"OneFormer: One Transformer to Rule Universal Image Segmentation." CVPR (2023). [paper] [homepage] [code] [2022.11]
OVSeg: Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu.
"Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP." CVPR (2023). [paper] [homepage] [code] [2022.10]

Follow-up Papers

💥MESA: Yesheng Zhang, Xu Zhao.
"MESA: Matching Everything by Segmenting Anything." ArXiv (2024). [paper] [2024.01]

MixSup: Yuxue Yang, Lue Fan, Zhaoxiang Zhang.
"MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection." ICLR (2024). [paper] [code] [2024.01]
GEM: Jing Hao, Moyun Liu, Kuo Feng Hung.
"GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis." ArXiv (2024). [paper] [code] [2024.01]
LoRA-SAM: Zehao Ye, Lucy Lovell, Asaad Faramarzi, Jelena Ninic.
"SAM-based instance segmentation models for the automation of masonry crack detection." ArXiv (2024). [paper] [2024.01]
SSR: Yanqi Ge, Ye Huang, Wen Li, Lixin Duan.
"SSR: SAM is a Strong Regularizer for domain adaptive semantic segmentation." ArXiv (2024). [paper] [2024.01]
ScaleFlow: Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao.
"General Flow as Foundation Affordance for Scalable Robot Learning." ArXiv (2024). [paper] [code] [2024.01]
HAZARD: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan.
"HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments." ICLR (2024). [paper] [code] [2024.01]
Laura J. Brooks, Daniel Pearce, Kenton Kwok, Nikhil Jawade, Man Qi, Erola Fenollosa, Deniz Beker, James Whicker, Katrina Davis, Roberto Salguero-Gómez, Robin Wang, Steve Chappell.
"A video-rate hyperspectral camera for monitoring plant health and biodiversity." ArXiv (2024). [paper] [2024.01]
SAM-OBC: Yixin Hu, Zhixin Qi, Zhexun Zhou, Yan Qin.
"Detection of Benggang in Remote Sensing Imagery through Integration of Segmentation Anything Model with Object-Based Classification." ArXiv (2024). [paper] [2024.01]
OK-Robot: Peiqi Liu, Yaswanth Orru, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto.
"OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics." ArXiv (2024). [paper] [code] [2024.01]
Bowei Xue, Han Cheng, Qingqing Yang, Yi Wang, Xiaoning He.
"Adapting Segment Anything Model to Aerial Land Cover Classification with Low Rank Adaptation." IEEE LGRS (2024). [paper] [2024.01]
Peng Qian, Tomer Ullman.
"Shape Guides Visual Pretense." ArXiv (2024). [paper] [2024.01]
MultiDance-Zero: Zhe Xu, Kun Wei, Xu Yang, Cheng Deng.
"Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons." ArXiv (2024). [paper] [2024.01]
Vary-toy: Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang.
"Small Language Model Meets with Reinforced Vision Vocabulary." ArXiv (2024). [paper] [code] [2024.01]
WildRGB-D: Hongchi Xia, Yang Fu, Sifei Liu, Xiaolong Wang.
"RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos." ArXiv (2024). [paper] [code] [2024.01]
Tyche: Marianne Rakic, Hallee E. Wong, Jose Javier Gonzalez Ortiz, Beth Cimini, John Guttag, Adrian V. Dalca.
"Tyche: Stochastic In-Context Learning for Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.01]
Grounded SAM: Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang.
"Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks." ArXiv (2024). [paper] [code] [2024.01]
TriSAM: Jia Wan, Wanhua Li, Atmadeep Banerjee, Jason Ken Adhinarta, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei.
"TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images." ArXiv (2024). [paper] [2024.01]
Kesi Xu, Lea Goetz, Nasir Rajpoot.
"On generalisability of segment anything model for nuclear instance segmentation in histology images." MIUA (2023). [paper] [2024.01]
PA-SAM: Zhaozhi Xie, Bochen Guan, Weihao Jiang, Muyang Yi, Yue Ding, Hongtao Lu, Lei Zhang.
"PA-SAM: Prompt Adapter SAM for High-Quality Image Segmentation." ArXiv (2024). [paper] [code] [2024.01]
SAC: Saiyang Na, Yuzhi Guo, Feng Jiang, Hehuan Ma, Junzhou Huang.
"Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation." ArXiv (2024). [paper] [2024.01]
ClipSAM: Shengze Li, Jianjian Cao, Peng Ye, Yuhan Ding, Chongjun Tu, Tao Chen.
"ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation." ArXiv (2024). [paper] [code] [2024.01]
SegmentAnyBone: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski.
"SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI." ArXiv (2024). [paper] [code] [2024.01]
Reda Bensaid, Vincent Gripon, François Leduc-Primeau, Lukas Mauch, Ghouthi Boukli Hacene, Fabien Cardinaux.
"A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models." ArXiv (2024). [paper] [2024.01]
Tunnel SAM Adapter: Junxin Chen, Xiaojie Yu, Shichang Liu, Tao Chen, Wei Wang, Gwanggil Jeon, Ben-Guo He.
"Tunnel SAM Adapter: Adapting Segment Anything Model for Tunnel Water Leakage Inspection." Geohazard Mechanics (2024). [paper] [2024.01]
GEMO: Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping Che, Qinru Qiu, Chi Harold Liu, Jian Tang.
"An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation." ArXiv (2024). [paper] [2024.01]
Youyi Zhan, Tuanfeng Y. Wang, Tianjia Shao, Kun Zhou.
"Pattern Guided UV Recovery for Realistic Video Garment Texturing." ArXiv (2024). [paper] [2024.01]
Efficient4D: Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang.
"Fast Dynamic 3D Object Generation from a Single-view Video." ArXiv (2024). [paper] [code] [2024.01]
Hangbin Zheng, Shimin Liu, Hengjun Zhang, Jiayi Yu, Jinsong Bao.
"Visual-triggered contextual guidance for lithium battery disassembly: a multi-modal event knowledge graph approach." ArXiv (2024). [paper] [2024.01]
Chenghao Lu, Emmanuel Nnadozie, Moritz Paul Camenzind, Yuncai Hu, Kang Yu.
"Maize plant detection using UAV-based RGB imaging and YOLOv5." Frontiers in Plant Science (2024). [paper] [2024.01]
OMG-Seg: Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy.
"OMG-Seg: Is One Model Good Enough For All Segmentation?" ArXiv (2024). [paper] [code] [2024.01]
RAP-SAM: Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, Yibo Yang, Yining Li, Kai Chen, Yunhai Tong, Bernard Ghanem, Xiangtai Li, Ming-Hsuan Yang.
"RAP-SAM: Towards Real-Time All-Purpose Segment Anything." ArXiv (2024). [paper] [code] [2024.01]
PRS: Chen-Bin Feng, Qi Lai, Kangdao Liu, Houcheng Su, Chi-Man Vong.
"Boosting Few-Shot Semantic Segmentation Via Segment Anything Model." ArXiv (2024). [paper] [2024.01]
Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna Liljedahl, Chandi Witharana, Yili Yang, Brendan M. Rogers, Samantha T. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis.
"Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping." ArXiv (2024). [paper] [2024.01]
SAM-MCD: Hongruixuan Chen, Jian Song, Naoto Yokoya.
"Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)." ArXiv (2024). [paper] [2024.01]
GARField: Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, Angjoo Kanazawa.
"GARField: Group Anything with Radiance Fields." ArXiv (2024). [paper] [code] [2024.01]
CPAB: Hexiang Wang, Fengqi Liu, Qianyu Zhou, Ran Yi, Xin Tan, Lizhuang Ma.
"Continuous Piecewise-Affine Based Motion Model for Image Animation." ArXiv (2024). [paper] [code] [2024.01]
SAM4UDASS: Weihao Yan, Yeqiang Qian, Xingyuan Chen, Hanyang Zhuang, Chunxiang Wang, Ming Yang.
"SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent Vehicles." ArXiv (2024). [paper] [code] [2024.01]
Forge_VFM4AD: Xu Yan, Haiming Zhang, Yingjie Cai, Jingming Guo, Weichao Qiu, Bin Gao, Kaiqiang Zhou, Yue Zhao, Huan Jin, Jiantao Gao, Zhen Li, Lihui Jiang, Wei Zhang, Hongbo Zhang, Dengxin Dai, Bingbing Liu.
"Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities." ArXiv (2024). [paper] [code] [2024.01]
Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon.
"Foundation Models for Biomedical Image Segmentation: A Survey." ArXiv (2024). [paper] [2024.01]
SAM-OIL: Wenhui Wu, Man Sing Wong, Xinyu Yu, Guoqiang Shi, Coco Yin Tung Kwok, Kang Zou.
"Compositional Oil Spill Detection Based on Object Detector and Adapted Segment Anything Model from SAR Images." ArXiv (2024). [paper] [2024.01]
UV-SAM: Xin Zhang, Yu Liu, Yuming Lin, Qingming Liao, Yong Li.
"UV-SAM: Adapting Segment Anything Model for Urban Village Identification." AAAI (2024). [paper] [code] [2024.01]
AttEN: Ching-Hao Chiu, Yu-Jen Chen, Yawen Wu, Yiyu Shi, Tsung-Yi Ho.
"Achieve Fairness without Demographics for Dermatological Disease Diagnosis." ArXiv (2024). [paper] [2024.01]
LandmarkBreaker: Yuezun Li, Pu Sun, Honggang Qi, Siwei Lyu.
"LandmarkBreaker: A proactive method to obstruct DeepFakes via disrupting facial landmark extraction." CVIU (2024). [paper] [2024.01]
GSC: Luis Bolanos, Shih-Yang Su, Helge Rhodin.
"Gaussian Shadow Casting for Neural Characters." ArXiv (2024). [paper] [2024.01]
Yue Liu, Tao Sun, Kaixing Wu, Hongwei Zhang, Jingwei Zhang, Xinwen Jiang, Quanwei Lin, Mei Feng.
"Fractal-Based Pattern Quantification of Mineral Grains: A Case Study of Yichun Rare-Metal Granite." Fractal and Fractional (2024). [paper] [2024.01]
SD-MVS: Zhenlong Yuan, Jiakai Cao, Zhaoxin Li, Hao Jiang, Zhaoqi Wang.
"SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization." AAAI (2024). [paper] [2024.01]
SamLP: Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang.
"SamLP: A Customized Segment Anything Model for License Plate Detection." ArXiv (2024). [paper] [code] [2024.01]
RePLan: Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg.
"RePLan: Robotic Replanning with Perception and Language Models." ArXiv (2024). [paper] [code] [2024.01]
SOS-SLAM: Jouko Kinnari, Annika Thomas, Parker Lusk, Kota Kondo, Jonathan P. How.
"SOS-SLAM: Segmentation for Open-Set SLAM in Unstructured Environments." ArXiv (2024). [paper] [code] [2024.01]
LRV: Yunhua Zhang, Hazel Doughty, Cees G.M. Snoek.
"Low-Resource Vision Challenges for Foundation Models." ArXiv (2024). [paper] [code] [2024.01]
PartSTAD: Hyunjin Kim, Minhyuk Sung.
"PartSTAD: 2D-to-3D Part Segmentation Task Adaptation." ArXiv (2024). [paper] [2024.01]
MatSAM: Changtai Li, Xu Han, Chao Yao, Xiaojuan Ban.
"MatSAM: Efficient Materials Microstructure Extraction via Visual Large Model." ArXiv (2024). [paper] [2024.01]
Galib Muhammad Shahriar Himel, Md. Masudul Islam, Kh Abdullah Al-Aff, Shams Ibne Karim, Md. Kabir Uddin Sikder.
"Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System." IJBI (2024). [paper] [2024.01]
SSPrompt: Learning to Prompt Segment Anything Models.
"Learning to Prompt Segment Anything Models." ArXiv (2024). [paper] [2024.01]
SBSM: Zizhang Li, Dor Litvak, Ruining Li, Yunzhi Zhang, Tomas Jakab, Christian Rupprecht, Shangzhe Wu, Andrea Vedaldi, Jiajun Wu.
"Learning the 3D Fauna of the Web." ArXiv (2023). [paper] [code] [2024.01]
DeepBID: Binglin Shen, Chenggui Luo, Wen Pang, Yajing Jiang, Wenbo Wu, Rui Hu, Junle Qu, Bobo Gu, Liwei Liu.
"Surmounting photon limits and motion artifacts for biological dynamics imaging via dual-perspective self-supervised learning." PhotoniX (2024). [paper] [2024.01]
DSALVANet: Jinghui He, Bo Liu, Fan Cao, Jian Xu, Yanshan Xiao.
"Few-Shot Object Counting with Dynamic Similarity-Aware in Latent Space." TGRS (2024). [paper] [code] [2024.01]
Fengtian Lu, Yuzhi Li, Feng Tian.
"Exploring challenge and explainable shot type classification using SAM-guided approaches." SIVP (2024). [paper] [2024.01]
RoboFusion: Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang.
"RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM." ArXiv (2024). [paper] [2024.01]
SAM4MIS: Yichi Zhang, Zhenrong Shen, Rushi Jiao.
"Segment Anything Model for Medical Image Segmentation: Current Applications and Future Directions." ArXiv (2024). [paper] [code] [2024.01]
OV-SAM: Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy.
"Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively." ArXiv (2024). [paper] [project page] [code] [2024.01]
DSR: Yanni Wang, Hecheng Jia, Shilei Fu, Huiping Lin, Feng Xu.
"Reinforcement Learning for SAR View Angle Inversion with Differentiable SAR Renderer." ArXiv (2024). [paper] [2024.01]
Thomas Lips, Victor-Louis De Gusseme, Francis wyffels.
"Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data." ArXiv (2024). [paper] [code] [2024.01]
SwinSAM: Zhoushan Feng, Yuliang Zhang, Yanhong Chen, Yu Liu, Wen Sun, Lili Du, Dunjin Chen.
"SwinSAM: Fine-Grained Polyp Segmentation in Colonoscopy Images via Segment Anything Model Integrated with a Swin Transformer Decoder." ArXiv (2024). [paper] [2024.01]
BA-SAM: Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma.
"BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model." ArXiv (2024). [paper] [2024.01]
SAMMed: Hanhui Wang, Huaize Ye, Yi Xia, Xueyan Zhang.
"Leveraging SAM for Single-Source Domain Generalization in Medical Image Segmentation." ArXiv (2024). [paper] [code] [2024.01]
CWSAM: Xinyang Pu, Hecheng Jia, Linghao Zheng, Feng Wang, Feng Xu.
"ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation." ArXiv (2024). [paper] [code] [2024.01]
UCAD: Jiaqi Liu, Kai Wu, Qiang Nie, Ying Chen, Bin-Bin Gao, Yong Liu, Jinbao Wang, Chengjie Wang, Feng Zheng.
"Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt." AAAI (2024). [paper] [code] [2024.01]
TrackGPT: Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie.
"Tracking with Human-Intent Reasoning." ArXiv (2023). [paper] [code] [2023.12]
Wild2Avatar: Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli.
"Wild2Avatar: Rendering Humans Behind Occlusions." ArXiv (2023). [paper] [code] [2023.12]
IS5Net: Xianjie Liu, Keren Fu, Qijun Zhao.
"Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation." ArXiv (2023). [paper] [2023.12]
DN-SLAM: Chenyu Ruan, Qiuyu Zang, Kehua Zhang, Kai Huang.
"DN-SLAM: A Visual SLAM with ORB Features and NeRF Mapping in Dynamic Environments." IEEE Sensors Journal (2024). [paper] [2023.12]
ZONE: Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang.
"ZONE: Zero-Shot Instruction-Guided Local Editing." ArXiv (2023). [paper] [2023.12]
Segment3D: Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann.
"Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels." ArXiv (2023). [paper] [code] [2023.12]
Unified-IO 2: Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi.
"Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action." ArXiv (2023). [paper] [code] [2023.12]
EventSAM: Zhiwen Chen, Zhiyu Zhu, Yifan Zhang, Junhui Hou, Guangming Shi, Jinjian Wu.
"Segment Any Events via Weighted Adaptation of Pivotal Tokens." ArXiv (2023). [paper] [code] [2023.12]
SCM: Xiaoliang Tan, Guanzhou Chen, Tong Wang, Jiaqi Wang, Xiaodong Zhang.
"Segment Change Model (SCM) for Unsupervised Change detection in VHR Remote Sensing Images: a Case Study of Buildings." ArXiv (2023). [paper] [code] [2023.12]
SAM-G: Ziyu Wang, Yanjie Ze, Yifei Sun, Zhecheng Yuan, Huazhe Xu.
"Generalizable Visual Reinforcement Learning with Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12]
SAT-Nano: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie.
"One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts." ArXiv (2023). [paper] [code] [2023.12]
TTP: Keyan Chen, Chengyang Liu, Wenyuan Li, Zili Liu, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi.
"Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection." ArXiv (2023). [paper] [code] [2023.12]
UniRef++: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo.
"UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces." ArXiv (2023). [paper] [code] [2023.12]
LangSplat: Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister.
"LangSplat: 3D Language Gaussian Splatting." ArXiv (2023). [paper] [code] [2023.12]
HRFFM: Yan Han, Xiaogang Xu, Yingqi Lin, Jiafei Wu, Zhe Liu.
"Video Frame Interpolation with Region-Distinguishable Priors from SAM." ArXiv (2023). [paper] [2023.12]
MSCL: Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li.
"Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning." NLPCC (2023). [paper] [2023.12]
SAPNet: Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han.
"Semantic-aware SAM for Point-Prompted Instance Segmentation." ArXiv (2023). [paper] [2023.12]
Dingkun Guo.
"Learning Multi-Step Manipulation Tasks from A Single Human Demonstration." ArXiv (2023). [paper] [code] [2023.12]
ASSISTGUI: Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou.
"ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation." ArXiv (2023). [paper] [2023.12]
VStar: Penghao Wu, Saining Xie.
"V∗: Guided Visual Search as a Core Mechanism in Multimodal LLMs." ArXiv (2023). [paper] [code] [2023.12]
FM-OV3D: Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang.
"FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection." AAAI (2024). [paper] [code] [2023.12]
SP-SAM: Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang.
"Part to Whole: Collaborative Prompting for Surgical Instrument Segmentation." ArXiv (2023). [paper] [code] [2023.12]
Customize-It-3D: Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang.
"Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior." ArXiv (2023). [paper] [code] [2023.12]
Ins-HOI: Jiajun Zhang, Yuxiang Zhang, Hongwen Zhang, Boyao Zhou, Ruizhi Shao, Zonghai Hu, Yebin Liu.
"Ins-HOI: Instance Aware Human-Object Interactions Recovery." ArXiv (2023). [paper] [code] [2023.12]
GPS: Zhi Zhang, Qizhe Zhang, Zijun Gao, Renrui Zhang, Ekaterina Shutova, Shiji Zhou, Shanghang Zhang.
"Gradient-based Parameter Selection for Efficient Fine-Tuning." ArXiv (2023). [paper] [2023.12]
PixelLLM: Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid.
"Pixel Aligned Language Models." ArXiv (2023). [paper] [code] [2023.12]
VectorTalker: Hao Hu, Xuan Wang, Jingxiang Sun, Yanbo Fan, Yu Guo, Caigui Jiang.
"VectorTalker: SVG Talking Face Generation with Progressive Vectorisation." ArXiv (2023). [paper] [2023.12]
Open3DIS: Phuc D.A. Nguyen, Tuan Duc Ngo, Chuang Gan, Evangelos Kalogerakis, Anh Tran, Cuong Pham, Khoi Nguyen.
"Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance." ArXiv (2023). [paper] [code] [2023.12]
RadOcc: Haiming Zhang, Xu Yan, Dongfeng Bai, Jiantao Gao, Pan Wang, Bingbing Liu, Shuguang Cui, Zhen Li.
"RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation." AAAI (2024). [paper] [2023.12]
CreativeConnect: DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, Juho Kim.
"CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI." ArXiv (2023). [paper] [2023.12]
Fei Pan, Sangryul Jeon, Brian Wang, Frank Mckenna, Stella X. Yu.
"Zero-Shot Building Attribute Extraction From Large-Scale Vision and Language Models." WACV (2024). [paper] [2023.12]
MSFM: Shijian Zheng, Rujing Wang, Shitao Zheng, Fenmei Wang, Liusan Wang, Zhigui Liu.
"A Multi-scale feature modulation network for efficient underwater image enhancement." JKSUCI (2023). [paper] [code] [2023.12]
Pranjay Shyam, HyunJin Yoo.
"Lightweight Thermal Super-Resolution and Object Detection for Robust Perception in Adverse Weather Conditions." WACV (2024). [paper] [2023.12]
Weiyi Xie, Nathalie Willems, Shubham Patil, Yang Li, Mayank Kumar.
"SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images." WACV (2024). [paper] [2023.12]
PMVC: Chushan Zhang, Jinguang Tong, Tao Jun Lin, Chuong Nguyen, Hongdong Li.
"PMVC: Promoting Multi-View Consistency for 3D Scene Reconstruction." WACV (2024). [paper] [2023.12]
BBPM: Zachery Morton Colbert, Daniel Arrington, Matthew Foote, Jonas Gårding, Dominik Fay, Michael Huo, Mark Pinkham, Prabhakar Ramachandran.
"Repurposing Traditional U-Net Predictions for Sparse SAM Prompting in Medical Image Segmentation." BPEE (2023). [paper] [2023.12]
Hi-Viscont: Weiwei Gu, Anant Sah, Nakul Gopalan.
"Interactive Visual Task Learning for Robots." AAAI (2024). [paper] [2023.12]
Sushil Sharma, Aryan Singh, Ganesh Sistu, Mark Halton, Ciarán Eising.
"Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach." EI-AVM (2024). [paper] [2023.12]
Loitering: Johnny Núñez, Zenjie Li, Sergio Escalera, Kamal Nasrollahi.
"Identifying Loitering Behavior with Trajectory Analysis." WACV Workshop (2024). [paper] [code] [2023.12]
Emu2: Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang.
"Generative Multimodal Models are In-Context Learners." ArXiv (2023). [paper] [homepage] [code] [2023.12]
Giorgos Savathrakis, Antonis Argyros.
"An Automated Method for the Creation of Oriented Bounding Boxes in Remote Sensing Ship Detection Datasets." WACV Workshop (2024). [paper] [2023.12]
TinySAM: Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen.
"TinySAM: Pushing the Envelope for Efficient Segment Anything Model." ArXiv (2023). [paper] [code] [weights] [2023.12]
SRIN: Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, Huaxiong Li.
"Segment Anything Model Meets Image Harmonization." ICASSP (2024). [paper] [2023.12]
José Guilherme de Almeida, Nuno M. Rodrigues, Sara Silva, Nickolas Papanikolaou.
"Testing the Segment Anything Model on radiology data." ArXiv (2023). [paper] [2023.12]
WSOVOD: Jianghang Lin, Yunhang Shen, Bingquan Wang, Shaohui Lin, Ke Li, Liujuan Cao.
"Weakly Supervised Open-Vocabulary Object Detection." AAAI (2024). [paper] [code] [2023.12]
SAI3D: Yingda Yin, Yuzheng Liu, Yang Xiao, Daniel Cohen-Or, Jingwei Huang, Baoquan Chen.
"SAI3D: Segment Any Instance in 3D Scenes." ArXiv (2023). [paper] [code] [2023.12]
SAMBA: Mohannad Barakat, Noha Magdy, Jjuuko George William, Ethel Phiri, Raymond Confidence, Dong Zhang, Udunna C Anazodo.
"Towards SAMBA: Segment Anything Model for Brain Tumor Segmentation in Sub-Sharan African Populations." ArXiv (2023). [paper] [2023.12]
EVI-SAM: Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu.
"EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping." ArXiv (2023). [paper] [2023.12]
GSVA: Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang.
"GSVA: Generalized Segmentation via Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.12]
ABR: Junyu Xie, Weidi Xie, Andrew Zisserman.
"Appearance-based Refinement for Object-Centric Motion Segmentation." ArXiv (2023). [paper] [2023.12]
Yixin Zhang, Shen Zhao, Hanxue Gu, Maciej A. Mazurowski.
"How to Efficiently Annotate Images for Best-Performing Deep Learning Based Segmentation Models: An Empirical Study with Weak and Noisy Annotations and Segment Anything Model." ArXiv (2023). [paper] [2023.12]
Isabelle Tingzon, Nuala Margaret Cowan, Pierre Chrzanowski.
"Mapping Housing Stock Characteristics from Drone Images for Climate Resilience in the Caribbean." NeurIPS Workshop (2023). [paper] [2023.12]
TIFace: Ruijie Zhu, Jiahao Chang, Ziyang Song, Jiahuan Yu, Tianzhu Zhang.
"TIFace: Improving Facial Reconstruction through Tensorial Radiance Fields and Implicit Surfaces." ICCV Workshop (2023). [paper] [code] [2023.12]
Osprey: Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu.
"Osprey: Pixel Understanding with Visual Instruction Tuning." ArXiv (2023). [paper] [code] [2023.12]
SQA-SAM: Yizhe Zhang, Shuo Wang, Tao Zhou, Qi Dou, Danny Z. Chen.
"SQA-SAM: Segmentation Quality Assessment for Medical Images Utilizing the Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12]
CLOUDS: Yasser Benigmim, Subhankar Roy, Slim Essid, Vicky Kalogeiton, Stéphane Lathuilière.
"Collaborating Foundation models for Domain Generalized Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.12]
MobileSAMv2: Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong.
"MobileSAMv2: Faster Segment Anything to Everything." ArXiv (2023). [paper] [code] [2023.12]
Alpha-CLIP: Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang.
"Alpha-CLIP: A CLIP Model Focusing on Wherever You Want." ArXiv (2023). [paper] [homepage] [code] [2023.12]
WonderJourney: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann.
"WonderJourney: Going from Anywhere to Everywhere." ArXiv (2023). [paper] [code] [2023.12]
MobileSAM-Track: Yehui Liu, Yuliang Zhao, Xinyue Zhang, Xiaoai Wang, Chao Lian, Jian Li, Peng Shan, Changzeng Fu, Xiaoyong Lyu, Lianjiang Li, et al.
"MobileSAM-Track: Lightweight One-Shot Tracking and Segmentation of Small Objects on Edge Devices." Remote Sensing (2023). [paper] [2023.12]
IT3DEgo: Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes.
"Instance Tracking in 3D Scenes from Egocentric Videos." ArXiv (2023). [paper] [code] [2023.12]
Rein: Zhixiang Wei, Lin Chen, Yi Jin, Xiaoxiao Ma, Tianle Liu, Pengyang Lin, Ben Wang, Huaian Chen, Jinjin Zheng.
"Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.12]
MR.HARM: Hongzhan Lin, Ziyang Luo, Jing Ma, Long Chen.
"Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models." EMNLP (2023). [paper] [2023.12]
ControlRoom3D: Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou.
"ControlRoom3D: Room Generation using Semantic Proxy Rooms." ArXiv (2023). [paper] [code] [2023.12]
SmartEdit: Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan.
"SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.12]
COMBO: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang.
"Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation." ArXiv (2023). [paper] [code] [2023.12]
Vary: Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang.
"Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models." ArXiv (2023). [paper] [code] [2023.12]
Annolid: Chen Yang, Jeremy Forest, Matthew Einhorn, Thomas A. Cleland.
"Automated Behavioral Analysis Using Instance Segmentation." ArXiv (2023). [paper] [code] [2023.12]
FIND: Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang.
"Interfacing Foundation Models’ Embeddings." ArXiv (2023). [paper] [code] [2023.12]
ScribblePrompt: Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca.
"ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Medical Image." ArXiv (2023). [paper] [code] [2023.12]
MWSIS: Guangfeng Jiang, Jun Liu, Yuzhi Wu, Wenlong Liao, Tao He, Pai Peng.
"MWSIS: Multimodal Weakly Supervised Instance Segmentation with 2D Box Annotations for Autonomous Driving." ArXiv (2023). [paper] [code] [2023.12]
Mask as Supervision: Yuchen Yang, Yu Qiao, Xiao Sun.
"Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation." ArXiv (2023). [paper] [code] [2023.12]
IPSL: Yujun Chen, Xin Tan, Zhizhong Zhang, Yanyun Qu, Yuan Xie.
"Beyond the Label Itself: Latent Labels Enhance Semi-supervised Point Cloud Panoptic Segmentation." ArXiv (2023). [paper] [2023.12]
SESAME: Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell.
"See, Say, and Segment: Teaching LMMs to Overcome False Premises." ArXiv (2023). [paper] [code] [2023.12]
RefCOCOm: Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu.
"Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation." ArXiv (2023). [paper] [code] [2023.12]
TAP: Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan.
"Tokenize Anything via Prompting." ArXiv (2023). [paper] [code] [2023.12]
Josh Stein, Maxime Di Folco, Julia A. Schnabel.
"Influence of Prompting Strategies on Segment Anything Model (SAM) for Short-axis Cardiac MRI segmentation." ArXiv (2023). [paper] [2023.12]
SAM-Graph: Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou.
"SAM-guided Graph Cut for 3D Instance Segmentation." ArXiv (2023). [paper] [code] [2023.12]
ASLseg: Shiyun Chen, Li Lin, Pujin Cheng, Xiaoying Tang.
"ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation." ArXiv (2023). [paper] [2023.12]
GenSAM: Jian Hu, Jiayi Lin, Weitong Cai, Shaogang Gong.
"Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects." AAAI (2024). [paper] [code] [2023.12]
AM-RADIO: Mike Ranzinger, Greg Heinrich, Jan Kautz, Pavlo Molchanov.
"AM-RADIO: Agglomerative Model – Reduce All Domains Into One." ArXiv (2023). [paper] [code] [2023.12]
SqueezeSAM: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra.
"SqueezeSAM: User friendly mobile interactive segmentation." ArXiv (2023). [paper] [2023.12]
EdgeSAM: Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai.
"EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM." ArXiv (2023). [paper] [code] [2023.12]
SeCo: Dong Zhao, Ruizhi Yang, Shuang Wang, Qi Zang, Yang Hu, Licheng Jiao, Nicu Sebe, Zhun Zhong.
"Semantic Connectivity-Driven Pseudo-labeling for Cross-domain Segmentation." ArXiv (2023). [paper] [code] [2023.12]
SemiSAM: Yichi Zhang, Yuan Cheng, Yuan Qi.
"SemiSAM: Exploring SAM for Enhancing Semi-Supervised Medical Image Segmentation with Extremely Limited Annotations." ArXiv (2023). [paper] [2023.12]
RepViT-SAM: Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding.
"RepViT-SAM: Towards Real-Time Segmenting Anything." ArXiv (2023). [paper] [code] [2023.12]
SlimSAM: Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang.
"0.1% Data Makes Segment Anything Slim." ArXiv (2023). [paper] [code] [2023.12]
CAR: Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He.
"CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval." ArXiv (2023). [paper] [2023.12]
HOLD: Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Muhammed Kocabas, Xu Chen, Michael J. Black, Otmar Hilliges.
"HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video." ArXiv (2023). [paper] [code] [2023.12]
ViP-LLaVA: Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee.
"Making Large Multimodal Models Understand Arbitrary Visual Prompts." ArXiv (2023). [paper] [code] [2023.12]
SPEC: Wujian Peng, Sicheng Xie, Zuyao You, Shiyi Lan, Zuxuan Wu.
"Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding." ArXiv (2023). [paper] [code] [2023.12]
HiFi Tuner: Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou.
"HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models." ArXiv (2023). [paper] [2023.12]
TrafficMOT: Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero.
"TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios." ArXiv (2023). [paper] [2023.12]
VideoBooth: Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu.
"VideoBooth: Diffusion-based Video Generation with Image Prompts." ArXiv (2023). [paper] [code] [2023.12]
Portrait Diffusion: Jin Liu, Huaibo Huang, Chao Jin, Ran He.
"Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting." ArXiv (2023). [paper] [code] [2023.12]
NPGs: Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen.
"Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction." ArXiv (2023). [paper] [2023.12]
Diffusion Handles: Karran Pandey, Paul Guerrero, Matheus Gadelha, Yannick Hold-Geoffroy, Karan Singh, Niloy Mitra.
"Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D." ArXiv (2023). [paper] [code] [2023.12]
VLTSeg: Christoph Hümmer, Manuel Schwonberg, Liangwei Zhong, Hu Cao, Alois Knoll, Hanno Gottschalk.
"VLTSeg: Simple Transfer of CLIP-Based Vision-Language Representations for Domain Generalized Semantic Segmentation." ArXiv (2023). [paper] [2023.12]
CustomNeRF: Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu.
"Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training." ArXiv (2023). [paper] [code] [2023.12]
Yilin Ye, Qian Zhu, Shishi Xiao, Kang Zhang, Wei Zeng.
"The Contemporary Art of Image Search: Iterative User Intent Expansion via Vision-Language Model." CSCW (2024). [paper] [2023.12]
StoryGPT-V: Xiaoqian Shen, Mohamed Elhoseiny.
"Large Language Models as Consistent Story Visualizers." ArXiv (2023). [paper] [code] [2023.12]
PixelLM: Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin.
"PixelLM: Pixel Reasoning with Large Multimodal Model." ArXiv (2023). [paper] [code] [2023.12]
SAGE: Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas.
"SAGE: Bridging Semantic and Actionable Parts for GEneralizable Articulated-Object Manipulation under Language Instructions." ArXiv (2023). [paper] [code] [2023.12]
TranSegPGD: Xiaojun Jia, Jindong Gu, Yihao Huang, Simeng Qin, Qing Guo, Yang Liu, Xiaochun Cao.
"TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation." ArXiv (2023). [paper] [2023.12]
MANUS: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar.
"MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians." ArXiv (2023). [paper] [code] [2023.12]
Yao-Chih Lee, Zhoutong Zhang, Kevin Blackburn-Matzen, Simon Niklaus, Jianming Zhang, Jia-Bin Huang, Feng Liu.
"Fast View Synthesis of Casual Videos." ArXiv (2023). [paper] [code] [2023.12]
UniLSeg: Yong Liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang.
"Universal Segmentation at Arbitrary Granularity with Language Instruction." ArXiv (2023). [paper] [code] [2023.12]
LooseControl: Shariq Farooq Bhat, Niloy J. Mitra, Peter Wonka.
"LooseControl: Lifting ControlNet for Generalized Depth Conditioning." ArXiv (2023). [paper] [code] [2023.12]
Yankun Wu, Yuta Nakashima, Noa Garcia.
"Stable Diffusion Exposed: Gender Bias from Prompt to Image." ArXiv (2023). [paper] [2023.12]
Drag-A-Video: Yao Teng, Enze Xie, Yue Wu, Haoyu Han, Zhenguo Li, Xihui Liu.
"Drag-A-Video: Non-rigid Video Editing with Point-based Interaction." ArXiv (2023). [paper] [code] [2023.12]
SAVE: Yeji Song, Wonsik Shin, Junsoo Lee, Jeesoo Kim, Nojun Kwak.
"SAVE: Protagonist Diversification with Structure Agnostic Video Editing." ArXiv (2023). [paper] [code] [2023.12]
RA-SRGT: Mengke Song, Linfeng Li, Dunquan Wu, Wenfeng Song, Chenglizhao Chen.
"Rethinking Object Saliency Ranking: A Novel Whole-flow Processing Paradigm." IEEE TIP (2023). [paper] [code] [2023.12]
TokenCompose: Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu.
"TokenCompose: Grounding Diffusion with Token-level Supervision." ArXiv (2023). [paper] [code] [2023.12]
FoodFusion: Olivia Markham, Yuhao Chen, Chi-en Amy Tai, Alexander Wong.
"FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation." ArXiv (2023). [paper] [code] [2023.12]
CrackSAM: Kang Ge, Chen Wang, Yutao Guo.
"Fine-tune vision foundation model for crack segmentation in civil infrastructures." ArXiv (2023). [paper] [2023.12]
SAMBA: Ronan Docherty, Isaac Squires, Antonis Vamvakeros, Samuel J. Cooper.
"SAMBA: A Trainable Segmentation Web-App with Smart Labelling." ArXiv (2023). [paper] [code] [2023.12]
Israt Zarin Era, Imtiaz Ahmed, Zhichao Liu, Srinjoy Das.
"An unsupervised approach towards promptable defect segmentation in laser-based additive manufacturing by Segment Anything." ArXiv (2023). [paper] [2023.12]
PartSLIP++: Yuchen Zhou, Jiayuan Gu, Xuanlin Li, Minghua Liu, Yunhao Fang, Hao Su.
"PartSLIP++: Enhancing Low-Shot 3D Part Segmentation via Multi-View Instance Segmentation and Maximum Likelihood Estimation." ArXiv (2023). [paper] [code] [2023.12]
Sambor: Xumeng Han, Longhui Wei, Xuehui Yu, Zhiyang Dou, Xin He, Kuiran Wang, Zhenjun Han, Qi Tian.
"Boosting Segment Anything Model Towards Open-Vocabulary Learning." ArXiv (2023). [paper] [code] [2023.12]
SAMS: Xiaobo Yang, Xiaojin Gong.
"Foundation Model Assisted Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [2023.12]
WeSAM: Haojie Zhang, Yongyi Su, Xun Xu, Kui Jia.
"Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation." ArXiv (2023). [paper] [code] [2023.12]
Feature 3DGS: Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi.
"Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields." ArXiv (2023). [paper] [code] [2023.12]
AI-SAM: Yimu Pan, Sitao Zhang, Alison D. Gernand, Jeffery A. Goldstein, James Z. Wang.
"AI-SAM: Automatic and Interactive Segment Anything Model." ArXiv (2023). [paper] [code] [2023.12]
SSRS: Xianping Ma, Qianqian Wu, Xingyu Zhao, Xiaokang Zhang, Man-On Pun, Bo Huang.
"SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints." ArXiv (2023). [paper] [code] [2023.12]
GranSAM: Rohit Kundu, Sudipta Paul, Rohit Lal, Amit K. Roy-Chowdhury.
"Towards Granularity-adjusted Pixel-level Semantic Annotation." ArXiv (2023). [paper] [2023.12]
SAGA: Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian.
"Segment Any 3D Gaussians." ArXiv (2023). [paper] [homepage] [2023.12]
APE: Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji.
"Aligning and Prompting Everything All at Once for Universal Visual Perception." ArXiv (2023). [paper] [code] [2023.12]
SANeRF-HQ: Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai.
"SANeRF-HQ: Segment Anything for NeRF in High Quality." ArXiv (2023). [paper] [code] [2023.12]
ObjectChangeDetection: Aikaterini Adam, Konstantinos Karantzalos, Lazaros Grammatikopoulos, Torsten Sattler.
"Has Anything Changed? 3D Change Detection by 2D Segmentation Masks." ArXiv (2023). [paper] [code] [2023.12]
SCA: Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu.
"Segment and Caption Anything." ArXiv (2023). [paper] [code] [2023.12]
EfficientSAM: Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra.
"EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything." ArXiv (2023). [paper] [2023.12]
U-BDD++: Yiyun Zhang, Zijian Wang, Yadan Luo, Xin Yu, Zi Huang.
"Learning Efficient Unsupervised Satellite Image-based Building Damage Detection." ArXiv (2023). [paper] [code] [2023.12]
Gaussian Grouping: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke.
"Gaussian Grouping: Segment and Edit Anything in 3D Scenes." ArXiv (2023). [paper] [code] [2023.12]
SAM-CLNet: Yiming Zhao, Tao Zhou, Yunqi Gu, Yi Zhou, Yizhe Zhang, Ye Wu, Huazhu Fu.
"Segment Anything Model-guided Collaborative Learning Network for Scribble-supervised Polyp Segmentation." ArXiv (2023). [paper] [2023.12]
S2M: Wenjie Zhao, Jia Li, Xin Dong, Yu Xiang, Yunhui Guo.
"Segment Every Out-of-Distribution Object." ArXiv (2023). [paper] [2023.11]
ZeroPS: Yuheng Xue, Nenglun Chen, Jun Liu, Wenyun Sun.
"ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation." ArXiv (2023). [paper] [2023.11]
GigaPose: Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann, Vincent Lepetit.
"GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence." ArXiv (2023). [paper] [code] [2023.11]
ToddlerDiffusion: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny.
"ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model." ArXiv (2023). [paper] [2023.11]
GaussianEditor: Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin.
"GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting." ArXiv (2023). [paper] [code] [2023.11]
SEGIC: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang.
"SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation." ArXiv (2023). [paper] [code] [2023.11]
Le Quan Nguyen, Jihye Shin, Sanghuyn Ryu, L. Minh Dang, Han Yong Park, O New Lee, Hyeonjoon Moon.
"Innovative Cucumber Phenotyping: A Smartphone-Based and Data-Labeling-Free Model." ArXiv (2023). [paper] [2023.11]
Nicholas Lui, Bryan Chia, William Berrios, Candace Ross, Douwe Kiela.
"Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision." ArXiv (2023). [paper] [2023.11]
RO-LLaMA: Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye.
"RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization." ArXiv (2023). [paper] [2023.11]
SiTH: Hsuan-I Ho, Jie Song, Otmar Hilliges.
"SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion." ArXiv (2023). [paper] [code] [2023.11]
GaussianEditor: Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian.
"GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions." ArXiv (2023). [paper] [homepage] [2023.11]
VLPrompt: Zijian Zhou, Miaojing Shi, Holger Caesar.
"VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation." ArXiv (2023). [paper] [2023.11]
MotionZero: Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen, Jingkuan Song.
"MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation." ArXiv (2023). [paper] [2023.11]
SEED-Bench-2: Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan.
"SEED-Bench-2: Benchmarking Multimodal Large Language Models." ArXiv (2023). [paper] [code] [2023.11]
MLKG: Shupeng Cheng, Ge-Peng Ji, Pengda Qin, Deng-Ping Fan, Bowen Zhou, Peng Xu.
"Large Model Based Referring Camouflaged Object Detection." ArXiv (2023). [paper] [2023.11]
APAP: Seungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung.
"As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors." ArXiv (2023). [paper] [homepage] [2023.11]
ROSO: Yusuke Miyashita, Dimitris Gahtidis, Colin La, Jeremy Rabinowicz, Jurgen Leitner.
"ROSO: Improving Robotic Policy Inference via Synthetic Observations." ACRA (2023). [paper] [code] [2023.11]
Exo2EgoDVC: Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato.
"Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos." ArXiv (2023). [paper] [2023.11]
ESAM: Chengwen Zhang, Yingwei Zhao.
"Efficient SAM for Medical Image Analysis." ArXiv (2023). [paper] [2023.11]
MMA-Diffusion: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Nan Xu, Qiang Xu.
"MMA-Diffusion: MultiModal Attack on Diffusion Models." ArXiv (2023). [paper] [2023.11]
LLM-State: Siwei Chen, Anxing Xiao, David Hsu.
"LLM-State: Expandable State Representation for Long-horizon Task Planning in the Open World." ArXiv (2023). [paper] [2023.11]
Narendra Dev, J. John Soundar Jerome, Hélène Scolan, Jean-Philippe Matas.
"Liquid inertia versus bubble cloud buoyancy in circular plunging jet experiments." ArXiv (2023). [paper] [2023.11]
HUGS: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan.
"HUGS: Human Gaussian Splats." ArXiv (2023). [paper] [code] [2023.11]
SAM-ILP: Aayush Kumar Tyagi, Vaibhav Mishra, Prathosh A. P., Mausam.
"Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images." ArXiv (2023). [paper] [code] [2023.11]
SAMPro3D: Mutian Xu, Xingyilang Yin, Lingteng Qiu, Yang Liu, Xin Tong, Xiaoguang Han.
"SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation." ArXiv (2023). [paper] [code] [2023.11]
SAM-COBOT: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen.
"Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model." ArXiv (2023). [paper] [2023.11]
SemReID: Siyuan Huang, Yifan Zhou, Ram Prabhakar Kathirvel, Rama Chellappa, Chun Pong Lau.
"Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification." ArXiv (2023). [paper] [2023.11]
I-MedSAM: Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang.
"I-MedSAM: Implicit Medical Image Segmentation with Segment Anything." ArXiv (2023). [paper] [2023.11]
RAH-Bench: Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang.
"Mitigating Hallucination in Visual Language Models with Visual Supervision." ArXiv (2023). [paper] [2023.11]
Fei He, Zhiyuan Yang, Mingyue Gao, Biplab Poudel, Newgin Sam Ebin Sam Dhas, Rajan Gyawali, Ashwin Dhakal, Jianlin Cheng, Dong Xu.
"Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs." ArXiv (2023). [paper] [2023.11]
Obj-NeRF: Zhiyi Li, Lihe Ding, Tianfan Xue.
"Obj-NeRF: Extract Object NeRFs from Multi-view Images." ArXiv (2023). [paper] [code] [2023.11]
Ming Li, Guang Yang.
"Where to Begin? From Random to Foundation Model Instructed Initialization in Federated Learning for Medical Image Segmentation." ArXiv (2023). [paper] [2023.11]
SAM-6D: Jiehong Lin, Lihua Liu, Dekun Lu, Kui Jia.
"SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation." ArXiv (2023). [paper] [code] [2023.11]
MARIS: Mengxi Zhang, Yiming Liu, Xiangjun Yin, Huanjing Yue, Jingyu Yang.
"MARIS: Referring Image Segmentation via Mutual-Aware Attention Features." ArXiv (2023). [paper] [2023.11]
Stable-SAM: Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang.
"Stable Segment Anything Model." ArXiv (2023). [paper] [code] [2023.11]
PromptNucSeg: Zhongyi Shui, Yunlong Zhang, Kai Yao, Chenglu Zhu, Yuxuan Sun, Lin Yang.
"Unleashing the Power of Prompt-driven Nucleus Instance Segmentation." ArXiv (2023). [paper] [code] [2023.11]
Rutuja Gurav, Het Patel, Zhuocheng Shang, Ahmed Eldawy, Jia Chen, Elia Scudiero, Evangelos Papalexakis.
"Can SAM recognize crops? Quantifying the zero-shot performance of a semantic segmentation foundation model on generating crop-type maps using satellite imagery for precision agriculture." NeurIPS (2023). [paper] [code] [2023.11]
Francesco Croce, Matthias Hein.
"Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models." ArXiv (2023). [paper] [2023.11]
P2RBox: Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han.
"P2RBox: A Single Point is All You Need for Oriented Object Detection." ArXiv (2023). [paper] [2023.11]
PG-Video-LLaVA: Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan.
"PG-Video-LLaVA: Pixel Grounding Large Video-Language Models." ArXiv (2023). [paper] [code] [2023.11]
MetaDreamer: Lincong Feng, Muyu Wang, Maoyu Wang, Kuo Xu, Xiaoli Liu.
"MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture." ArXiv (2023). [paper] [code] [2023.11]
Emu Edit: Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman.
"Emu Edit: Precise Image Editing via Recognition and Generation Tasks." ArXiv (2023). [paper] [homepage] [2023.11]
Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert.
"On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation." ArXiv (2023). [paper] [2023.11]
Yu Ando, Nora Jee-Young Park, Gun Oh Chong, Seokhwan Ko, Donghyeon Lee, Junghwan Cho, Hyungsoo Han.
"Interpretable pap smear cell representation for cervical cancer screening." ArXiv (2023). [paper] [2023.11]
GT-Maps: Yimeng Li, Navid Rajabi, Sulabh Shrestha, Md Alimoor Reza, Jana Kosecka.
"Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models." ArXiv (2023). [paper] [2023.11]
Nam V. Nguyen, Hieu Trung Huynh, Phuc-Lu Le.
"Deep Learning Techniques for Segmenting Breast Lesion Regions and Classifying Mammography Images." ArXiv (2023). [paper] [2023.11]
Ren Li, Corentin Dumery, Benoît Guillard, Pascal Fua.
"Garment Recovery with Shape and Deformation Priors." ArXiv (2023). [paper] [2023.11]
MROS: Kechen Song, Hongwei Wen, Xiaotong Xue, Liming Huang, Yingying Ji, Yunhui Yan.
"Modality Registration and Object Search Framework for UAV-based Unregistered RGB-T Image Salient Object Detection." ArXiv (2023). [paper] [code] [2023.11]
CPVLF: Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li.
"Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens." ArXiv (2023). [paper] [2023.11]
Zixuan Xie, Rengan Xie, Rong Li, Kai Huang, Pengju Qiao, Jingsen Zhu, Xu Yin, Qi Ye, Wei Hua, Yuchi Huo, Hujun Bao.
"Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning." ArXiv (2023). [paper] [2023.11]
Clarity ChatGPT: Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang.
"Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement." ArXiv (2023). [paper] [2023.11]
GCDSS: Zhengyuan Peng, Qijian Tian, Jianqing Xu, Yizhang Jin, Xuequan Lu, Xin Tan, Yuan Xie, Lizhuang Ma.
"Generalized Category Discovery in Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.11]
Few-shot SLVM: Xiyu Qi, Yifan Wu, Yongqiang Mao, Wenhui Zhang, Yidan Zhang.
"Self-guided Few-shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models." ArXiv (2023). [paper] [2023.11]
OCT-mosaicking: Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz.
"Novel OCT mosaicking pipeline with Feature- and Pixel-based registration." ArXiv (2023). [paper] [code] [2023.11]
PseCo: Huang Zhizhong, Dai Mingliang, Zhang Yi, Zhang Junping, Shan Hongming.
"Point, Segment and Count: A Generalized Framework for Object Counting." ArXiv (2023). [paper] [code] [2023.11]
FreeKD: Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang.
"FreeKD: Knowledge Distillation via Semantic Frequency Prompt." ArXiv (2023). [paper] [2023.11]
Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan.
"Enhancing Novel Object Detection via Cooperative Foundational Models." ArXiv (2023). [paper] [code] [2023.11]
GMISeg: Jing Xu.
"GMISeg: General Medical Image Segmentation without Re-Training." ArXiv (2023). [paper] [2023.11]
Tian Meng, Yang Tao, Wuliang Yin.
"Few-Shot Classification & Segmentation Using Large Language Models Agent." ArXiv (2023). [paper] [2023.11]
MorSeg-CAM-SAM: Xin Yue, Qing Zhao, Jianqiang Li, Xiaoling Liu, Changwei Song, Suqin Liu, Guanghui Fu.
"Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation." ArXiv (2023). [paper] [code] [2023.11]
SA-Med2D-20M: Jin Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao.
"SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks." ArXiv (2023). [paper] [code] [2023.11]
OmniSeg3D: Haiyang Ying, Yixuan Yin, Jinzhi Zhang, Fan Wang, Tao Yu, Ruqi Huang, Lu Fang.
"OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning." ArXiv (2023). [paper] [homepage] [2023.11]
GeoSAM: Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu.
"GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure." ArXiv (2023). [paper] [2023.11]
CellSAM: Uriah Israel, Markus Marks, Rohit Dilip, Qilin Li, Morgan Schwartz, Elora Pradhan, Edward Pao, Shenyi Li, Alexander Pearson-Goulart, Pietro Perona, Georgia Gkioxari, Ross Barnowski, Yisong Yue, David Van Valen.
"A Foundation Model for Cell Segmentation." ArXiv (2023). [paper] [code] [2023.11]
RockSAM: Zhaoyang Ma, Xupeng He, Shuyu Sun, Bicheng Yan, Hyung Kwak, Jun Gao.
"Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model." ArXiv (2023). [paper] [2023.11]
DMV3D: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang.
"DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model." ArXiv (2023). [paper] [code] [2023.11]
InterpAny-Clearer: Zhihang Zhong, Gurunandan Krishnan, Xiao Sun, Yu Qiao, Sizhuo Ma, Jian Wang.
"Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation." ArXiv (2023). [paper] [homepage] [code] [2023.11]
OSM: Qihang Yu, Xiaohui Shen, Liang-Chieh Chen.
"Towards Open-Ended Visual Recognition with Large Language Model." ArXiv (2023). [paper] [code] [2023.11]
UR-SAM: Yichi Zhang, Shiyao Hu, Chen Jiang, Yuan Cheng, Yuan Qi.
"Segment Anything Model with Uncertainty Rectification for Auto-Prompting Medical Image Segmentation." ArXiv (2023). [paper] [2023.11]
DefectSAM: Bozhen Hu, Bin Gao, Cheng Tan, Tongle Wu, Stan Z. Li.
"Segment Anything in Defect Detection." ArXiv (2023). [paper] [2023.11]
UnifiedVisionGPT: Chris Kelly, Luhui Hu, Cindy Yang, Yu Tian, Deshun Yang, Bang Yang, Zaoshan Huang, Zihao Li, Yuexian Zou.
"UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework." ArXiv (2023). [paper] [code] [2023.11]
Slide-SAM: Quan Quan, Fenghe Tang, Zikang Xu, Heqin Zhu, S. Kevin Zhou.
"Slide-SAM: Medical SAM Meets Sliding Window." ArXiv (2023). [paper] [2023.11]
MM-Navigator: An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, Jianfeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang.
"GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation." ArXiv (2023). [paper] [code] [2023.11]
TriDental: Tomáš Kunzo, Viktor Kocur, Lukáš Gajdošech, Martin Madaras.
"Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning." DISA (2023). [paper] [2023.11]
Hyungeun Lee, Ung Hwang, Seungwon Yu, Chang-Hun Lee, Kijung Yoon.
"Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning." ML4H (2023). [paper] [2023.11]
AdapterShadow: Leiping Jie, Hui Zhang.
"AdapterShadow: Adapting Segment Anything Model for Shadow Detection." ArXiv (2023). [paper] [code] [2023.11]
Uni-COAL: Zhiyun Song, Zengxin Qi, Xin Wang, Xiangyu Zhao, Zhenrong Shen, Sheng Wang, Manman Fei, Zhe Wang, Di Zang, Dongdong Chen, Linlin Yao, Qian Wang, Xuehai Wu, Lichi Zhang.
"Uni-COAL: A Unified Framework for Cross-Modality Synthesis and Super-Resolution of MR Images." ArXiv (2023). [paper] [2023.11]
SAMIHS: Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, XiangZhi Bai.
"SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation." ArXiv (2023). [paper] [code] [2023.11]
Virmarie Maquiling, Sean Anthony Byrne, Diederick C. Niehorster, Marcus Nyström, Enkelejda Kasneci.
"Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)." ArXiv (2023). [paper] [code] [2023.11]
GlanceSeg: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu.
"GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy." ArXiv (2023). [paper] [2023.11]
EviPrompt: Yinsong Xu, Jiaqi Tang, Aidong Men, Qingchao Chen.
"EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images." ArXiv (2023). [paper] [2023.11]
FDNet: Xiang Feng, Chengkai Wang, Chengyu Wu, Yunxiang Li, Yongbo He, Shuai Wang, Yaqi Wang.
"FDNet: Feature Decoupled Segmentation Network for Tooth CBCT Image." ArXiv (2023). [paper] [2023.11]
GISCup23: Xuanshu Luo, Paul Walther, Wejdene Mansour, Balthasar Teuscher, Johann Maximilian Zollner, Hao Li, Martin Werner.
"Exploring GeoAI Methods for Supraglacial Lake Mapping on Greenland Ice Sheet." ArXiv (2023). [paper] [code] [2023.11]
u-LLaVA: Jinjin Xu, Liwu Xu, Yuzhe Yang, Xiang Li, Yanchun Xie, Yi-Jie Huang, Yaqian Li.
"u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model." ArXiv (2023). [paper] [2023.11]
LLaVA-Plus: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li.
"LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents." ArXiv (2023). [paper] [code] [2023.11]
EVA-VOS: Thanos Delatolas, Vicky Kalogeiton, Dim P. Papadopoulos.
"Learning the What and How of Annotation in Video Object Segmentation." WACV (2024). [paper] [code] [2023.11]
NExT-Chat: Ao Zhang, Liming Zhao, Chen-Wei Xie, Yun Zheng, Wei Ji, Tat-Seng Chua.
"NExT-Chat: An LMM for Chat, Detection and Segmentation." ArXiv (2023). [paper] [code] [2023.11]
SAMVG: Haokun Zhu, Juang Ian Chong, Teng Hu, Ran Yi, Yu-Kun Lai, Paul L. Rosin.
"SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model." ArXiv (2023). [paper] [2023.11]
Danielle Ferreira, Rima Arnaout.
"Are foundation models efficient for medical image segmentation?" ArXiv (2023). [paper] [code] [2023.11]
VFMV: Kejun Wu, Qiong Liu, Kim-Hui Yap, and You Yang.
"High dimensional optical data — varifocal multiview imaging, compression and evaluation." Optics Express (2023). [paper] [2023.11]
T-NT: Zhenjun Yu, Wenqiang Xu, Siqiong Yao, Jieji Ren, Tutian Tang, Yutong Li, Guoying Gu, Cewu Lu.
"Precise Robotic Needle-Threading with Tactile Perception and Reinforcement Learning." ArXiv (2023). [paper] [code] [2023.11]
GLaMM: Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Eric Xing, Ming-Hsuan Yang, Fahad S. Khan.
"GLaMM: Pixel Grounding Large Multimodal Model." ArXiv (2023). [paper] [code] [2023.11]
Masking: Elias Arbash, Andréa de Lima Ribeiro, Sam Thiele, Nina Gnann, Behnood Rasti, Margret Fuchs, Pedram Ghamisi, Richard Gloaguen.
"Masking Hyperspectral Imaging Data with Pretrained Models." ArXiv (2023). [paper] [code] [2023.11]
Yiran Li, Junpeng Wang, Prince Aboagye, Michael Yeh, Yan Zheng, Liang Wang, Wei Zhang, Kwan-Liu Ma.
"Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning." ArXiv (2023). [paper] [2023.11]
CSF: Shichao Dong, Fayao Liu, Guosheng Lin.
"Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation." ArXiv (2023). [paper] [2023.11]
RegionSpot: Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu.
"Recognize Any Regions." ArXiv (2023). [paper] [code] [2023.11]
MSMedCap: Gaoang Wang, Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li.
"Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning." ArXiv (2023). [paper] [2023.11]
MVS: Mykhailo Shvets, Dongxu Zhao, Marc Niethammer, Roni Sengupta, Alexander C. Berg.
"Joint Depth Prediction and Semantic Segmentation with Multi-View SAM." WACV (2024). [paper] [2023.11]
EditAnything: Shanghua Gao, Zhijie Lin, Xingyu Xie, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan.
"EditAnything: Empowering Unparalleled Flexibility in Image Editing and Generation." ACM MM (2023). [paper] [code] [2023.10]
ImEW: Tasnim Mohiuddin, Tianyi Zhang, Maowen Nie, Jing Huang, Qianqian Chen, Wei Shi.
"ImEW: A Framework for Editing Image in the Wild." LGM3A Workshop (2023). [paper] [2023.10]
Fen Fang, Yi Cheng, Ying Sun, Qianli Xu.
"Team I2R-VI-FF Technical Report on EPIC-KITCHENS VISOR Hand Object Segmentation Challenge 2023." ArXiv (2023). [paper] [2023.10]
InsDet: Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong.
"A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture." NeurIPS Datasets and Benchmarks Track (2023). [paper] [code] [2023.10]
Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout.
"One-shot Localization and Segmentation of Medical Images with Foundation Models." NeurIPS Workshop (2023). [paper] [2023.10]
AVIS: Ruohao Guo, Yaru Chen, Yanyu Qi, Wenzhen Yue, Dantong Niu, Xianghua Ying.
"Audio-Visual Instance Segmentation." ArXiv (2023). [paper] [2023.10]
ProMISe: Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz.
"ProMISe: Prompt-driven 3D Medical Image Segmentation Using Image Models." ArXiv (2023). [paper] [code] [2023.10]
Joana Palés Huix, Adithya Raju Ganeshan, Johan Fredin Haslum, Magnus Söderberg, Christos Matsoukas, Kevin Smith.
"Are Natural Domain Foundation Models Useful for Medical Image Classification?" ArXiv (2023). [paper] [2023.10]
OBM: Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma.
"Rebuild City Buildings from Off-Nadir Aerial Images with Offset-Building Model (OBM)." ArXiv (2023). [paper] [code] [2023.10]
TGVE: Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest Iandola.
"CVPR 2023 Text Guided Video Editing Competition." ArXiv (2023). [paper] [code] [2023.10]
ViewControl: Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou.
"Integrating View Conditions for Image Synthesis." ArXiv (2023). [paper] [2023.10]
SparseDFF: Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas.
"SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation." ArXiv (2023). [paper] [2023.10]
SAMPOT: Rachana Sathish, Rahul Venkataramani, K S Shriram, Prasad Sudhakar.
"Task-driven Prompt Evolution for Foundation Models." ArXiv (2023). [paper] [2023.10]
SonoSAM: Hariharan Ravishankar, Rohan Patil, Vikram Melapudi, Parminder Bhatia, Taha Kass-Hout, Pavan Annangi.
"SonoSAM -- Segment Anything on Ultrasound Images." ASMUS (2023). [paper] [2023.10]
Bertrand Chauveau, Pierre Merville.
"Segment Anything by Meta as a foundation model for image segmentation: a new era for histopathological images." Pathology (2023). [paper] [2023.10]
MAFT: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi.
"Learning Mask-aware CLIP Representations for Zero-Shot Segmentation." NeurIPS (2023). [paper] [code] [2023.10]
Ardiansyah Koeshidayatullah.
"Riding the Wave: One-Touch Automatic Salt Segmentation by Coupling SAM and SegGPT." ArXiv (2023). [paper] [2023.10]
LuGSAM: Dhanush Babu Ramesh, Rishika Iytha Sridhar, Pulakesh Upadhyaya and Rishikesan Kamaleswaran.
"Lung Grounded-SAM (LuGSAM): A Novel Framework for Integrating Text prompts to Segment Anything Model (SAM) for Segmentation Tasks of ICU Chest X-Rays." ArXiv (2023). [paper] [2023.10]
Zero123++: Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, Hao Su.
"Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model." ArXiv (2023). [paper] [code] [2023.10]
ConceptFusion: Krishna Murthy Jatavallabhula, Alihusein Kuwajerwala, Qiao Gu, Mohd Omama, Tao Chen, Alaa Maalouf, Shuang Li, Ganesh Iyer, Soroush Saryazdi, Nikhil Keetha, Ayush Tewari, Joshua B. Tenenbaum, Celso Miguel de Melo, Madhava Krishna, Liam Paull, Florian Shkurti, Antonio Torralba.
"ConceptFusion: Open-set Multimodal 3D Mapping." RSS (2023). [paper] [code] [2023.10]
CryoSegNet: Rajan Gyawali, Ashwin Dhakal, Liguo Wang, Jianlin Cheng.
"Accurate cryo-EM protein particle picking by integrating the foundational AI image segmentation model and specialized U-Net." ArXiv (2023). [paper] [2023.10]
CISRU: Silvia Romero-Azpitarte, Cristina Luna, Alba Guerra, Mercedes Alonso, Pablo Romeo Manrique, Marina L. Seoane, Daniel Olayo, Almudena Moreno, Pablo Castellanos, Fernando Gandía, Gianfranco Visentin.
"Enabling In-Situ Resources Utilisation by leveraging collaborative robotics and astronaut-robot interaction." IAC (2023). [paper] [2023.10]
DiffPrompter: Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna.
"DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions." ArXiv (2023). [paper] [code] [2023.10]
Alessandro Saviolo, Pratyaksh Rao, Vivek Radhakrishnan, Jiuhong Xiao, Giuseppe Loianno.
"Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories." ArXiv (2023). [paper] [2023.10]
Ruoqing Zhao, Xi Wang, Hongliang Dai, Pan Gao, Piji Li.
"Medical Report Generation Based on Segment-Enhanced Contrastive Representation Learning." NLPCC (2023). [paper] [2023.10]
Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda.
"Compositional Semantics for Open Vocabulary Spatio-semantic Representations." ArXiv (2023). [paper] [2023.10]
InstructDET: Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song.
"InstructDET: Diversifying Referring Object Detection with Generalized Instructions." ArXiv (2023). [paper] [code] [2023.10]
HICOME: Peng Zheng.
"Discriminative Consensus Mining with A Thousand Group for More Accurate Co-Salient Object Detection." ArXiv (2023). [paper] [code] [2023.10]
Ferret: Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, Zirui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang.
"Ferret: Refer and Ground Anything Anywhere at Any Granularity." ArXiv (2023). [paper] [code] [2023.10]
OVTracktor: Wen-Hsuan Chu, Adam W. Harley, Pavel Tokmakov, Achal Dave, Leonidas Guibas, Katerina Fragkiadaki.
"Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models." ArXiv (2023). [paper] [code] [2023.10]
OpenAnnotate3D: Yijie Zhou, Likun Cai, Xianhui Cheng, Zhongxue Gan, Xiangyang Xue, Wenchao Ding.
"OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data." ArXiv (2023). [paper] [code] [2023.10]
SSC: Francisco Eiras, Kemal Oksuz, Adel Bibi, Philip H.S. Torr, Puneet K. Dokania.
"Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation." ArXiv (2023). [paper] [code] [2023.10]
Shichang Liu, Junxin Chen, Ben-Guo He, Tao Chen, Gwanggil Jeon, Wei Wang.
"Adapting Segment Anything Model for Shield Tunnel Water Leakage Segmentation." AMC-SME Workshop (2023). [paper] [2023.10]
Sofia H. Gelado, César Quilodrán-Casas, Loïc Chagot.
"Enhancing Microdroplet Image Analysis with Deep Learning." Micromachines (2023). [paper] [2023.10]
EdgeCalib: Xingchen Li, Yifan Duan, Beibei Wang, Haojie Ren, Guoliang You, Yu Sheng, Jianmin Ji, Yanyong Zhang.
"EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration." ArXiv (2023). [paper] [2023.10]
Open-NeRF: Hao Zhang, Fang Li, Narendra Ahuja.
"Open-NeRF: Towards Open Vocabulary NeRF Decomposition." WACV (2024). [paper] [2023.10]
SAM-CLIP: Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari.
"SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding." ArXiv (2023). [paper] [2023.10]
SAM-Med3D: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao.
"SAM-Med3D." ArXiv (2023). [paper] [code] [2023.10]
SAMCLR: Benjamin Missaoui, Chongbin Yuan.
"SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling." ArXiv (2023). [paper] [2023.10]
Zhaozheng Chen, Qianru Sun.
"Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models." ArXiv (2023). [paper] [2023.10]
Sumit Pandey, Kuan-Fu Chen, Erik B. Dam.
"Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models." ArXiv (2023). [paper] [2023.10]
Mammo-SAM: Xinyu Xiong, Churan Wang, Wenxue Li, Guanbin Li.
"Mammo-SAM: Adapting Foundation Segment Anything Model for Automatic Breast Mass Segmentation in Whole Mammograms." ResearchGate (2023). [paper] [2023.10]
Dongshen Han, Sheng Zheng, Chaoning Zhang.
"Segment Anything Meets Universal Adversarial Perturbation." ArXiv (2023). [paper] [2023.10]
SoM-GPT4V: Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao.
"Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V." ArXiv (2023). [paper] [homepage] [code] [2023.10]
IPSeg: Lv Tang, Peng-Tao Jiang, Hao-Ke Xiao, Bo Li.
"Towards Training-free Open-world Segmentation via Image Prompting Foundation Models." ArXiv (2023). [paper] [2023.10]
SAM_Interactive_Histopathology: SeungKyu Kim, Hyun-Jic Oh, Seonghui Min, Won-Ki Jeong.
"Evaluation and improvement of Segment Anything Model for interactive histopathology image segmentation." MICCAI Workshop (2023). [paper] [code] [2023.10]
Yao Qianxiang, Bin Jiang.
"Recursive Segmentation Living Image: An eXplainable AI (XAI) Approach for Computing Structural Beauty of Images or the Livingness of Space." ArXiv (2023). [paper] [2023.10]
Sheng Zheng, Chaoning Zhang.
"Black-box Targeted Adversarial Attack on Segment Anything (SAM)." ArXiv (2023). [paper] [2023.10]
Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang.
"Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data." ArXiv (2023). [paper] [2023.10]
PUCD: Youngtack Oh, Minseok Seo, Doyi Ki, Junghoon Seo.
"Prototype-oriented Unsupervised Change Detection for Disaster Management." ArXiv (2023). [paper] [2023.10]
SAM-guided UDA: Xidong Peng, Runnan Chen, Feng Qiao, Lingdong Kong, Youquan Liu, Tai Wang, Xinge Zhu, Yuexin Ma.
"SAM-guided Unsupervised Domain Adaptation for 3D Segmentation." ArXiv (2023). [paper] [2023.10]
SemCom: Avi Deb Raha, Md. Shirajum Munir, Apurba Adhikary, Yu Qiao, Choong Seon Hong.
"Generative AI-driven Semantic Communication Framework for NextG Wireless Network." ArXiv (2023). [paper] [2023.10]
Christian A. Schiller.
"Virtual Augmented Reality for Atari Reinforcement Learning." ArXiv (2023). [paper] [2023.10]
MCREA: Xu Chen, Yunde Jia, Yuwei Wu.
"Fine-Grained Annotation for Face Anti-Spoofing." ArXiv (2023). [paper] [2023.10]
SAM-OCTA: Xinrun Chen, Chengliang Wang, Haojian Ning, Shiying Li.
"SAM-OCTA: Prompting Segment-Anything for OCTA Image Segmentation." ArXiv (2023). [paper] [code] [2023.10]
MED: Haijie Ren, Weiqiang Wang, Wentao Tang, Rui Zhang.
"Machine Eye for Defects: Machine Learning-Based Solution to Identify and Characterize Topological Defects in Textured Images of Nematic Materials." ArXiv (2023). [paper] [2023.10]
Mohammad Peivandi, Jason Zhang, Michael Lu, Dongxiao Zhu, Zhifeng Kou.
"Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation." ArXiv (2023). [paper] [2023.10]
Tree-GPT: Siqi Du, Shengjun Tang, Weixi Wang, Xiaoming Li, Renzhong Guo.
"Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis." ArXiv (2023). [paper] [2023.10]
TiC: Song Zhang, Qingzhong Wang, Jiang Bian, Haoyi Xiong.
"TiC: Exploring Vision Transformer in Convolution." ArXiv (2023). [paper] [code] [2023.10]
SLP: David Balaban, Justin Medich, Pranay Gosar, Justin Hart.
"Propagating Semantic Labels in Video Data." ArXiv (2023). [paper] [homepage] [2023.10]
Amin Ranem, Niklas Babendererde, Moritz Fuchs, Anirban Mukhopadhyay.
"Exploring SAM Ablations for Enhancing Medical Segmentation in Radiology and Pathology." ArXiv (2023). [paper] [2023.10]
Xiangru Li, Yifei Zhang, Liang Zhao.
"Multi-Prompt Fine-Tuning of Foundation Models for Enhanced Medical Image Segmentation." ArXiv (2023). [paper] [2023.10]
Ali Mayladan, Hasan Nasrallah, Hasan Moughnieh, Mustafa Shukor, Ali J. Ghandour.
"Zero-Shot Refinement of Buildings' Segmentation Models using SAM." ArXiv (2023). [paper] [2023.10]
GroupPrompter: Yichuang Luo, Fang Wang, Jing Xing, Xiaohu Liu.
"GroupPrompter: A Prompting Method for Semantic Segmentation Based on SAM." IEEE Access (2023). [paper] [2023.09]
GAVS: Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li.
"Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer." ArXiv (2023). [paper] [2023.09]
Avi Deb Raha, Apurba Adhikary, Md. Shirajum Munir, Yu Qiao, Choong Seon Hong.
"Segment Anything Model Aided Beam Prediction for the Millimeter Wave Communication." APNOMS (2023). [paper] [2023.09]
PVLFF: Haoran Chen, Kenneth Blomqvist, Francesco Milano, Roland Siegwart.
"Panoptic Vision-Language Feature Fields." ArXiv (2023). [paper] [code] [2023.09]
SAMStyler: Konstantinos Psychogyios, Helen C. Leligou, Filisia Melissari, Stavroula Bourou, Zacharias Anastasakis, Theodore Zahariadis.
"SAMStyler: Enhancing Visual Creativity With Neural Style Transfer and Segment Anything Model (SAM)." IEEE Access (2023). [paper] [2023.09]
Aneesh Rangnekar, Jue Jiang, Harini Veeraraghavan.
"3D Swin Transformer for Partial Medical Auto Segmentation." MICCAI-FLARE (2023). [paper] [2023.09]
ASA: Yaqin Li, Dandan Wang, Cao Yuan, Hao Li, Jing Hu.
"Enhancing Agricultural Image Segmentation with an Agricultural Segment Anything Model Adapter." Sensors (2023). [paper] [2023.09]
SCROD: Valentyn Boreiko, Matthias Hein, Jan Hendrik Metzen.
"Identifying Systematic Errors in Object Detectors with the SCROD Pipeline." ICCV Workshop (2023). [paper] [2023.09]
Iraklis Giannakis, Anshuman Bhardwaj, Lydia Sam, Georgios Leontidis.
"A flexible deep learning crater detection scheme using Segment Anything Model (SAM)." ICARUS (2023). [paper] [2023.09]
SuPerPM: Shan Lin, Albert J. Miao, Ali Alabiad, Fei Liu, Kaiyuan Wang, Jingpei Lu, Florian Richter, Michael C. Yip.
"SuPerPM: A Large Deformation-Robust Surgical Perception Framework Based on Deep Point Matching Learned from Physical Constrained Simulation Data." ArXiv (2023). [paper] [2023.09]
Bi-SAM: Ying Zhao, Kechen Song, Wenqi Cui, Hang Ren, Yunhui Yan.
"MFS enhanced SAM: Achieving superior performance in bimodal few-shot segmentation." JVCIR (2023). [paper] [code] [2023.09]
BaDLAD: Kazi Reyazul Hasan, Mubasshira Musarrat, Sadif Ahmed, Shahriar Raj.
"Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD." ArXiv (2023). [paper] [2023.09]
SAM-Adapter: Tianrun Chen, Lanyun Zhu, Chaotao Deng, Runlong Cao, Yan Wang, Shangzhan Zhang, Zejian Li, Lingyun Sun, Ying Zang, Papa Mao.
"SAM-Adapter: Adapting Segment Anything in Underperformed Scenes." ICCV Workshop (2023). [paper] [code] [2023.09]
UniQuadric: Linghao Yang, Yanmin Wu, Yu Deng, Rui Tian, Xinggang Hu, Tiefeng Ma.
"UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling." ArXiv (2023). [paper] [2023.09]
SAMFeat: Jingqian Wu, Rongtao Xu, Zach Wood-Doughty, Changwei Wang.
"Segment Anything Model is a Good Teacher for Local Feature Learning." ArXiv (2023). [paper] [code] [2023.09]
nnSAM: Yunxiang Li, Bowen Jing, Xiang Feng, Zihan Li, Yongbo He, Jing Wang, You Zhang.
"nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance." ArXiv (2023). [paper] [code] [2023.09]
Mayara E. Bonani, Max Schwarz, Sven Behnke.
"Learning from SAM: Harnessing a Segmentation Foundation Model for Sim2Real Domain Adaptation through Regularization." ArXiv (2023). [paper] [2023.09]
Khoa Dang Nguyen, Thanh-Hai Phung, Hoang-Giang Cao.
"A SAM-based Solution for Hierarchical Panoptic Segmentation of Crops and Weeds Competition." ICCV Workshop (2023). [paper] [2023.09]
MediViSTA-SAM: Sekeun Kim, Kyungsang Kim, Jiang Hu, Cheng Chen, Zhiliang Lyu, Ren Hui, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Xiang Li, Tianming Liu, Quanzheng Li.
"MediViSTA-SAM: Zero-shot Medical Video Analysis with Spatio-temporal SAM Adaptation." ArXiv (2023). [paper] [code] [2023.09]
PointSSC: Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, Jian Pu.
"PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion." ICRA (2024). [paper] [2023.09]
NOC: Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, Shanghang Zhang.
"NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything." ArXiv (2023). [paper] [2023.09]
SAM-OCTA: Chengliang Wang, Xinrun Chen, Haojian Ning, Shiying Li.
"SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks." ArXiv (2023). [paper] [code] [2023.09]
MoPA: Haozhi Cao, Yuecong Xu, Jianfei Yang, Pengyu Yin, Shenghai Yuan, Lihua Xie.
"MoPA: Multi-Modal Prior Aided Domain Adaptation for 3D Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.09]
Deshadow-Anything: Xiao Feng Zhang, Tian Yi Song, Jia Wei Yao.
"Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal." ArXiv (2023). [paper] [2023.09]
3D-U-SAM: Yifu Zhang, Zuozhu Liu, Yang Feng, Renjing Xu.
"3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images." ArXiv (2023). [paper] [2023.09]
OCTA-FRNet: Haojian Ning, Chengliang Wang, Xinrun Chen, Shiying Li.
"An Accurate and Efficient Neural Network for OCTA Vessel Segmentation and a New Dataset." ArXiv (2023). [paper] [code] [2023.09]
MA-SAM: Cheng Chen, Juzheng Miao, Dufan Wu, Zhiling Yan, Sekeun Kim, Jiang Hu, Aoxiao Zhong, Zhengliang Liu, Lichao Sun, Xiang Li, Tianming Liu, Pheng-Ann Heng, Quanzheng Li.
"MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.09]
samgeo: Qiusheng Wu and Lucas Prado Osco.
"samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM)." JOSS (2023). [paper] [code] [2023.09]
Peng Zhang, Yaping Wang.
"Segment Anything Model for Brain Tumor Segmentation." ArXiv (2023). [paper] [2023.09]
SAMUS: Xian Lin, Yangyang Xiang, Li Zhang, Xin Yang, Zengqiang Yan, Li Yu.
"SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation." ArXiv (2023). [paper] [code] [2023.09]
CMSF: Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu.
"Leveraging Foundation models for Unsupervised Audio-Visual Segmentation." ArXiv (2023). [paper] [2023.09]
Xiaodan Xing, Chunling Tang, Yunzhe Guo, Nicholas Kurniawan, Guang Yang.
"SegmentAnything helps microscopy images based automatic and quantitative organoid detection and analysis." ArXiv (2023). [paper] [2023.09]
Chenbin Liu, Zhengliang Liu, Jason Holmes, Lu Zhang, Lian Zhang, Yuzhen Ding, Peng Shu, Zihao Wu, Haixing Dai, Yiwei Li, Dinggang Shen, Ninghao Liu, Quanzheng Li, Xiang Li, Dajiang Zhu, Tianming Liu, Wei Liu.
"Artificial General Intelligence for Radiation Oncology." ArXiv (2023). [paper] [2023.09]
DEVA: Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee.
"Tracking Anything with Decoupled Video Segmentation." ICCV (2023). [paper] [project page] [code] [2023.09]
SAM3D: Nhat-Tan Bui, Dinh-Hieu Hoang, Minh-Triet Tran, Ngan Le.
"SAM3D: Segment Anything Model in Volumetric Medical Images." ArXiv (2023). [paper] [2023.09]
CropFormer: Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang.
"High-Quality Entity Segmentation." ICCV (2023). [paper] [project page] [code] [中文解读] [2023.09]
CIP-WPIS: Qingtao Yu, Heming Du, Chen Liu, Xin Yu.
"When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision." ArXiv (2023). [paper] [2023.09]
SAM-Deblur: Siwei Li, Mingxuan Liu, Yating Zhang, Shu Chen, Haoxiang Li, Hong Chen, Zifei Dou.
"SAM-Deblur: Let Segment Anything Boost Image Deblurring." ArXiv (2023). [paper] [code] [2023.09]
Hassan El-Hajj, Matteo Valleriani.
"Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models." ICIAP (2023), AI4DH workshop. [paper] [2023.09]
SAM-CD: Lei Ding, Kun Zhu, Daifeng Peng, Hao Tang, Haitao Guo.
"Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images." ArXiv (2023). [paper] [2023.09]
SAM-LIV: Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman.
"Plug-And-Play Object-Centric Representations From “What” and “Where” Foundation Models." ArXiv (2023). [paper] [2023.08]
UGainS: Alexey Nekrasov, Alexander Hermans, Lars Kuhnert, Bastian Leibe.
"UGainS: Uncertainty Guided Anomaly Instance Segmentation." GCPR (2023). [paper] [code] [2023.08]
Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang.
"Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection." ArXiv (2023). [paper] [2023.08]
Dwith Chenna, Suyash Bhogawar.
"Segment Anything Model (SAM) For Brain Extraction in fMRI Studies." IJAIMED (2023). [paper] [2023.08]
OSTRA: Jiexiong Xu, Weikun Zhao, Zhiyan Tang, Xiangchao Gan.
"A One Stop 3D Target Reconstruction and multilevel Segmentation Method." ArXiv (2023). [paper] [code] [2023.08]
ROSGPT_Vision: Bilel Benjdira, Anis Koubaa, Anas M. Ali.
"ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts." ArXiv (2023). [paper] [code] [2023.08]
CoDeF: Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen.
"CoDeF: Content Deformation Fields for Temporally Consistent Video Processing." ArXiv (2023). [paper] [code] [2023.08]
WALL-E: Tianyu Wang, Yifan Li, Haitao Lin, Xiangyang Xue, Yanwei Fu.
"WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model." ArXiv (2023). [paper] [code] [2023.08]
Ref-Diff: Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, Wangmeng Zuo.
"Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models." ArXiv (2023). [paper] [code] [2023.08]
Anwai Archit, Sushmita Nair, Nabeel Khalid, Paul Hilt, Vikas Rajashekar, Marei Freitag, Sagnik Gupta, Andreas Dengel, Sheraz Ahmed, Constantin Pape.
"Segment Anything for Microscopy." ResearchGate (2023). [paper] [2023.08]
Su Myat Noe.
"Efficient Segment-Anything Model for Automatic Mask Region Extraction in Livestock Monitoring." IEEE ICCT (2023). [paper] [2023.08]
SSM-SAM: Yiming Zhang, Tianang Leng, Kun Han, Xiaohui Xie.
"Self-Sampling Meta SAM: Enhancing Few-shot Medical Image Segmentation with Meta-Learning." ArXiv (2023). [paper] [2023.08]
SAM-Med2D: Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Junjun He, Shaoting Zhang, Min Zhu, Yu Qiao.
"SAM-Med2D." ArXiv (2023). [paper] [code] [2023.08]
AutoSAM Adapter: Chengyin Li, Prashant Khanduri, Yao Qiang, Rafi Ibn Sultan, Indrin Chetty, Dongxiao Zhu.
"Auto-Prompting SAM for Mobile Friendly 3D Medical Image Segmentation." ArXiv (2023). [paper] [2023.08]
Leo Fillioux, Emilie Gontran, Jérôme Cartry, Jacques RR Mathieu, Sabrina Bedja, Alice Boilève, Paul-Henry Cournède, Fanny Jaulin, Stergios Christodoulidis, Maria Vakalopoulou.
"Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy." ICCV Workshop (2023). [paper] [2023.08]
SAM-PARSER: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Xiaokang Yang, Wei Shen.
"SAM-PARSER: Fine-tuning SAM Efficiently by Parameter Space Reconstruction." ArXiv (2023). [paper] [2023.08]
Weijia Feng, Lingting Zhu, Lequan Yu.
"Cheap Lunch for Medical Image Segmentation by Fine-tuning SAM on Few Exemplars." MICCAI BrainLes Workshop (2023). [paper] [2023.08]
Zihan Dong, ZhengDong Zhang.
"Enhancing Bloodstain Analysis Through AI-Based Segmentation: Leveraging Segment Anything Model for Crime Scene Investigation." KDD Workshop (2023). [paper] [code] [2023.08]
SCESAME: Hiroaki Yamagiwa, Yusuke Takase, Hiroyuki Kambe, Ryosuke Nakamoto.
"Zero-Shot Edge Detection With SCESAME: Spectral Clustering-Based Ensemble for Segment Anything Model Estimation." WACV Workshop (2024). [arXiv] [paper] [code] [2023.08]
SamDSK: Yizhe Zhang, Tao Zhou, Shuo Wang, Ye Wu, Pengfei Gu, Danny Z. Chen.
"SamDSK: Combining Segment Anything Model with Domain-Specific Knowledge for Semi-Supervised Learning in Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.08]
RSISeg: Zhe Wang, Shoukun Sun, Xiang Que, Xiaogang Ma.
"Interactive segmentation in aerial images: a new benchmark and an open access web-based tool." ArXiv (2023). [paper] [2023.08]
DiffSeg: Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco.
"Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion." ArXiv (2023). [paper] [2023.08]
SPPNet: Qing Xu, Wenwei Kuang, Zeyu Zhang, Xueyao Bao, Haoran Chen, Wenting Duan.
"SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation." ArXiv (2023). [paper] [code] [2023.08]
SAMSNeRF: Ange Lou, Yamin Li, Xing Yao, Yike Zhang, Jack Noble.
"SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)." ArXiv (2023). [paper] [code] [2023.08]
SS2V: Xing Yao, Han Liu, Dewei Hu, Daiwei Lu, Ange Lou, Hao Li, Ruining Deng, Gabriel Arenas, Baris Oguz, Nadav Schwartz, Brett C Byram, Ipek Oguz.
"False Negative/Positive Control for SAM on Noisy Medical Images." ArXiv (2023). [paper] [code] [2023.08]
SurgicalSAM: Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang.
"SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation." ArXiv (2023). [paper] [code] [2023.08]
SAMedOCT: Botond Fazekas, José Morano, Dmitrii Lachinov, Guilherme Aresta, Hrvoje Bogunović.
"SAMedOCT: Adapting Segment Anything Model (SAM) for Retinal OCT." ArXiv (2023). [paper] [2023.08]
U-SAM: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin.
"CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation." ArXiv (2023). [paper] [2023.08]
Few-Shot-Self-Prompt-SAM: Qi Wu, Yuyao Zhang, Marawan Elbatel.
"Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation." MICCAI DART Workshop (2023). [paper] [code] [2023.08]
Dancing Avatar: Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang.
"Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model." ArXiv (2023). [paper] [2023.08]
SurgicalSAM: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren.
"SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation." MICCAI MedAGI Workshop (2023). [paper] [2023.08]
CEmb-SAM: Dongik Shin, Beomsuk Kim, Seungjun Baek.
"CEmb-SAM: Segment Anything Model with Condition Embedding for Joint Learning from Heterogeneous Datasets." ArXiv (2023). [paper] [2023.08]
CLE Diffusion: Yuyang Yin, Dejia Xu, Chuangchuang Tan, Ping Liu, Yao Zhao, Yunchao Wei.
"CLE Diffusion: Controllable Light Enhancement Diffusion Model." ACM MM (2023). [paper] [code] [2023.08]
Polyp-SAM++: Risab Biswas.
"Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?" ArXiv (2023). [paper] [code] [2023.08]
TongueSAM: Shan Cao, Qunsheng Ruan, Qingfeng Wu.
"TongueSAM: An Universal Tongue Segmentation Model Based on SAM with Zero-Shot." ArXiv (2023). [paper] [code] [2023.08]
FoodSAM: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue.
"FoodSAM: Any Food Segmentation." ArXiv (2023). [paper] [code] [2023.08]
SAM-L: Xueyuan Li, Ruining Deng, Yucheng Tang, Shunxing Bao, Haichun Yang, Yuankai Huo.
"Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning." ArXiv (2023). [paper] [2023.08]
FAn: Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus.
"Follow Anything: Open-set detection, tracking, and following in real-time." ArXiv (2023). [paper] [code] [demo] [2023.08]
SSOM: Ruikai Cui, Siyuan He, Shi Qiu.
"Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection." ArXiv (2023). [paper] [2023.08]
AquaSAM: Muduo Xu, Jianhao Su, Yutao Liu.
"AquaSAM: Underwater Image Foreground Segmentation." ArXiv (2023). [paper] [2023.08]
AdaptiveSAM: Jay N. Paranjape, Nithin Gopalakrishnan Nair, Shameema Sikder, S. Swaroop Vedula, Vishal M. Patel.
"AdaptiveSAM: Towards Efficient Tuning of SAM for Surgical Scene Segmentation." ArXiv (2023). [paper] [code] [2023.08]
Ziyi Huang, Hongshan Liu, Haofeng Zhang, Fuyong Xing, Andrew Laine, Elsa Angelini, Christine Hendon, Yu Gan.
"Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation." ArXiv (2023). [paper] [2023.08]
Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister.
"Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models." ArXiv (2023). [paper] [2023.08]
DEFT: Aditya Kannan.
"Learning from Human Videos for Robotic Manipulation." ArXiv (2023). [paper] [code] [2023.07]
FOCUS: Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt.
"FOCUS: Object-Centric World Models for Robotics Manipulation." ArXiv (2023). [paper] [code] [2023.07]
DisCo: Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang.
"DisCo: Disentangled Control for Realistic Human Dance Generation." ArXiv (2023). [paper] [code] [2023.07]
Vaibhav Vavilala, David Forsyth.
"Applying a Color Palette with Local Control using Diffusion Models." ArXiv (2023). [paper] [2023.07]
SegAnimeChara: Andy Yu-Hsiang Tseng, Wen-Fan Wang, Bing-Yu Chen.
"SegAnimeChara: Segmenting Anime Characters Generated by AI." ACM SIGGRAPH (2023). [paper] [2023.07]
TASS: Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai.
"Transferable Attack for Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.07]
SAM zero-shot segmentator: Loris Nanni, Carlo Fantozzi, Alberto Pretto, Daniel Fusaro.
"Improving Existing Segmentators Performance with Zero-Shot Segmentators." ArXiv (2023). [paper] [2023.07]
SAMFlow: Shili Zhou, Ruian He, Weimin Tan, Bo Yan.
"SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model." ArXiv (2023). [paper] [2023.07]
HQTrack: Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li.
"Tracking Anything in High Quality." ArXiv (2023). [paper] [code] [2023.07]
Fashion Matrix: Zheng Chong, Xujie Zhang, Fuwei Zhao, Zhenyu Xie, Xiaodan Liang.
"Fashion Matrix: Editing Photos by Just Talking." ArXiv (2023). [paper] [homepage] [code] [2023.07]
RoboChop: Atharva Dikshit, Alison Bartsch, Abraham George, Amir Barati Farimani.
"RoboChop: Autonomous Framework for Fruit and Vegetable Chopping Leveraging Foundational Models." ArXiv (2023). [paper] [2023.07]
Industrial-SA: Keno Moenck, Arne Wendt, Philipp Prünte, Julian Koch, Arne Sahrhage, Johann Gierecker, Ole Schmedemann, Falko Kähler, Dirk Holst, Martin Gomse, Thorsten Schüppstuhl, Daniel Schoepflin.
"Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul." ArXiv (2023). [paper] [2023.07]
CNOS: Van Nguyen Nguyen, Tomas Hodan, Georgy Ponimatkin, Thibault Groueix, Vincent Lepetit.
"CNOS: A Strong Baseline for CAD-based Novel Object Segmentation." ICCV Workshop (2023). [paper] [code] [2023.07]
SAM-Path: Jingwei Zhang, Ke Ma, Saarthak Kapse, Joel Saltz, Maria Vakalopoulou, Prateek Prasanna, Dimitris Samaras.
"SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology." ArXiv (2023). [paper] [2023.07]
BuboGPT: Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang.
"BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs." ArXiv (2023). [paper] [code] [2023.07]
OpenSU: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen.
"Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments." ICCV Workshop (2023). [paper] [code] [2023.07]
OG: Zichao Dong, Hang Ji, Weikun Zhang, Xufeng Huang, Junbo Chen.
"OG: Equip vision occupancy with instance segmentation and visual grounding." ArXiv (2023). [paper] [2023.07]
$SAM^{Med}$: Chenglong Wang, Dexuan Li, Sucheng Wang, Chengxiu Zhang, Yida Wang, Yun Liu, Guang Yang.
"$SAM^{Med}$: A medical image annotation framework based on large vision model." ArXiv (2023). [paper] [2023.07]
SAM-U: Guoyao Deng, Ke Zou, Kai Ren, Meng Wang, Xuedong Yuan, Sancong Ying, Huazhu Fu.
"SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image." ArXiv (2023). [paper] [2023.07]
Semantic-SAM: Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao.
"Semantic-SAM: Segment and Recognize Anything at Any Granularity." ArXiv (2023). [paper] [code] [2023.07]
SAM-IQA: Xinpeng Li, Ting Jiang, Haoqiang Fan, Shuaicheng Liu.
"SAM-IQA: Can Segment Anything Boost Image Quality Assessment?" ArXiv (2023). [paper] [code] [2023.07]
Cross-SAM: Xiaoyu Bai, Fan Bai, Xiaofei Huo, Jia Ge, Tony C. W. Mok, Zi Li, Minfeng Xu, Jingren Zhou, Le Lu, Dakai Jin, Xianghua Ye, Jingjing Lu, Ke Yan.
"Matching in the Wild: Learning Anatomical Embeddings for Multi-Modality Images." ArXiv (2023). [paper] [2023.07]
LAM-SC: Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You.
"Large AI Model-Based Semantic Communications." ArXiv (2023). [paper] [2023.07]
MSDeAOT: Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang.
"ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking." ArXiv (2023). [paper] [2023.07]
EM-SAM: Ao Cheng, Guoqiang Zhao, Lirong Wang, Ruobing Zhang.
"AxonCallosumEM Dataset: Axon Semantic Segmentation of Whole Corpus Callosum cross section from EM Images." ArXiv (2023). [paper] [2023.07]
SAM-PT: Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu.
"Segment Anything Meets Point Tracking." ArXiv (2023). [paper] [code] [2023.07]
SAMAug: Haixing Dai, Chong Ma, Zhengliang Liu, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Dajiang Zhu, Wei Liu, Quanzheng Li, Tianming Liu, Xiang Li.
"SAMAug: Point Prompt Augmentation for Segment Anything Model." ArXiv (2023). [paper] [2023.07]
SAM-DA: Liangliang Yao, Haobo Zuo, Guangze Zheng, Changhong Fu, Jia Pan.
"SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation." ArXiv (2023). [paper] [code] [2023.07]
RefSAM: Yonglin Li, Jing Zhang, Xiao Teng, Long Lan.
"RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation." ArXiv (2023). [paper] [code] [2023.07]
All-in-SAM: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo.
"All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning." ArXiv (2023). [paper] [2023.07]
Zenglin Shi, Ying Sun, Mengmi Zhang.
"Training-free Object Counting with Prompts." ArXiv (2023). [paper] [code] [2023.07]
Xiaoyu Shi, Shurong Chai, Yinhao Li, Jingliang Cheng, Jie Bai, Guohua Zhao, Yen-Wei Chen.
"Cross-modality Attention Adapter: A Glioma Segmentation Fine-tuning Method for SAM Using Multimodal Brain MR Images." ArXiv (2023). [paper] [2023.07]
TDA: Ruben Glatt, Shusen Liu.
"Topological Data Analysis Guided Segment Anything Model Prompt Optimization for Zero-Shot Segmentation in Biological Imaging." ArXiv (2023). [paper] [2023.06]
Xavier F. Cadet, Ranya Aloufi, Alain Miranville, Sara Ahmadi-Abhari, Hamed Haddadi.
"Evaluating The Robustness of Self-Supervised Representations to Background/Foreground Removal." ArXiv (2023). [paper] [2023.06]
3D Shape Match: Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovsjanikov, Peter Wonka.
"Zero-Shot 3D Shape Correspondence." SIGGRAPH ASIA (2023). [paper] [code] [2023.06]
Siddharth Shankar, Leigh A. Stearns, Cornelis J. van der Veen.
"Segment Anything in Glaciology: An initial study implementing the Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.06]
Xiang Li, Lu Zhang, Zihao Wu, Zhengliang Liu, Lin Zhao, Yixuan Yuan, Jun Liu, Gang Li, Dajiang Zhu, Pingkun Yan, Quanzheng Li, Wei Liu, Tianming Liu, Dinggang Shen.
"Artificial General Intelligence for Medical Imaging." ArXiv (2023). [paper] [2023.06]
ViDA: Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang.
"ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation." ArXiv (2023). [paper] [code] [2023.06]
FGVP: Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang.
"Fine-Grained Visual Prompting." ArXiv (2023). [paper] [2023.06]
AssistGPT: Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou.
"AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn." ArXiv (2023). [paper] [code] [2023.06]
Matthew Baugh, James Batten, Johanna P. Müller, Bernhard Kainz.
"Zero-Shot Anomaly Detection with Pre-trained Segmentation Models." ArXiv (2023). [paper] [2023.06]
Guochen Ning, Hanyin Liang, Zhongliang Jiang, Hui Zhang, Hongen Liao.
"The potential of 'Segment Anything' (SAM) for universal intelligent ultrasound image guidance." BioScience Trends (2023). [paper] [2023.06]
SeaDronesSee-3D and BOArienT: Benjamin Kiefer, Timon Höfer, Andreas Zell.
"Stable Yaw Estimation of Boats from the Viewpoint of UAVs and USVs." ECMR (2023). [paper] [2023.06]
DADF: Yingxin Lai, Zhiming Luo, Zitong Yu.
"Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization." ArXiv (2023). [paper] [code] [2023.06]
Lucas Prado Osco, Qiusheng Wu, Eduardo Lopes de Lemos, Wesley Nunes Gonçalves, Ana Paula Marques Ramos, Jonathan Li, José Marcato Junior.
"The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot." ArXiv (2023). [paper] [2023.06]
RSPrompter: Keyan Chen, Chenyang Liu, Hao Chen, Haotian Zhang, Wenyuan Li, Zhengxia Zou, Zhenwei Shi.
"RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model." ArXiv (2023). [paper] [code] [2023.06]
Zhewei Chen, Wai Keung Wong, Zuofeng Zhong, Jinpiao Liao, Ying Qu.
"Effective Transfer of Pretrained Large Visual Model for Fabric Defect Segmentation via Specific Knowledge Injection." ArXiv (2023). [paper] [2023.06]
Zheyan Jin, Shiqi Chen, Yueting Chen, Zhihai Xu, Huajun Feng.
"Let Segment Anything Help Image Dehaze." ArXiv (2023). [paper] [2023.06]
CLIP-SAM: Evan Kellener, Ihina Nath, An Ngo, Thomas Nguyen, Joshua Schuman, Coen Adler, Arnav Kartikeya.
"Utilizing Segment Anything Model For Assessing Localization of GRAD-CAM in Medical Imaging." ArXiv (2023). [paper] [2023.06]
MESS: Benedikt Blumenstiel, Johannes Jakubik, Hilde Kühne, Michael Vössing.
"What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.06]
MMPM: Jiange Yang, Wenhui Tan, Chuhao Jin, Bei Liu, Jianlong Fu, Ruihua Song, Limin Wang.
"Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots." ArXiv (2023). [paper] [YouTube] [Bilibili] [2023.06]
CellViT: Fabian Hörst, Moritz Rempe, Lukas Heine, Constantin Seibold, Julius Keyl, Giulia Baldini, Selma Ugurel, Jens Siveke, Barbara Grünwald, Jan Egger, Jens Kleesiek.
"CellViT: Vision Transformers for Precise Cell Segmentation and Classification." ArXiv (2023). [paper] [code] [2023.06]
MedLSAM: Wenhui Lei, Xu Wei, Xiaofan Zhang, Kang Li, Shaoting Zhang.
"MedLSAM: Localize and Segment Anything Model for 3D Medical Images." ArXiv (2023). [paper] [code] [2023.06]
MobileSAM: Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong.
"Faster Segment Anything: Towards Lightweight SAM for Mobile Applications." ArXiv (2023). [paper] [code] [2023.06]
SonarSAM: Lin Wang, Xiufen Ye, Liqiang Zhu, Weijie Wu, Jianguo Zhang, Huiming Xing, Chao Hu.
"When SAM Meets Sonar Images." ArXiv (2023). [paper] [code] [2023.06]
AutoSAM: Xinrong Hu, Xiaowei Xu, Yiyu Shi.
"How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images." ArXiv (2023). [paper] [code] [2023.06]
3DSAM-adapter: Shizhan Gong, Yuan Zhong, Wenao Ma, Jinpeng Li, Zhao Wang, Jingyang Zhang, Pheng-Ann Heng, Qi Dou.
"3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.06]
Xinru Shan, Chaoning Zhang.
"Robustness of Segment Anything Model (SAM) for Autonomous Driving in Adverse Weather Conditions." ArXiv (2023). [paper] [2023.06]
SAM-LST: Shurong Chai, Rahul Kumar Jain, Shiyu Teng, Jiaqing Liu, Yinhao Li, Tomoko Tateyama, Yen-wei Chen.
"Ladder Fine-tuning approach for SAM integrating complementary network." ArXiv (2023). [paper] [code] [2023.06]
Mohsen Ahmadi, Masoumeh Farhadi Nia, Sara Asgarian, Kasra Danesh, Elyas Irankhah, Ahmad Gholizadeh Lonbar, Abbas Sharifi.
"Comparative Analysis of Segment Anything Model and U-Net for Breast Tumor Detection in Ultrasound and Mammography Images." ArXiv (2023). [paper] [2023.06]
FastSAM: Xu Zhao, Wenchao Ding, Yongqi An, Yinglong Du, Tao Yu, Min Li, Ming Tang, Jinqiao Wang.
"Fast Segment Anything." ArXiv (2023). [paper] [code] [2023.06]
Seal: Youquan Liu, Lingdong Kong, Jun Cen, Runnan Chen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu.
"Segment Any Point Cloud Sequences by Distilling Vision Foundation Models." ArXiv (2023). [paper] [code] [homepage] [2023.06]
Lian Zhang, Zhengliang Liu, Lu Zhang, Zihao Wu, Xiaowei Yu, Jason Holmes, Hongying Feng, Haixing Dai, Xiang Li, Quanzheng Li, Dajiang Zhu, Tianming Liu, Wei Liu.
"Segment Anything Model (SAM) for Radiation Oncology." ArXiv (2023). [paper] [2023.06]
Enlighten-Anything: Qihan Zhao, Xiaofeng Zhang, Hao Tang, Chaochen Gu, Shanying Zhu.
"Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement." ArXiv (2023). [paper] [code] [2023.06]
SAA+: Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Liang Gao, Weiming Shen.
"Winning Solution for the CVPR2023 Visual Anomaly and Novelty Detection Challenge: Multimodal Prompting for Data-centric Anomaly Detection." CVPR Workshop (2023). [paper] [code] [2023.06]
TEPO: Chuyun Shen, Wenhao Li, Ya Zhang, Xiangfeng Wang.
"Temporally-Extended Prompts Optimization for SAM in Interactive Medical Image Segmentation." ArXiv (2023). [paper] [2023.06]
TomoSAM: Federico Semeraro, Alexandre Quintart, Sergio Fraile Izquierdo, Joseph C. Ferguson.
"TomoSAM: a 3D Slicer extension using SAM for tomography segmentation." ArXiv (2023). [paper] [code] [2023.06]
Madeline Chantry Schiappa, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet.
"Robustness Analysis on Foundational Segmentation Models." ArXiv (2023). [paper] [code] [2023.06]
Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Shehbaz Tariq, Chenshuang Zhang, Choong Seon Hong.
"Robustness of SAM: Segment Anything Under Corruptions and Beyond." ArXiv (2023). [paper] [2023.06]
AutoSAM: Tal Shaharabany, Aviad Dahan, Raja Giryes, Lior Wolf.
"AutoSAM: Adapting SAM to Medical Images by Overloading the Prompt Encoder." ArXiv (2023). [paper] [2023.06]
SAM-shadow: Xiaofeng Zhang, Chaochen Gu, Shanying Zhu.
"SAM-helps-Shadow: When Segment Anything Model meet shadow removal." ArXiv (2023). [paper] [code] [2023.06]
Chaoning Zhang, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Sung-Ho Bae, Choong Seon Hong.
"A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering." ArXiv (2023). [paper] [2023.06]
MAM: Jiachen Li, Jitesh Jain, Humphrey Shi.
"Matting Anything." ArXiv (2023). [paper] [code] [2023.06]
Haochen Xue, Mingyu Jin, Chong Zhang, Yuxuan Huang, Qian Weng, Xiaobo Jin.
"Automatic Image Blending Algorithm Based on SAM and DINO." ArXiv (2023). [paper] [2023.06]
MatAny: Jingfeng Yao, Xinggang Wang, Lang Ye, Wenyu Liu.
"Matte Anything: Interactive Natural Image Matting with Segment Anything Models." ArXiv (2023). [paper] [code] [2023.06]
CNS: Runnan Chen, Youquan Liu, Lingdong Kong, Nenglun Chen, Xinge Zhu, Yuexin Ma, Tongliang Liu, Wenping Wang.
"Towards Label-free Scene Understanding by Vision Foundation Models." ArXiv (2023). [paper] [code] [2023.06]
SAM3D: Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu.
"SAM3D: Segment Anything in 3D Scenes." ArXiv (2023). [paper] [code] [2023.06]
Calib-Anything: Zhaotong Luo, Guohang Yan, Yikang Li.
"Calib-Anything: Zero-training LiDAR-Camera Extrinsic Calibration Method Using Segment Anything." ArXiv (2023). [paper] [code] [2023.06]
Shijie Chang, Zeqi Hao, Ben Kang, Xiaoqi Zhao, Jiawen Zhu, Zhenyu Chen, Lihe Zhang, Lu Zhang, Huchuan Lu.
"3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW." ArXiv (2023). [paper] [2023.06]
USD: Yulin He, Wei Chen, Yusong Tan, Siqi Wang.
"USD: Unknown Sensitive Detector Empowered by Decoupled Objectness and Segment Anything Model." ArXiv (2023). [paper] [2023.06]
SAM3D: Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai.
"SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model." ArXiv (2023). [paper] [code] [2023.06]
Shehbaz Tariq, Brian Estadimas Arfeto, Chaoning Zhang, Hyundong Shin.
"Segment Anything Meets Semantic Communication." ArXiv (2023). [paper] [2023.06]
HQ-SAM: Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu.
"Segment Anything in High Quality." NeurIPS (2023). [paper] [code] [2023.06]
DeSAM: Yifan Gao, Wei Xia, Dingdu Hu, Xin Gao.
"DeSAM: Decoupling Segment Anything Model for Generalizable Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.06]
NP-SAM: Rasmus Larsen, Torben L. Villadsen, Jette K. Mathiesen, Kirsten M. Ø. Jensen, Espen D. Bøjesen.
"NP-SAM: Implementing the Segment Anything Model for Easy Nanoparticle Segmentation in Electron Microscopy Images." ArXiv (2023). [paper] [code] [2023.05]
EfficientViT: Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han.
"EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction." ICCV (2023). [paper] [code] [2023.05]
POPE: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Dejia Xu, Hanwen Jiang, Zhangyang Wang.
"POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference." ArXiv (2023). [paper] [code] [2023.05]
Bridge3D: Zhimin Chen, Bing Li.
"Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models." ArXiv (2023). [paper] [2023.05]
Make-A-Protagonist: Yuyang Zhao, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee.
"Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts." ArXiv (2023). [paper] [code] [2023.05]
ZeroPose: Jianqiu Chen, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He.
"ZeroPose: CAD-Model-based Zero-Shot Pose Estimation." ArXiv (2023). [paper] [2023.05]
IIR-Net: Zhongping Zhang, Jian Zheng, Jacob Zhiyuan Fang, Bryan A. Plummer.
"Text-to-image Editing by Image Information Removal." ArXiv (2023). [paper] [2023.05]
Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong.
"Understanding segment anything model: Sam is biased towards texture rather than shape." ArXiv (2023). [paper] [2023.05]
FineRewards: Guian Fang, Zutao Jiang, Jianhua Han, Guangsong Lu, Hang Xu, Xiaodan Liang.
"Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards." ArXiv (2023). [paper] [code] [2023.05]
InstructEdit: Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka.
"InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions." ArXiv (2023). [paper] [code] [2023.05]
AIMS: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang.
"AIMS: All-Inclusive Multi-Level Segmentation." ArXiv (2023). [paper] [code] [2023.05]
ShadowSAM: Yonghui Wang, Wengang Zhou, Yunyao Mao, Houqiang Li.
"Detect Any Shadow: Segment Anything for Video Shadow Detection." ArXiv (2023). [paper] [code] [2023.05]
ISA-NeRF: Xiaokang Chen, Jiaxiang Tang, Diwen Wan, Jingbo Wang, Gang Zeng.
"Interactive Segment Anything NeRF with Feature Imitation." ArXiv (2023). [paper] [homepage] [2023.05]
Yihao Huang, Yue Cao, Tianlin Li, Felix Juefei-Xu, Di Lin, Ivor W. Tsang, Yang Liu, Qing Guo.
"On the Robustness of Segment Anything." ArXiv (2023). [paper] [2023.05]
SAMScore: Yunxiang Li, Meixu Chen, Wenxuan Yang, Kai Wang, Jun Ma, Alan C. Bovik, You Zhang.
"SAMScore: A Semantic Structural Similarity Metric for Image Translation Evaluation." ArXiv (2023). [paper] [code] [2023.05]
SAD: Jun Cen, Yizheng Wu, Kewei Wang, Xingyi Li, Jingkang Yang, Yixuan Pei, Lingdong Kong, Ziwei Liu, Qifeng Chen.
"SAD: Segment Any RGBD." ArXiv (2023). [paper] [code] [2023.05]
SPT: Zeyu Xiao, Jiawang Bai, Zhihe Lu, Zhiwei Xiong.
"A Dive into SAM Prior in Image Restoration." ArXiv (2023). [paper] [2023.05]
Matcher: Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen.
"Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching." ArXiv (2023). [paper] [code] [2023.05]
RAP: Jiaxi Jiang, Christian Holz.
"Restore Anything Pipeline: Segment Anything Meets Image Restoration." ArXiv (2023). [paper] [code] [2023.05]
UVOSAM: Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu.
"UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model." ArXiv (2023). [paper] [2023.05]
BreastSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"BreastSAM: A Study of Segment Anything Model for Breast Tumor Detection in Ultrasound Images." ArXiv (2023). [paper] [2023.05]
SAMSh: Leiping Jie, Hui Zhang.
"When SAM Meets Shadow Detection." ArXiv (2023). [paper] [code] [2023.05]
Instruct2Act: Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li.
"Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model." ArXiv (2023). [paper] [code] [2023.05]
WS-SAM: Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li.
"Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping." NeurIPS (2023). [paper] [2023.05]
SAA+: Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Zongwei Du, Liang Gao, Weiming Shen.
"Segment Any Anomaly without Training via Hybrid Prompt Regularization." ArXiv (2023). [paper] [code] [2023.05]
OR-NeRF: Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin.
"OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields." ArXiv (2023). [paper] [2023.05]
PromptUNet: Junde Wu.
"PromptUNet: Toward Interactive Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.05]
EAC: Ao Sun, Pingchuan Ma, Yuanyuan Yuan, Shuai Wang.
"Explain Any Concept: Segment Anything Meets Concept-Based Explanation." NeurIPS (2023). [paper] [2023.05]
Xiao Yang, Haixing Dai, Zihao Wu, Ramesh Bist, Sachin Subedi, Jin Sun, Guoyu Lu, Changying Li, Tianming Liu, Lilong Chai.
"SAM for Poultry Science." ArXiv (2023). [paper] [2023.05]
Leaf Only SAM: Dominic Williams, Fraser MacFarlane, Avril Britten.
"Leaf Only SAM: A Segment Anything Pipeline for Zero-Shot Automated Leaf Segmentation." ArXiv (2023). [paper] [2023.05]
KD-SAM: Sahib Julka, Michael Granitzer.
"Knowledge distillation with Segment Anything (SAM) model for Planetary Geological Mapping." ArXiv (2023). [paper] [2023.05]
SAM-Track: Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang.
"Segment-and-Track Anything." ArXiv (2023). [paper] [code] [2023.05]
SEEM: Zhihe Lu, Zeyu Xiao, Jiawang Bai, Zhiwei Xiong, Xinchao Wang.
"Can SAM Boost Video Super-Resolution?" ArXiv (2023). [paper] [2023.05]
Yuqing Wang, Yun Zhao, Linda Petzold.
"An Empirical Study on the Robustness of the Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.05]
SAM-WSSS: Tianle Chen, Zheda Mai, Ruiwen Li, Wei-lun Chao.
"Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [code] [2023.05]
SAM4MIS: Yichi Zhang, Rushi Jiao.
"How Segment Anything Model (SAM) Boost Medical Image Segmentation?" ArXiv (2023). [paper] [code] [2023.05]
BadSAM: Zihan Guan, Mengxuan Hu, Zhongliang Zhou, Jielu Zhang, Sheng Li, Ninghao Liu.
"BadSAM: Exploring Security Vulnerabilities of SAM via Backdoor Attacks." ArXiv (2023). [paper] [2023.05]
PerSAM: Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Hao Dong, Peng Gao, Hongsheng Li.
"Personalize Segment Anything Model with One Shot." ArXiv (2023). [paper] [code] [2023.05]
CAT: Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao.
"Caption Anything: Interactive Image Description with Diverse Multimodal Controls." ArXiv (2023). [paper] [code] [2023.05]
SAMRS: Di Wang, Jing Zhang, Bo Du, Dacheng Tao, Liangpei Zhang.
"Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model." NeurIPS Datasets and Benchmarks Track (2023). [paper] [code] [2023.05]
AV-SAM: Shentong Mo, Yapeng Tian.
"AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation." ArXiv (2023). [paper] [2023.05]
SAMA-AVS: Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie.
"Annotation-free Audio-Visual Segmentation." WACV (2024). [paper] [code] [2023.05]
WSSS: Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes.
"An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems." ArXiv (2023). [paper] [2023.05]
PLG-SAM: Peng-Tao Jiang, Yuqi Yang.
"Segment Anything is A Good Pseudo-label Generator for Weakly Supervised Semantic Segmentation." ArXiv (2023). [paper] [2023.05]
Attack-SAM: Chenshuang Zhang, Chaoning Zhang, Taegoo Kang, Donghun Kim, Sung-Ho Bae, In So Kweon.
"Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples." ArXiv (2023). [paper] [2023.05]
Polyp-SAM: Yuheng Li, Mingzhe Hu, Xiaofeng Yang.
"Polyp-SAM: Transfer SAM for Polyp Segmentation." ArXiv (2023). [paper] [code] [2023.05]
Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong.
"Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected." ArXiv (2023). [paper] [2023.05]
DSEC-MOS: Zhuyun Zhou, Zongwei Wu, Rémi Boutteau, Fan Yang, Dominique Ginhac.
"DSEC-MOS: Segment Any Moving Object with Moving Ego Vehicle." ArXiv (2023). [paper] [code] [2023.05]
Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Lucas Silveira Kupssinskü, Otávio Parraga, Marcelo Mussi Delucis, Rodrigo Coelho Barros.
"Zero-shot performance of the Segment Anything Model (SAM) in 2D medical imaging: A comprehensive evaluation and practical guidelines." ArXiv (2023). [paper] [code] [2023.05]
Dongjie Cheng, Ziyuan Qin, Zekun Jiang, Shaoting Zhang, Qicheng Lao, Kang Li.
"SAM on Medical Images: A Comprehensive Study on Three Prompt Modes." ArXiv (2023). [paper] [2023.05]
Expedit-SAM: Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, Weihong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu.
"Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning." NeurIPS (2022). [paper] [code] [2023.04]
An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren.
"SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective." ArXiv (2023). [paper] [2023.04]
Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Haozhe Chi, Xindi Hu, Deng-Ping Fan, Fajin Dong, Dong Ni.
"Segment Anything Model for Medical Images?" ArXiv (2023). [paper] [2023.04]
Edit Everything: Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin.
"Edit Everything: A Text-Guided Generative System for Images Editing." ArXiv (2023). [paper] [code] [2023.04]
SkinSAM: Mingzhe Hu, Yuheng Li, Xiaofeng Yang.
"SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model." ArXiv (2023). [paper] [2023.04]
GazeSAM: Bin Wang, Armstrong Aboah, Zheyuan Zhang, Ulas Bagci.
"GazeSAM: What You See is What You Segment." ArXiv (2023). [paper] [code] [2023.04]
SAMed: Kaidong Zhang, Dong Liu.
"Customized Segment Anything Model for Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.04]
LearnablePromptSAM: Zhongxi Qiu, Yan Hu, Heng Li, Jiang Liu.
"Learnable Ophthalmology SAM." ArXiv (2023). [paper] [code] [2023.04]
Simiao Ren, Francesco Luzi, Saad Lahrichi, Kaleb Kassaw, Leslie M. Collins, Kyle Bradbury, Jordan M. Malof.
"Segment anything, from space?" WACV (2024). [paper] [2023.04]
Peilun Shi, Jianing Qiu, Sai Mu Dalike Abaxi, Hao Wei, Frank P. -W. Lo, Wu Yuan.
"Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation." ArXiv (2023). [paper] [2023.04]
MSA: Junde Wu, Yu Zhang, Rao Fu, Huihui Fang, Yuanpei Liu, Zhaowei Wang, Yanwu Xu, Yueming Jin.
"Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation." ArXiv (2023). [paper] [code] [2023.04]
Mohsen Ahmadi, Ahmad Gholizadeh Lonbar, Abbas Sharifi, Ali Tarlani Beris, Mohammadsadegh Nouri, Amir Sharifzadeh Javidi.
"Application of Segment Anything Model for Civil Infrastructure Defect Assessment." ArXiv (2023). [paper] [2023.04]
SA3D: Jiazhong Cen, Zanwei Zhou, Jiemin Fang, Wei Shen, Lingxi Xie, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian.
"Segment Anything in 3D with NeRFs." NeurIPS (2023). [paper] [code] [2023.04]
MedSAM: Jun Ma, Bo Wang.
"Segment Anything in Medical Images." ArXiv (2023). [paper] [code] [2023.04]
TAM: Jinyu Yang, Mingqi Gao, Zhe Li, Shang Gao, Fangjing Wang, Feng Zheng.
"Track Anything: Segment Anything Meets Videos." ArXiv (2023). [paper] [code] [2023.04]
HFGFA: Rongsheng Wang, Yaofei Duan, YuKun Li.
"Segment anything also detect anything." ArXiv (2023). [paper] [2023.04]
SNA: Yongcheng Jing, Xinchao Wang, Dacheng Tao.
"Segment Anything in Non-Euclidean Domains: Challenges and Opportunities." ArXiv (2023). [paper] [2023.04]
SAMAug: Yizhe Zhang, Tao Zhou, Peixian Liang, Danny Z. Chen.
"Input Augmentation with SAM: Boosting Medical Image Segmentation with Segmentation Foundation Model." ArXiv (2023). [paper] [2023.04]
Count-Anything: Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan.
"Can SAM Count Anything? An Empirical Study on SAM Counting." ArXiv (2023). [paper] [code] [2023.04]
Text2Seg: Jielu Zhang, Zhongliang Zhou, Gengchen Mai, Lan Mu, Mengxuan Hu, Sheng Li.
"Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models." ArXiv (2023). [paper] [code] [2023.04]
Maciej A. Mazurowski, Haoyu Dong, Hanxue Gu, Jichen Yang, Nicholas Konz, Yixin Zhang.
"Segment Anything Model for Medical Image Analysis: an Experimental Study." MIA (2023). [paper] [2023.04]
Anything-3D: Qiuhong Shen, Xingyi Yang, Xinchao Wang.
"Anything-3D: Towards Single-view Anything Reconstruction in the Wild." ArXiv (2023). [paper] [code] [2023.04]
Any-to-Any Transfer: Songhua Liu, Jingwen Ye, Xinchao Wang.
"Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate." ArXiv (2023). [paper] [code] [2023.04]
Sheng He, Rina Bao, Jingpeng Li, Jeffrey Stout, Atle Bjornerud, P. Ellen Grant, Yangming Ou.
"Computer-Vision Benchmark Segment-Anything Model (SAM) in Medical Images: Accuracy in 12 Datasets." ArXiv (2023). [paper] [2023.04]
SAM-Adapter: Tianrun Chen, Lanyun Zhu, Chaotao Ding, Runlong Cao, Yan Wang, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang.
"SAM Fails to Segment Anything? -- SAM-Adapter: Adapting SAM in Underperformed Scenes: Camouflage, Shadow, Medical Image Segmentation, and More." ArXiv (2023). [paper] [2023.04]
Chuanfei Hu, Tianyi Xia, Shenghong Ju, Xinde Li.
"When SAM Meets Medical Images: An Investigation of Segment Anything Model (SAM) on Multi-phase Liver Tumor Segmentation." ArXiv (2023). [paper] [2023.04]
SATIR: Junzhang Chen, Xiangzhi Bai.
"Learning to "Segment Anything" in Thermal Infrared Images through Knowledge Distillation with a Large Scale Dataset SATIR." ArXiv (2023). [paper] [code] [2023.04]
Florian Putz, Johanna Grigo, Thomas Weissmann, Philipp Schubert, Daniel Hoefler, Ahmed Gomaa, Hassen Ben Tkhayat, Amr Hagag, Sebastian Lettmaier, Benjamin Frey, Udo S. Gaipl, Luitpold V. Distel, Sabine Semrau, Christoph Bert, Rainer Fietkau, Yixing Huang.
"The Segment Anything foundation model achieves favorable brain tumor autosegmentation accuracy on MRI to support radiotherapy treatment planning." ArXiv (2023). [paper] [2023.04]
Iraklis Giannakis, Anshuman Bhardwaj, Lydia Sam, Georgios Leontidis.
"Deep learning universal crater detection using Segment Anything Model (SAM)." ArXiv (2023). [paper] [2023.04]
SAMPolyp: Tao Zhou, Yizhe Zhang, Yi Zhou, Ye Wu, Chen Gong.
"Can SAM Segment Polyps?" ArXiv (2023). [paper] [code] [2023.04]
Inpaint-Anything: Tao Yu, Runseng Feng, Ruoyu Feng, Jinming Liu, Xin Jin, Wenjun Zeng, Zhibo Chen.
"Inpaint Anything: Segment Anything Meets Image Inpainting." ArXiv (2023). [paper] [code] [2023.04]
Ge-Peng Ji, Deng-Ping Fan, Peng Xu, Ming-Ming Cheng, Bowen Zhou, Luc Van Gool.
"SAM Struggles in Concealed Scenes -- Empirical Study on "Segment Anything"." ArXiv (2023). [paper] [2023.04]
Wei Ji, Jingjing Li, Qi Bi, Wenbo Li, Li Cheng.
"Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications." ArXiv (2023). [paper] [2023.04]
CLIP Surgery: Yi Li, Hualiang Wang, Yiqun Duan, Xiaomeng Li.
"CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks." ArXiv (2023). [paper] [code] [2023.04]
SAMM: Yihao Liu, Jiaming Zhang, Zhangcong She, Amir Kheradmand, Mehran Armand.
"SAMM (Segment Any Medical Model): A 3D Slicer Integration to SAM." ArXiv (2023). [paper] [code] [2023.04]
SAM.MD: Saikat Roy, Tassilo Wald, Gregor Koehler, Maximilian R. Rokuss, Nico Disch, Julius Holzschuh, David Zimmerer, Klaus H. Maier-Hein.
"SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model." ArXiv (2023). [paper] [2023.04]
SAM vs BET: Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug.
"SAM vs BET: A Comparative Study for Brain Extraction and Segmentation of Magnetic Resonance Images using Deep Learning." ArXiv (2023). [paper] [2023.04]
SAMCOD: Lv Tang, Haoke Xiao, Bo Li.
"Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection." ArXiv (2023). [paper] [code] [2023.04]

Open Source Projects

No.	Project	Title	Project page	Code base	Affiliation	Description
001	SAM	Segment Anything	Project page	Code	Meta	A foundation model for general segmentation.
002	SAM-Track	Segment and Track Anything	Colab	Code	Zhejiang University	A project dedicated to tracking and segmenting any objects in videos, either automatically or interactively.
003	Grounded-SAM	Grounded-Segment-Anything	Colab	Code	IDEA-Research	A project combining Grounding DINO and SAM to detect and segment anything with text inputs.
004	MMDet-SAM	-	-	Code	OpenMMLab	A new way of instance segmentation by combining SAM with Closed-Set Object Detection, Open-Vocabulary Object Detection, Grounding Object Detection.
005	MMRotate-SAM	Zero-shot Oriented Object Detection with SAM	-	Code	OpenMMLab	A project that joins SAM and weakly supervised horizontal box detection to achieve rotated box detection.
006	MMOCR-SAM	-	-	Code	OpenMMLab	A solution of Text Detection/Recognition and SAM that segments every text character, with striking text removal and text inpainting demos driven by diffusion models and Gradio.
007	MMEditing-SAM	-	-	Code	OpenMMLab	A project that joins SAM and image generation to create awesome images and edit any part of them.
008	Label-Studio-SAM	OpenMMLab PlayGround: Semi-Automated Annotation with Label-Studio and SAM	-	Code	OpenMMLab	A project combining Label-Studio and SAM to achieve semi-automated annotation.
009	PaddleSeg	Segment Anything with PaddleSeg	-	Code	PaddlePaddle	Pretrained model parameters in PaddlePaddle format.
010	SegGPT	Segmenting Everything In Context	Hugging Face	Code	BAAI-Vision	SAM In Context based on Painter.
011	SEEM	Segment Everything Everywhere All at Once	Hugging Face	Code	Microsoft	A project that can segment everything everywhere with multi-modal prompts all at once.
012	CLIP Surgery	CLIP Surgery for Better Explainability with Enhancement in Open Vocabulary Tasks	Project page	Code	HKUST	A work that builds on CLIP's explainability to guide SAM, achieving text-to-mask without manual points.
013	SAMCOD	Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection	-	Code	-	SAM applied to the camouflaged object detection (COD) task.
014	Inpaint Anything	Segment Anything Meets Image Inpainting	Hugging Face	Code	USTC and EIT	SAM combined with inpainting, which is able to remove objects smoothly.
015	PerSAM	Personalize Segment Anything Model with One Shot	Hugging Face	Code	-	SAM with specific concepts.
016	MedSAM	Segment Anything in Medical Images	-	Code	-	A step-by-step tutorial with a small dataset to help you quickly utilize SAM.
017	Segment-Any-Anomaly	GroundedSAM Anomaly Detection	Colab	Code	HUST	Grounding DINO + SAM to segment any anomaly.
018	SSA	Semantic Segment Anything	-	Code	Fudan University	A dense category annotation engine.
019	Magic Copy	-	-	Code	-	Magic Copy is a Chrome extension that uses SAM to extract a foreground object from an image and copy it to the clipboard.
020	Segment Anything with Clip	Segment Anything with Clip	Hugging Face	Code	-	SAM combined with CLIP.
021	MetaSeg	Segment Anything Video	Hugging Face	Code	-	Packaged version of the SAM.
022	SAM in Napari	Segment Anything Model (SAM) in Napari	Project page	Code	Applied Computer Vision Lab and German Cancer Research Center	Extended SAM's click-based foreground separation to full click-based semantic segmentation and instance segmentation.
023	SAM Medical Imaging	SAM Medical Imaging	-	Code	-	SAM for Medical Imaging.
024	3D-Box	3D-Box via Segment Anything	-	Code	-	SAM is extended to 3D perception by combining it with VoxelNeXt.
025	Anything-3D	-	-	Code	-	Anything 3DNovel View, Anything-NeRF, Any 3DFace.
026	L2SET	Learning to Segment EveryThing	-	Code	UC Berkeley, FAIR	A new partially supervised training paradigm for instance segmentation.
027	Edit Anything	Edit Anything by Segment-Anything	-	Code	-	Edit anything in images powered by SAM, ControlNet, StableDiffusion, etc.
028	Image Edit Anything	IEA: Image Editing Anything	-	Code	-	Using stable diffusion and SAM for image editing.
029	SAM for Stable Diffusion Webui	Segment Anything for Stable Diffusion WebUI	-	Code	-	This extension aims to connect AUTOMATIC1111 Stable Diffusion WebUI and Mikubill ControlNet Extension with SAM and GroundingDINO to enhance Stable Diffusion/ControlNet inpainting.
030	Earth Observation Tools	Segment Anything EO tools	Colab	Code	-	Earth observation tools for SAM.
031	Moving Object Detection	Towards Segmenting Anything That Moves	-	Code	-	A project about SAM + Moving Object Detection.
032	OCR-SAM	Optical Character Recognition with Segment Anything	Project page	Code	-	Combining MMOCR with SAM and Stable Diffusion.
033	SALT	Segment Anything Labelling Tool	-	Code	-	A project that uses SAM and adds a barebones interface to label images and save the masks in the COCO format.
034	Prompt Segment Anything	Prompt Segment Anything	-	Code	-	An implementation of zero-shot instance segmentation using SAM.
035	SAM-RBox	-	-	Code	-	A project that uses SAM to generate rotated bounding boxes with MMRotate, serving as a comparison method for H2RBox-v2.
036	VISAM	MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors	-	Code	-	Combines SAM with MOT, creating the era of "MOTS".
037	SegEO	Segment Anything EO tools	-	Code	-	Tools developed to ease the processing of spatial data (GeoTIFF and TMS) with SAM, using a sliding-window algorithm for big files.
038	Napari Segment Anything	Napari Segment Anything	Project page	Code	-	SAM native Qt UI.
039	Segment-Anything-U-Specify	Segment-Anything-U-Specify	-	Code	-	Using CLIP and SAM to segment any instance you specify with text prompt of any instance names.
040	SegDrawer	Simple static web-based mask drawer	Colab	Code	-	Simple static web-based mask drawer, supporting semantic segmentation with SAM.
041	Track Anything	Segment Anything Meets Videos	Hugging Face	Code	SUSTech	Track-Anything is a flexible and interactive tool for video object tracking and segmentation.
042	Count Anything	-	-	Code	-	A method that uses SAM and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.
043	RAM	Relate Anything Model	Hugging Face	Code	MMLab, NTU and VisCom Lab, KCL/TongJi	Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.
044	Segment Any RGBD	Segment Any RGBD	Project page	Code	-	Segment AnyRGBD is a toolbox to segment rendered depth images based on SAM.
045	Show Anything	Show Anything	Hugging Face	Code	Showlab, NUS	Applications that are compatible with both SAM and generation.
046	Transfer Any Style	Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate	-	Code	LV-lab, NUS	An interactive demo based on Segment-Anything for style transfer, which enables different content regions to apply different styles.
047	Caption Anything	-	Colab	Code	VIP lab, SUSTech	Caption-Anything is a versatile image processing tool that combines the capabilities of SAM, Visual Captioning, and ChatGPT.
048	Image2Paragraph	Transform Image Into Unique Paragraph	Project page	Code	-	Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
049	LIME SAM	Local Interpretable Model-agnostic Explanations Segment Anything	Colab	Code	-	LIME-SAM aims to create an Explainable Artificial Intelligence (XAI) framework for image classification using LIME (Local Interpretable Model-agnostic Explanations) as the base algorithm, with the super-pixel method replaced by SAM.
050	Paint Anything	-	-	Code	-	An interactive demo based on SAM for stroke-based painting which enables human-like painting.
051	SAMed	Customized Segment Anything Model for Medical Image Segmentation	Colab	Code	USTC	SAMed is built upon the large-scale image segmentation model, SAM, to explore the new research paradigm of customizing large-scale models for medical image segmentation.
052	Personalize SAM	Personalize Segment Anything with 1 Shot in 10 Seconds	Hugging Face	Code	MMLab, CUHK	A training-free Personalization approach for SAM, termed as PerSAM. Given only a single image with a reference mask, PerSAM can segment specific visual concepts.
053	Open-vocabulary-Segment-Anything	Open-vocabulary-Segment-Anything	-	Code	-	Combining OwlViT with Segment Anything - Open-vocabulary Detection and Segmentation (Text-conditioned, and Image-conditioned).
054	Labal-Anything-Pipeline	Label-Anything-Pipeline	-	Code	ZJU	Annotate anything in visual tasks, all in one pipeline, with GPT-4 and SAM.
055	Grounded-Segment-Any-Parts	Grounded Segment Anything: From Objects to Parts	Project page	Code	HKU	Expands Segment Anything Model (SAM) to support text prompt input. The text prompt can be object-level (e.g., dog) or part-level (e.g., dog head).
056	AnyLabeling	AnyLabeling	Youtube page	Code	-	Effortless AI-assisted data labeling with AI support from Segment Anything and YOLO.
057	SSA	Semantic-Segment-Anything	Project page	Code	-	Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
058	RefSAM	Label Data with Segment Anything in Roboflow	Project page	Code	-	Referring Image Segmentation Benchmarking with Segment Anything Model (SAM).
059	Roboflow Annotate	Launch: Label Data with Segment Anything in Roboflow	Project page	APP	Roboflow	SAM-assisted labeling for training computer vision models.
060	ImageBind SAM	-	-	Code	IDEA-Research	An experimental demo that aims to combine ImageBind and SAM to generate masks with different modalities.
061	X-AnyLabeling	X-AnyLabeling	WeChat	Code	CVHub	A new interactive automatic labeling tool based on AnyLabeling.
062	Segment Anything + NNCF	-	WeChat	Code	-	OpenVINO™ NNCF for segment anything encoder quantization acceleration.
063	YOLOv8 + SAM	-	WeChat	-	-	Use SAM in YOLOv8.
064	SearchAnything	SearchAnything	Zhihu blog, Twitter	Code	CAS and MSRA	A semantic local search engine powered by various AI models.
065	SAM Meets Stable Diffusion	-	WeChat	Code	PaddlePaddle	Segment and generate Anything.
066	Language Segment-Anything	-	-	Code	-	SAM with text prompts generates masks for specific objects in images.
067	Expedit-SAM	-	-	Code	-	Expediting SAM without Fine-tuning.
068	Segment-Anything-Fast	Accelerating Generative AI with PyTorch: Segment Anything, Fast	Project page	Code	Team PyTorch	A batched offline inference oriented version of segment-anything.

Awesome Repositories for SAM

License

This project is released under the MIT license. Please see the LICENSE file for more information.

djene-mengistu / Awesome-Segment-Anything