starriness's starred repositories
Micro-Action
[TCSVT 2024] Official implementation of the paper: Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
CVPR2023-CMPAE
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
MiGA2023_Track1
[IJCAI 2023] The champion of the Micro-gesture Classification sub-challenge in MiGA@IJCAI2023.
OGM-GE_CVPR2022
The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
Non-local_pytorch
Implementation of Non-local Block.
cross_modal_adaptation
Cross-modal few-shot adaptation with CLIP
LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
TransnormerLLM
Official implementation of TransNormerLLM: A Faster and Better LLM
up-to-date-Vision-Language-Models
Up-to-date Vision Language Models collection, mainly focused on computer vision.
Grounded-Segment-Anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
UniDetector
Code release for our CVPR 2023 paper "Detecting Everything in the Open World: Towards Universal Object Detection".
EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
Awesome-Masked-Autoencoders
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
awesome-self-supervised-learning
A curated list of awesome self-supervised methods
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
TemporalPyramidRouting
Temporal Pyramid Routing for Video Instance Segmentation (T-PAMI 2022)