CVIP's repositories
APGCC
ECCV24 - Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
BasicPBC
Official Implementation of "Learning Inclusion Matching for Animation Paint Bucket Colorization"
E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
EfficientTrain
1.5–3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundation visual backbones.
hriq
High Resolution Image Quality (HRIQ) database and model
MDKNet
Modulating Domain-Specific Knowledge for Multi-domain Crowd Counting
mgc
The official implementation of paper: "Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning"
MLoRE
Project Page for "Multi-Task Dense Prediction via Mixture of Low-Rank Experts"
MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
MPCount
Official repo for CVPR2024 paper "Single Domain Generalization for Crowd Counting"
Official_Remote_Sensing_Mamba
Official code of Remote Sensing Mamba
PIIP
Parameter-Inverted Image Pyramid Networks (PIIP)
PromptAlign
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Q-Bench
① [ICLR 2024 Spotlight] A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment, covering GPT-4V, Gemini-Pro, Qwen-VL-Plus, and 16 open-source MLLMs.
Rewrite-the-Stars
[CVPR 2024] Rewrite the Stars
RWKV-CLIP
The official code of "RWKV-CLIP: A Robust Vision-Language Representation Learner"
RWKV-infctx-trainer
RWKV infctx trainer, for training arbitrary context sizes, up to 10k tokens and beyond!
Shadow_R
This is the official PyTorch implementation of ShadowRefiner. Our method won the Perceptual Track and achieved the second-best performance on the Fidelity Track of the NTIRE 2024 Shadow Removal Challenge (CVPR 2024 Workshop).
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
TSCM
[ICRA 2024] TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
Vision-RWKV
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"