CVIP's repositories
APGCC
ECCV24 - Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
Awesome-Foundation-Models
A curated list of foundation models for vision and language tasks
BasicPBC
Official Implementation of "Learning Inclusion Matching for Animation Paint Bucket Colorization"
E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
EfficientTrain
1.5–3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundation visual backbones.
hriq
High Resolution Image Quality (HRIQ) database and model
MDKNet
Modulating Domain-Specific Knowledge for Multi-domain Crowd Counting
mgc
The official implementation of paper: "Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning"
MLoRE
Project Page for "Multi-Task Dense Prediction via Mixture of Low-Rank Experts"
MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
MPCount
Official repo for CVPR2024 paper "Single Domain Generalization for Crowd Counting"
Official_Remote_Sensing_Mamba
Official code of Remote Sensing Mamba
PIIP
Parameter-Inverted Image Pyramid Networks (PIIP)
PromptAlign
[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Q-Bench
① [ICLR 2024 Spotlight] A benchmark for multi-modality LLMs (MLLMs) on low-level vision and visual quality assessment, covering GPT-4V, Gemini-Pro, Qwen-VL-Plus, and 16 open-source MLLMs.
Rewrite-the-Stars
[CVPR 2024] Rewrite the Stars
RWKV-CLIP
The official code of "RWKV-CLIP: A Robust Vision-Language Representation Learner"
RWKV-infctx-trainer
RWKV infctx trainer, for training arbitrary context sizes, up to 10k tokens and beyond!
Shadow_R
This is the official PyTorch implementation of ShadowRefiner. Our method won the Perceptual Track and achieved the second-best performance on the Fidelity Track of the NTIRE 2024 Shadow Removal Challenge (CVPR 2024 Workshop).
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
TSCM
[ICRA 2024] TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation
Vision-RWKV
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"