Chenxin Li's repositories
Grounded-Segment-Anything
Marrying Grounding DINO with Segment Anything & Stable Diffusion & Tag2Text & BLIP & Whisper & ChatBot - Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs
multimodal-prompt-learning
[CVPR 2023] Official repository of the paper "MaPLe: Multi-modal Prompt Learning".
Academic-project-page-template
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
Auto-GPT
An experimental open-source attempt to make GPT-4 fully autonomous.
Awesome-Dataset-Distillation
Awesome Dataset Distillation Papers
Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
CF-ViT
PyTorch implementation of "CF-ViT: A General Coarse-to-Fine Method for Vision Transformer"
CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Endo-FM
[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
Endo-FM-1
[MICCAI'23] Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train
generative-ai-roadmap
The roadmap of generative AI: use cases and applications
generative-models
Generative Models by Stability AI
Latte
Latte: Latent Diffusion Transformer for Video Generation.
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
LightGaussian
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
PhysGaussian
[CVPR 2024] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
SEED
Official implementation of SEED-LLaMA (ICLR 2024).
SOMA
[ICCV'23 Oral] Novel Scenes & Classes: Towards Adaptive Open-set Object Detection
Source-Free-Domain-Generalization
A code base for domain generalization in open-world scenarios
VisualGLM-6B
Chinese and English multimodal conversational language model