ChaimZhu's starred repositories
GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks
qingwu-zimu
Qingwu Zimu (青梧字幕) is an AI subtitle extraction tool based on Whisper.
Stratified-Transformer
Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)
Awesome-Open-AI-Sora
Sora AI Awesome List – Your go-to resource hub for all things Sora AI, OpenAI's groundbreaking model for crafting realistic scenes from text. Explore a curated collection of articles, videos, podcasts, and news about Sora's capabilities, advancements, and more.
multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
act3d-chained-diffuser
A unified architecture for multimodal multi-task robotic policy learning.