ChaimZhu's starred repositories
acad-homepage.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage
llama3-from-scratch
llama3 implementation one matrix multiplication at a time
ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Stratified-Transformer
Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)
GPT-SoVITS
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Awesome-LLMs-for-Video-Understanding
🔥🔥🔥 Latest Papers, Code, and Datasets on Vid-LLMs.
LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
multi_token
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks
act3d-chained-diffuser
A unified architecture for multimodal multi-task robotic policy learning.