ZCMax

ChaimZhu's starred repositories

GPT-SoVITS

Just one minute of voice data is enough to train a good TTS model (few-shot voice cloning).

Language: Python | License: MIT | 25359 170 806

CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.

Language: Jupyter Notebook | License: MIT | 22691 312 382

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 20824 164 149

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | 17075 155 262

rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

Language: Rust | License: Apache-2.0 | 5345 57 2522

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | 3018 25 114

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.

739 200

Awesome-LLM-3D

Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world.

License: MIT | 694 27 3

LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Language: Python | 666 10 27

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | 654 7 32

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python | License: Apache-2.0 | 639 10 23

VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks.

Language: Python | License: Apache-2.0 | 479 8 67

Mask3D

Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS, and STPLS3D.

Language: Python | License: MIT | 476 9 158

vstar

PyTorch implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Language: Python | License: MIT | 443 10 13

qingwu-zimu

青梧字幕 (Qingwu Subtitles) is an AI subtitle extraction tool based on Whisper.

Language: C++ | License: MIT | 371 4 2

omnidata

A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]

Language: Jupyter Notebook | License: NOASSERTION | 364 10 58

PLLaVA

Official repository for the paper PLLaVA

Language: Python | 349 10 35

Stratified-Transformer

Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)

Language: Python | License: MIT | 346 6 97

Awesome-Open-AI-Sora

Sora AI Awesome List: your go-to resource hub for all things Sora AI, OpenAI's model for generating realistic scenes from text. Explore a curated collection of articles, videos, podcasts, and news about Sora's capabilities, advancements, and more.

License: Apache-2.0 | 196 50


GPT-SoVITS

CLIP

llama3

Open-Sora

ml-ferret

rerun

MGM

Awesome-LLMs-for-Video-Understanding

Awesome-LLM-3D

LLaVA-pp

Chat-UniVi

LLaVA-Plus-Codebase

VLMEvalKit

Mask3D

vstar

qingwu-zimu

omnidata

PLLaVA

Stratified-Transformer

Awesome-Open-AI-Sora

probe3d

NExT-Chat

3D-VLA

open-eqa

multi_token

SceneVerse

VQASynth

act3d-chained-diffuser

Online3D

ScanRefer_Browser