Xiaodong Wang's starred repositories
Open-LLaVA-NeXT
An open-source implementation of LLaVA-NeXT.
video2dataset
Easily create large video datasets from video URLs
LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
fish-speech
Brand new TTS solution
MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
Video-Infinity
Video-Infinity generates long videos quickly using multiple GPUs without extra training.
mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
bootstrapped-preference-optimization-BPO-
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
elevenlabs-python
The official Python API for ElevenLabs Text to Speech.
Bark-Voice-Cloning
Bark Voice Cloning and Voice Cloning for Chinese Speech
Reinforcement-Learning-in-Robotics
This is a private learning repository for reinforcement learning techniques used in robotics.
multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in PyTorch