Wang-Xiaodong1899

Peking University

https://wang-xiaodong1899.github.io/

Xiaodong Wang's starred repositories

OpenVoice

Instant voice cloning by MyShell.

Language: Python | License: MIT | 27198 206 205

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | 8731 78 977

waymo-open-dataset

Waymo Open Dataset

Language: Python | License: NOASSERTION | 2612 72 803

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python | 1137 29 21

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | 785 21 93

Samba

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Language: Python | License: MIT | 678 19 10

LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Language: Python | License: Apache-2.0 | 630 12 97

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

Language: Python | 502 6 32

OccWorld

[ECCV 2024] 3D World Model for Autonomous Driving

Language: Python | License: Apache-2.0 | 297 8 24

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python | License: GPL-3.0 | 281 8 30

4DGen

"4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency", Yuyang Yin*, Dejia Xu*, Zhangyang Wang, Yao Zhao, Yunchao Wei

Language: Python | 201 9 5

NVS_Solver

Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"

Language: Python | License: Apache-2.0 | 191 11 17

OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

Language: Python | License: MIT | 180 2 11

EthicalTrajectoryPlanning

An Ethical Trajectory Planning Algorithm for Autonomous Vehicles

Language: Python | License: LGPL-3.0 | 168 6 6

Diffusion4D

"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", Hanwen Liang*, Yuyang Yin*, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

Language: Python | 157 8 5

titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Language: Python | License: MIT | 148 7 3

RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Language: Python | 134 2 11

VQASynth

Compose multimodal datasets 🎹

Language: Python | 123 5 3

Recap-DataComp-1B

This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"

richhf-18k

RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).

VLFeedback

Language: Python | 66 2 7

LAW

Enhancing End-to-End Autonomous Driving with Latent World Model

License: MIT | 6000

HA-DPO

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Language: Python | License: Apache-2.0 | 45 2 6

Learning-Naturalistic-Driving-Environment

This repo contains the code for paper "Learning naturalistic driving environment with statistical realism"

Language: Python | License: NOASSERTION | 36 2 2

CSR

[Arxiv] Calibrated Self-Rewarding Vision Language Models

Language: Python | 2000

MEFT

Language: Python | 1700

LLMBind

LLMBind: A Unified Modality-Task Integration Framework

Language: Python | License: Apache-2.0 | 1200

TI2V-Zero

Text-conditioned image-to-video generation based on diffusion models.

Language: Python | License: AGPL-3.0 | 800

LLMmed

Large Language Models Streamline Automated Machine Learning for Clinical Studies

Language: Python | License: MIT | 7 30

visiPAM

Code for the paper 'Zero-shot visual reasoning through probabilistic analogical mapping'

Language: Python | 500