Xiaodong Wang (Wang-Xiaodong1899)

Company: Peking University

Home Page: https://wang-xiaodong1899.github.io/

Xiaodong Wang's starred repositories

OpenVoice

Instant voice cloning by MyShell.

Language: Python | License: MIT | Stargazers: 27198 | Issues: 206 | Issues: 205

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 8731 | Issues: 78 | Issues: 977

waymo-open-dataset

Waymo Open Dataset

Language: Python | License: NOASSERTION | Stargazers: 2612 | Issues: 72 | Issues: 803

ShareGPT4Video

An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 785 | Issues: 21 | Issues: 93

Samba

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Language: Python | License: MIT | Stargazers: 678 | Issues: 19 | Issues: 10

LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 630 | Issues: 12 | Issues: 97

SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward

OccWorld

[ECCV 2024] 3D World Model for Autonomous Driving

Language: Python | License: Apache-2.0 | Stargazers: 297 | Issues: 8 | Issues: 24

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python | License: GPL-3.0 | Stargazers: 281 | Issues: 8 | Issues: 30

4DGen

"4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency", Yuyang Yin*, Dejia Xu*, Zhangyang Wang, Yao Zhao, Yunchao Wei

NVS_Solver

Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"

Language: Python | License: Apache-2.0 | Stargazers: 191 | Issues: 11 | Issues: 17

OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

Language: Python | License: MIT | Stargazers: 180 | Issues: 2 | Issues: 11

EthicalTrajectoryPlanning

An Ethical Trajectory Planning Algorithm for Autonomous Vehicles

Language: Python | License: LGPL-3.0 | Stargazers: 168 | Issues: 6 | Issues: 6

Diffusion4D

"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", Hanwen Liang*, Yuyang Yin*, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Language: Python | License: MIT | Stargazers: 148 | Issues: 7 | Issues: 3

RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

VQASynth

Compose multimodal datasets 🎹

Recap-DataComp-1B

This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"

richhf-18k

RichHF-18K dataset contains rich human feedback labels we collected for our CVPR'24 paper: https://arxiv.org/pdf/2312.10240, along with the file name of the associated labeled images (no urls or images are included in this dataset).

LAW

Enhancing End-to-End Autonomous Driving with Latent World Model

License: MIT | Stargazers: 60 | Issues: 0 | Issues: 0

HA-DPO

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Language: Python | License: Apache-2.0 | Stargazers: 45 | Issues: 2 | Issues: 6

Learning-Naturalistic-Driving-Environment

This repo contains the code for paper "Learning naturalistic driving environment with statistical realism"

Language: Python | License: NOASSERTION | Stargazers: 36 | Issues: 2 | Issues: 2

CSR

[Arxiv] Calibrated Self-Rewarding Vision Language Models

Language: Python | Stargazers: 20 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 17 | Issues: 0 | Issues: 0

LLMBind

LLMBind: A Unified Modality-Task Integration Framework

Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 0 | Issues: 0

TI2V-Zero

Text-conditioned image-to-video generation based on diffusion models.

Language: Python | License: AGPL-3.0 | Stargazers: 8 | Issues: 0 | Issues: 0

LLMmed

Large Language Models Streamline Automated Machine Learning for Clinical Studies

Language: Python | License: MIT | Stargazers: 7 | Issues: 3 | Issues: 0

visiPAM

Code for the paper 'Zero-shot visual reasoning through probabilistic analogical mapping'

Language: Python | Stargazers: 5 | Issues: 0 | Issues: 0