Wang-Xiaodong1899

Peking University

https://wang-xiaodong1899.github.io/

Xiaodong Wang's starred repositories

pytubefix

A pytube fork with additional features and fixes

Language: Python · License: MIT · 12600

Open-LLaVA-NeXT

An open-source implementation of LLaVA-NeXT.

Language: Python · 12100

video2dataset

Easily create large video datasets from video URLs

Language: Python · License: MIT · 49700

HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Language: Python · License: MIT · 22700

NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Language: Python · License: MIT · 11100

POPE

[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Language: Python · License: MIT · 5900

ScienceQA

Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

Language: Python · License: MIT · 56500

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook · License: NOASSERTION · 2171400

fish-speech

Brand-new TTS solution

Language: Python · License: NOASSERTION · 497600

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python · License: NOASSERTION · 105000

MM-Instruct

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Language: Python · License: Apache-2.0 · 2300

Video-Infinity

Video-Infinity generates long videos quickly using multiple GPUs without extra training.

Language: Python · 11000

Firefly

Firefly: A large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Language: Python · 525100

ChatTTS

A generative speech model for daily dialogue.

Language: Python · License: NOASSERTION · 2742100

POVID

[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Language: Python · License: Apache-2.0 · 5200

mPLUG-HalOwl

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

Language: Python · License: MIT · 6300

bootstrapped-preference-optimization-BPO-

Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"

Language: Python · License: Apache-2.0 · 2800

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python · 19200

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language: Python · License: Apache-2.0 · 89100

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · 149300

LOOK-M

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Language: Python · License: MIT · 4500

LLaVA-Hound-DPO

Language: Python · 8000

SRT

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgement

Language: Python · 800

vlm-rlaif

ACL'24 Main track

Language: Python · 2100

elevenlabs-python

The official Python API for ElevenLabs Text to Speech.

Language: Python · License: MIT · 199100

Bark-Voice-Cloning

Bark Voice Cloning and Voice Cloning for Chinese Speech

Language: Jupyter Notebook · License: MIT · 258500

SeVa

Official code of paper "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Language: Python · License: GPL-3.0 · 2300

vlm_arm

Robotic arm + large models + multimodality = human-robot collaborative embodied agent

Language: Jupyter Notebook · 29900

Reinforcement-Learning-in-Robotics

This is a private learning repository for reinforcement learning techniques used in robotics.

Language: HTML · License: MIT · 31000

multimodal-dit-pytorch

Implementation of a multimodal diffusion transformer in PyTorch

License: MIT · 9000