Xiaodong Wang (Wang-Xiaodong1899)

Company: Peking University

Home Page: https://wang-xiaodong1899.github.io/

Xiaodong Wang's starred repositories

pytubefix

A pytube fork with additional features and fixes

Language: Python | License: MIT | Stargazers: 126 | Issues: 0

Open-LLaVA-NeXT

An open-source implementation of LLaVA-NeXT.

Language: Python | Stargazers: 121 | Issues: 0

video2dataset

Easily create large video datasets from video URLs

Language: Python | License: MIT | Stargazers: 497 | Issues: 0

HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Language: Python | License: MIT | Stargazers: 227 | Issues: 0

NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Language: Python | License: MIT | Stargazers: 111 | Issues: 0

POPE

[EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"

Language: Python | License: MIT | Stargazers: 59 | Issues: 0

ScienceQA

Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".

Language: Python | License: MIT | Stargazers: 565 | Issues: 0

LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 21714 | Issues: 0

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | Stargazers: 4976 | Issues: 0

lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Language: Python | License: NOASSERTION | Stargazers: 1050 | Issues: 0

MM-Instruct

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Language: Python | License: Apache-2.0 | Stargazers: 23 | Issues: 0

Video-Infinity

Video-Infinity generates long videos quickly using multiple GPUs without extra training.

Language: Python | Stargazers: 110 | Issues: 0

Firefly

Firefly: a large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Language: Python | Stargazers: 5251 | Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: NOASSERTION | Stargazers: 27421 | Issues: 0

POVID

[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

Language: Python | License: Apache-2.0 | Stargazers: 52 | Issues: 0

mPLUG-HalOwl

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

Language: Python | License: MIT | Stargazers: 63 | Issues: 0

bootstrapped-preference-optimization-BPO-

Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"

Language: Python | License: Apache-2.0 | Stargazers: 28 | Issues: 0

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Language: Python | Stargazers: 192 | Issues: 0

SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)

Language: Python | License: Apache-2.0 | Stargazers: 891 | Issues: 0

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python | License: Apache-2.0 | Stargazers: 1493 | Issues: 0

LOOK-M

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Language: Python | License: MIT | Stargazers: 45 | Issues: 0

Language: Python | Stargazers: 80 | Issues: 0

SRT

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgement

Language: Python | Stargazers: 8 | Issues: 0

vlm-rlaif

ACL'24 Main track

Language: Python | Stargazers: 21 | Issues: 0

elevenlabs-python

The official Python API for ElevenLabs Text to Speech.

Language: Python | License: MIT | Stargazers: 1991 | Issues: 0

Bark-Voice-Cloning

Bark Voice Cloning and Voice Cloning for Chinese Speech

Language: Jupyter Notebook | License: MIT | Stargazers: 2585 | Issues: 0

SeVa

Official code of paper "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501

Language: Python | License: GPL-3.0 | Stargazers: 23 | Issues: 0

vlm_arm

Robotic arm + large model + multimodality = a human-robot collaborative embodied agent

Language: Jupyter Notebook | Stargazers: 299 | Issues: 0

Reinforcement-Learning-in-Robotics

This is a private learning repository for reinforcement learning techniques used in robotics.

Language: HTML | License: MIT | Stargazers: 310 | Issues: 0

multimodal-dit-pytorch

Implementation of a multimodal diffusion transformer in PyTorch

License: MIT | Stargazers: 90 | Issues: 0