feymanpriv

followers

following

stars

BUPT

Beijing

yangmin09's starred repositories

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language:PythonMIT355200

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language:C++Apache-2.0685900

LWM

Language:PythonApache-2.0689400

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language:PythonApache-2.0102200

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language:PythonApache-2.01707200

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language:PythonApache-2.0173300

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language:PythonApache-2.063900

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language:PythonApache-2.0280700

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language:PythonApache-2.065400

BakLLaVA

Language:PythonApache-2.066400

unified-io-2

Language:PythonApache-2.050800

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language:PythonBSD-3-Clause20200

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language:PythonNOASSERTION46800

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language:PythonNOASSERTION394600

LLaVAR

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Language:PythonApache-2.024000

COMM

Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

MIT17500

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language:PythonGPL-3.024700

imgfind

根据文本描述搜索本地图片的工具，powered by Rust + candle + CLIP

Language:Rust13400

vsc2022

Code for the Video Similarity Challenge.

Language:PythonMIT7300

Chinese-LLaVA

支持中英文双语视觉-文本对话的开源可商用多模态模型。

Language:PythonApache-2.033500

Chinese-Llama-2-7b

开源社区第一个能下载、能运行的中文 LLaMA2 模型！

Language:PythonApache-2.0222300

mae_st

Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"

Language:PythonNOASSERTION27900

GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Language:PythonMIT27800

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2,GPT-4,LLaMa2, Qwen,GLM, Claude, etc) over 100+ datasets.

Language:PythonApache-2.0279300

video2dataset

Easily create large video dataset from video urls

Language:PythonMIT46700

shikra

Language:PythonNOASSERTION68800

Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles:Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language:PythonBSD-3-Clause248600

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Language:PythonApache-2.0564500

Chinese-LLaMA-Alpaca

中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

Language:PythonApache-2.01759400