yangmin09 (feymanpriv)

Company: BUPT

Location: Beijing

yangmin09's starred repositories

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3552 | Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 6859 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 6894 | Issues: 0

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | License: Apache-2.0 | Stargazers: 1022 | Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 17072 | Issues: 0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1733 | Issues: 0

LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Language: Python | License: Apache-2.0 | Stargazers: 639 | Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 2807 | Issues: 0

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 654 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 664 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 508 | Issues: 0

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 202 | Issues: 0

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python | License: NOASSERTION | Stargazers: 468 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 3946 | Issues: 0

LLaVAR

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Language: Python | License: Apache-2.0 | Stargazers: 240 | Issues: 0

COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

License: MIT | Stargazers: 175 | Issues: 0

LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF

Language: Python | License: GPL-3.0 | Stargazers: 247 | Issues: 0

imgfind

A tool for searching local images by text description, powered by Rust + candle + CLIP

Language: Rust | Stargazers: 134 | Issues: 0

vsc2022

Code for the Video Similarity Challenge.

Language: Python | License: MIT | Stargazers: 73 | Issues: 0

Chinese-LLaVA

An open-source, commercially usable multimodal model that supports bilingual Chinese-English visual-text dialogue.

Language: Python | License: Apache-2.0 | Stargazers: 335 | Issues: 0

Chinese-Llama-2-7b

The first downloadable and runnable Chinese LLaMA2 model in the open-source community!

Language: Python | License: Apache-2.0 | Stargazers: 2223 | Issues: 0

mae_st

Official Open Source code for "Masked Autoencoders As Spatiotemporal Learners"

Language: Python | License: NOASSERTION | Stargazers: 279 | Issues: 0

GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Language: Python | License: MIT | Stargazers: 278 | Issues: 0

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 2793 | Issues: 0

video2dataset

Easily create large video datasets from video URLs

Language: Python | License: MIT | Stargazers: 467 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 688 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Stargazers: 9548 | Issues: 0

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 2486 | Issues: 0

Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Language: Python | License: Apache-2.0 | Stargazers: 5645 | Issues: 0

Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

Language: Python | License: Apache-2.0 | Stargazers: 17594 | Issues: 0