nifeng's repositories
PaddleDetection
Object detection and instance segmentation toolkit based on PaddlePaddle.
PaddleYOLO
🚀🚀🚀 YOLO series implementation based on PaddleDetection: PPYOLOE, YOLOX, YOLOv5, YOLOv6, YOLOv7, and so on. 🚀🚀🚀
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Bunny
A family of lightweight multimodal models.
cobra
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
DiS
Scalable Diffusion Models with State Space Backbone
Emu
Emu: An Open Multimodal Generalist
InternVL
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (An Open-Source Alternative to ViT-22B)
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
LLaVA
Visual Instruction Tuning: Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
MetaTransformer
Meta-Transformer for Unified Multimodal Learning
OmDet
Fast and accurate open-vocabulary end-to-end object detection
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
PaddleClas
A treasure chest for visual recognition powered by PaddlePaddle
PixArt-sigma
New PixArt Model, Faster, Stronger, Better
Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
StreamingT2V
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
VMamba
VMamba: Visual State Space Models
xtuner
An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
zigma
The official implementation of "ZigMa: A DiT-Style Mamba-based Diffusion Model"