nemonameless

Baidu

Beijing

nifeng's repositories

PaddleDetection

Object detection and instance segmentation toolkit based on PaddlePaddle.

Language: Python · License: Apache-2.0

PaddleYOLO

🚀🚀🚀 YOLOSeries of PaddleDetection implementation, PPYOLOE, YOLOX, YOLOv5, YOLOv6, YOLOv7 and so on. 🚀🚀🚀

Language: Python · License: GPL-3.0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

License: Apache-2.0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Bunny

A family of lightweight multimodal models.

Language: Python · License: Apache-2.0

cobra

Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Language: Python · License: MIT

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python · License: Apache-2.0

DiS

Scalable Diffusion Models with State Space Backbone

License: NOASSERTION

Emu

Emu: An Open Multimodal Generalist

Language: Python

InternVL

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks; an open-source alternative to ViT-22B

License: MIT

LLaMA2-Accessory

An Open-source Toolkit for LLM Development

Language: Python · License: NOASSERTION

LLaVA

Visual Instruction Tuning: Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.

Language: Python · License: Apache-2.0

MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Language: Python · License: Apache-2.0

OmDet

Fast and accurate open-vocabulary end-to-end object detection

License: Apache-2.0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the entire open-source community will contribute to this project.

Language: Python · License: MIT

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python · License: Apache-2.0

Paddle

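PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of 飞桨 (PaddlePaddle), providing high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)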

Language: C++ · License: Apache-2.0

PaddleClas

A treasure chest for visual recognition powered by PaddlePaddle

Language: Python · License: Apache-2.0

PaddleMIX

Language: Python · License: Apache-2.0

PixArt-sigma

New PixArt Model, Faster, Stronger, Better

License: AGPL-3.0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python · License: NOASSERTION

StreamingT2V

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Language: Python

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"

License: MIT

VisIT-Bench

Language: Python

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Language: Python · License: MIT

VMamba

VMamba: Visual State Space Models

Language: Python

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)

License: Apache-2.0

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

License: GPL-3.0

zigma

The official implementation of "ZigMa: A DiT-Style Mamba-based Diffusion Model"

Language: Python · License: Apache-2.0