yuanpengtu

Penalty_kl's starred repositories

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.

Language: Python · Apache-2.0 · 1088300

EMO

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

717800

Latte

Latte: Latent Diffusion Transformer for Video Generation.

Language: Python · Apache-2.0 · 144700

Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python · Apache-2.0 · 256600

DiS

Scalable Diffusion Models with State Space Backbone

Language: Python · NOASSERTION · 14200

Neural-Network-Parameter-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Language: Python · 78800

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python · AGPL-3.0 · 252600

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python · Apache-2.0 · 2395200

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · Apache-2.0 · 515100

DiffiT

[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation

38000

FiT

[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model

Apache-2.0 · 34100

OpenSORA

A public repository for reproducing an open-source, Sora-comparable video generation model

900

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · NOASSERTION · 562900

Cleaned-Webvid

Uses a cleaning strategy to produce a clean WebVid-10M dataset

Language: Python · 300

mergekit

Tools for merging pretrained large language models.

Language: Python · LGPL-3.0 · 404200

lumiere-pytorch

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

Language: Python · MIT · 22700

VIRL

Code for V-IRL: Grounding Virtual Intelligence in Real Life

Language: Python · 28900

Video-Motion-Customization

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models (CVPR 2024)

Language: Python · Apache-2.0 · 14200

4DGen

"4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency", Yuyang Yin*, Dejia Xu*, Zhangyang Wang, Yao Zhao, Yunchao Wei

Language: Python · 20100

MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Language: Python · MIT · 4145900

mamba

Mamba SSM architecture

Language: Python · Apache-2.0 · 1152800

custom-diffusion

Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion (CVPR 2023)

Language: Python · NOASSERTION · 181000

llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Language: Jupyter Notebook · 6000

Awesome-4D-Generation

An organized list of academic papers focused on the topic of 4D Generation. If you have any additions or suggestions, feel free to contribute.

4800

4dfy

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

Language: Python · Apache-2.0 · 29200

Official Pytorch implementation of "Learnable Gated Temporal Shift Module for Deep Video Inpainting. Chang et al. BMVC 2019." and the FVI dataset in "Free-form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN, Chang et al. ICCV 2019"

Language: Python · 33100

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · Apache-2.0 · 638000

ProPainter

[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

Language: Python · NOASSERTION · 490700