LYMDLUT / minisora

Mini Sora Community

 

👋 Join our WeChat community

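The Mini Sora open-source community is self-organized by its members. It is free, charges no fees, and is not out to cash in on anyone. Mini Sora aims to explore how Sora might be implemented and where it could go next:

  • Hold regular Sora roundtables to discuss the possibilities with the community
  • Discuss the existing technical approaches to video generation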

Recent Updates

Paper Reproduction Group

Project Page

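Reproduction Goals

  1. GPU-Friendly: Keep GPU memory and GPU count requirements low, so that hardware such as 8× A100 80G, 8× A6000 48G, or an RTX 4090 24G is enough for both training and inference
  2. Training-Efficiency: Reach good results without very long training runs
  3. Inference-Efficiency: Generated videos need not be long or high-resolution; for example, 3-10 s at 480p is acceptable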
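To illustrate why the Inference-Efficiency target ties into the GPU-Friendly one, here is a rough back-of-the-envelope sketch that counts the latent tokens a DiT-style video model would process for a 3-10 s, 480p clip. Every factor is an assumption chosen for illustration, not a MiniSora setting: 24 fps, 854×480 frames, a VAE with 8× spatial downsampling, 2×2 patchification, and optional 4× temporal compression. The helpers `latent_tokens` and `full_attention_gib` are hypothetical.

```python
"""Rough token-count estimate for a latent video diffusion transformer.

All defaults are illustrative assumptions (not MiniSora settings):
24 fps, 854x480 frames, 8x spatial VAE downsampling, 2x2 patches.
"""


def latent_tokens(seconds: int, fps: int = 24, height: int = 480,
                  width: int = 854, spatial_down: int = 8,
                  temporal_down: int = 1, patch: int = 2) -> int:
    """Return the number of transformer tokens for one clip (hypothetical helper)."""
    frames = (seconds * fps) // temporal_down  # latent frames after temporal compression
    h = height // spatial_down // patch        # token rows per latent frame
    w = width // spatial_down // patch         # token columns per latent frame
    return frames * h * w


def full_attention_gib(tokens: int, bytes_per_elem: int = 2) -> float:
    """Size of one fully materialized tokens x tokens attention map
    (one head, one layer, fp16 by default), in GiB."""
    return tokens * tokens * bytes_per_elem / 2**30


if __name__ == "__main__":
    for seconds in (3, 10):
        for t_down in (1, 4):
            n = latent_tokens(seconds, temporal_down=t_down)
            print(f"{seconds:>2}s, temporal_down={t_down}: {n:>7} tokens, "
                  f"{full_attention_gib(n):8.1f} GiB per naive attention map")
```

Under these assumptions a 3 s clip without temporal compression already yields 114,480 tokens, and naively materializing one fp16 attention map over them would take roughly 24 GiB for a single head in a single layer. This is why short, low-resolution targets, temporal compression, memory-efficient attention kernels, and the long-context methods listed under Long-context matter for training on the GPUs named above.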

Recent Roundtable Discussions

Sora Night Talk: A Survey of Video Diffusion

Zhihu Notes: A Survey on Generative Diffusion Model

Paper Reading Program

Call for Paper Reading Presenters

Related Work

Diffusion Model

Paper | Links
1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis NeurIPS 21 Paper, Github
2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models CVPR 22 Paper, Github
3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models NeurIPS 22 Paper, Github
4) DDPM: Denoising Diffusion Probabilistic Models NeurIPS 20 Paper, Github
5) DDIM: Denoising Diffusion Implicit Models ICLR 21 Paper, Github
6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations ICLR 21 Paper, Github, Blog
7) Stable Cascade: Würstchen: An efficient architecture for large-scale text-to-image diffusion models ICLR 24 Paper, Github, Blog
8) Diffusion Models in Vision: A Survey TPAMI 23 Paper, Github

Diffusion Transformer

Paper | Links
1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models CVPR 23 Paper, Github, ModelScope
2) DiT: Scalable Diffusion Models with Transformers ICCV 23 Paper, Github, ModelScope
3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers Paper, Github, ModelScope
4) FiT: Flexible Vision Transformer for Diffusion Model Paper, Github
5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper, Github
6) OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference Github
7) Large-DiT: Large Diffusion Transformer Github

Video Generation

Paper | Links
1) AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning ICLR 24 Paper, Github, ModelScope
2) I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper, Github, ModelScope
3) Imagen Video: High Definition Video Generation with Diffusion Models Paper
4) MoCoGAN: Decomposing Motion and Content for Video Generation CVPR 18 Paper
5) Adversarial Video Generation on Complex Datasets Paper
6) W.A.L.T: Photorealistic Video Generation with Diffusion Models Paper Project
7) VideoGPT: Video Generation using VQ-VAE and Transformers Paper, Github
8) Video Diffusion Models Paper, Github, Project
9) MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation NeurIPS 22 Paper, Github, Project, Blog
10) VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper
11) MAGVIT: Masked Generative Video Transformer CVPR 23 Paper, Github, Project, Colab
12) EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper, Github, Project
13) SimDA: Simple Diffusion Adapter for Efficient Video Generation Paper, Github, Project
14) StableVideo: Text-driven Consistency-aware Diffusion Video Editing ICCV 23 Paper, Github, Project
15) SVD: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets Paper, Github
16) ADD: Adversarial Diffusion Distillation Paper, Github

Long-context

Paper | Links
1) World Model on Million-Length Video And Language With RingAttention Paper, Github
2) Ring Attention with Blockwise Transformers for Near-Infinite Context Paper, Github
3) Extending LLMs' Context Window with 100 Samples Paper, Github
4) Efficient Streaming Language Models with Attention Sinks ICLR 24 Paper, Github
5) The What, Why, and How of Context Length Extension Techniques in Large Language Models – A Detailed Survey Paper
6) MovieChat: From Dense Token to Sparse Memory for Long Video Understanding CVPR 24 Paper, Github, Project

Baseline Video Generation Models

Paper | Links
1) ViViT: A Video Vision Transformer ICCV 21 Paper, Github
2) VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models CVPR 23 Paper
3) LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation Paper, Github
4) LFDM: Conditional Image-to-Video Generation with Latent Flow Diffusion Models CVPR 23 Paper, Github
5) MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper, Github

Audio Related Resource

Paper | Links
1) Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion Link
2) MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation CVPR 23 Paper, Github
3) Pengi: An Audio Language Model for Audio Tasks NeurIPS 23 Paper, Github
4) VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset NeurIPS 23 Paper, Github

Consistency

Paper | Links
1) Layered Neural Atlases for Consistent Video Editing TOG 21 Paper, Github, Project
2) StableVideo: Text-driven Consistency-aware Diffusion Video Editing ICCV 23 Paper, Github, Project
3) CoDeF: Content Deformation Fields for Temporally Consistent Video Processing Paper, Github, Project

Prompt Engineering

Paper | Links
1) RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models Paper, Github, Project
2) Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs Paper, Github
3) LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models TMLR 23 Paper, Github
4) LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts ICLR 24 Paper, Github
5) Progressive Text-to-Image Diffusion with Soft Latent Direction Paper
6) Self-correcting LLM-controlled Diffusion Models CVPR 24 Paper, Github
7) LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation MM 23 Paper
8) LayoutGPT: Compositional Visual Planning and Generation with Large Language Models NeurIPS 23 Paper, Github
9) Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper, Github
10) InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions Paper, Github
11) Controllable Text-to-Image Generation with GPT-4 Paper
12) LLM-grounded Video Diffusion Models ICLR 24 Paper
13) VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper
14) FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax Paper
15) VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM Paper
16) Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator NeurIPS 23 Paper
17) Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models Paper
18) MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation Paper
19) GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning Paper

Security

Paper | Links

World Model

Paper | Links

Dataset

Dataset | Links
1) Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper, Github, Project
2) UCF101: Action Recognition Data Set Paper, Project

Existing High-Quality Resources

Resource | Links
1) Datawhale - AI Video Generation Learning, Feishu doc
2) A Survey on Generative Diffusion Model TKDE 24 Paper, Github
3) Awesome-Video-Diffusion-Models: A Survey on Video Diffusion Models Paper, Github
4) Awesome-Text-To-Video: A Survey on Text-to-Video Generation/Synthesis Github
5) video-generation-survey: A reading list of video generation Github
6) Awesome-Video-Diffusion Github
7) Video Generation Task in Papers With Code Link
8) Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper, Github, Chinese translation
9) Open-Sora-Plan (PKU-YuanGroup) Github
10) State of the Art on Diffusion Models for Visual Computing Paper
11) Diffusion Models: A Comprehensive Survey of Methods and Applications CSUR 24 Paper, Github
12) Generate Impressive Videos with Text Instructions: A Review of OpenAI Sora, Stable Diffusion, Lumiere and Comparable Paper

Mini Sora WeChat Community Discussion Group

 

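How to Contribute to the Mini Sora Community

We would love for you to contribute to the Mini Sora open-source community and help us make it better than it is today!

See the contribution guide for details.

Community Contributors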
