👋 Join our WeChat community
The MiniSora open-source community is organized voluntarily by community members (entirely free of charge, with no hidden costs). MiniSora aims to explore the implementation path of Sora and its future development directions:
- Hold regular Sora round tables to explore the possibilities together with the community
- Discuss existing technical paths for video generation
- Generate Impressive Videos with Text Instructions: A Review of OpenAI Sora, Stable Diffusion, Lumiere and Comparable
- State of the Art on Diffusion Models for Visual Computing
- CSUR 24 Paper: Diffusion Models: A Comprehensive Survey of Methods and Applications
- OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
- GPU-Friendly: Low requirements on GPU memory and GPU count are preferred; for example, setups such as 8x A100 80G, 8x A6000 48G, or a single RTX 4090 24G should suffice for training and inference
- Training-Efficiency: Good results without requiring excessively long training
- Inference-Efficiency: Modest video length and resolution at inference time, e.g., 3-10 s at 480p, are acceptable
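The GPU-friendliness target above can be sanity-checked with a back-of-the-envelope memory estimate. The sketch below is purely illustrative (the function name and the 700M-parameter example are hypothetical, and it assumes mixed-precision Adam); it ignores activation memory, which often dominates for video models:

```python
def training_memory_gb(n_params: float, dtype_bytes: int = 2,
                       optimizer: str = "adam") -> float:
    """Rough lower bound on training memory: weights + gradients +
    optimizer states. Activations are NOT included."""
    weights = n_params * dtype_bytes
    grads = n_params * dtype_bytes
    # Mixed-precision Adam keeps two fp32 moment buffers plus an
    # fp32 master copy of the weights: 3 * 4 bytes per parameter.
    opt_states = n_params * 4 * 3 if optimizer == "adam" else 0
    return (weights + grads + opt_states) / 1024**3

# Example: a hypothetical 700M-parameter DiT-scale model in bf16 with Adam.
print(f"{training_memory_gb(700e6):.1f} GB")  # ~10.4 GB before activations
```

Even this lower bound shows why a 24G card like the RTX 4090 needs memory-saving tricks (gradient checkpointing, ZeRO-style sharding) once activations are added on top.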
- Zhihu Notes: A Survey on Generative Diffusion Model
- Latte: Latent Diffusion Transformer for Video Generation
- Stable Cascade (ICLR 24 Paper): Würstchen: An efficient architecture for large-scale text-to-image diffusion models
- Updating...
| Paper | Link |
| --- | --- |
| 1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | NeurIPS 21 Paper, Github |
| 2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | CVPR 22 Paper, Github |
| 3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | NeurIPS 22 Paper, Github |
| 4) DDPM: Denoising Diffusion Probabilistic Models | NeurIPS 20 Paper, Github |
| 5) DDIM: Denoising Diffusion Implicit Models | ICLR 21 Paper, Github |
| 6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | ICLR 21 Paper, Github, Blog |
| 7) Stable Cascade: Würstchen: An efficient architecture for large-scale text-to-image diffusion models | ICLR 24 Paper, Github, Blog |
| 8) Diffusion Models in Vision: A Survey | TPAMI 23 Paper, Github |
| Paper | Link |
| --- | --- |
| 1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | CVPR 23 Paper, Github, ModelScope |
| 2) DiT: Scalable Diffusion Models with Transformers | ICCV 23 Paper, Github, ModelScope |
| 3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Paper, Github, ModelScope |
| 4) FiT: Flexible Vision Transformer for Diffusion Model | Paper, Github |
| 5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Paper, Github |
| 6) OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference | Github |
| 7) Large-DiT: Large Diffusion Transformer | Github |
| Paper | Link |
| --- | --- |
| 1) Animatediff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | ICLR 24 Paper, Github, ModelScope |
| 2) I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Paper, Github, ModelScope |
| 3) Imagen Video: High Definition Video Generation with Diffusion Models | Paper |
| 4) MoCoGAN: Decomposing Motion and Content for Video Generation | CVPR 18 Paper |
| 5) Adversarial Video Generation on Complex Datasets | Paper |
| 6) W.A.L.T: Photorealistic Video Generation with Diffusion Models | Paper, Project |
| 7) VideoGPT: Video Generation using VQ-VAE and Transformers | Paper, Github |
| 8) Video Diffusion Models | Paper, Github, Project |
| 9) MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | NeurIPS 22 Paper, Github, Project, Blog |
| 10) VideoPoet: A Large Language Model for Zero-Shot Video Generation | Paper |
| 11) MAGVIT: Masked Generative Video Transformer | CVPR 23 Paper, Github, Project, Colab |
| 12) EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Paper, Github, Project |
| 13) SimDA: Simple Diffusion Adapter for Efficient Video Generation | Paper, Github, Project |
| 14) StableVideo: Text-driven Consistency-aware Diffusion Video Editing | ICCV 23 Paper, Github, Project |
| 15) SVD: Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | Paper, Github |
| 16) ADD: Adversarial Diffusion Distillation | Paper, Github |
| Paper | Link |
| --- | --- |
| 1) World Model on Million-Length Video And Language With RingAttention | Paper, Github |
| 2) Ring Attention with Blockwise Transformers for Near-Infinite Context | Paper, Github |
| 3) Extending LLMs' Context Window with 100 Samples | Paper, Github |
| 4) Efficient Streaming Language Models with Attention Sinks | ICLR 24 Paper, Github |
| 5) The What, Why, and How of Context Length Extension Techniques in Large Language Models - A Detailed Survey | Paper |
| 6) MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | CVPR 24 Paper, Github, Project |
| Paper | Link |
| --- | --- |
| 1) ViViT: A Video Vision Transformer | ICCV 21 Paper, Github |
| 2) VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | CVPR 23 Paper |
| 3) LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation | Paper, Github |
| 4) LFDM: Conditional Image-to-Video Generation with Latent Flow Diffusion Models | CVPR 23 Paper, Github |
| 5) MotionDirector: Motion Customization of Text-to-Video Diffusion Models | Paper, Github |
| Paper | Link |
| --- | --- |
| 1) Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion | Link |
| 2) MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | CVPR 23 Paper, Github |
| 3) Pengi: An Audio Language Model for Audio Tasks | NeurIPS 23 Paper, Github |
| 4) VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | NeurIPS 23 Paper, Github |
| Paper | Link |
| --- | --- |
| 1) Layered Neural Atlases for Consistent Video Editing | TOG 21 Paper, Github, Project |
| 2) StableVideo: Text-driven Consistency-aware Diffusion Video Editing | ICCV 23 Paper, Github, Project |
| 3) CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | Paper, Github, Project |
| Paper | Link |
| --- | --- |
| 1) RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models | Paper, Github, Project |
| 2) Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | Paper, Github |
| 3) LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models | TMLR 23 Paper, Github |
| 4) LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts | ICLR 24 Paper, Github |
| 5) Progressive Text-to-Image Diffusion with Soft Latent Direction | Paper |
| 6) Self-correcting LLM-controlled Diffusion Models | CVPR 24 Paper, Github |
| 7) LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation | MM 23 Paper |
| 8) LayoutGPT: Compositional Visual Planning and Generation with Large Language Models | NeurIPS 23 Paper, Github |
| 9) Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition | Paper, Github |
| 10) InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions | Paper, Github |
| 11) Controllable Text-to-Image Generation with GPT-4 | Paper |
| 12) LLM-grounded Video Diffusion Models | ICLR 24 Paper |
| 13) VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning | Paper |
| 14) FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax | Paper |
| 15) VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM | Paper |
| 16) Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | NeurIPS 23 Paper |
| 17) Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models | Paper |
| 18) MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation | Paper |
| 19) GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | Paper |
| Dataset | Link |
| --- | --- |
| 1) Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Paper, Github, Project |
| 2) UCF101: Action Recognition Data Set | Paper, Project |
| Resource | Link |
| --- | --- |
| 1) Datawhale - AI Video Generation Study | Feishu doc |
| 2) A Survey on Generative Diffusion Model | TKDE 24 Paper, Github |
| 3) Awesome-Video-Diffusion-Models: A Survey on Video Diffusion Models | Paper, Github |
| 4) Awesome-Text-To-Video: A Survey on Text-to-Video Generation/Synthesis | Github |
| 5) video-generation-survey: A reading list of video generation | Github |
| 6) Awesome-Video-Diffusion | Github |
| 7) Video Generation Task in Papers With Code | Link |
| 8) Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Paper, Github, Chinese translation |
| 9) Open-Sora-Plan (PKU-YuanGroup) | Github |
| 10) State of the Art on Diffusion Models for Visual Computing | Paper |
| 11) Diffusion Models: A Comprehensive Survey of Methods and Applications | CSUR 24 Paper, Github |
| 12) Generate Impressive Videos with Text Instructions: A Review of OpenAI Sora, Stable Diffusion, Lumiere and Comparable | Paper |
We would love for you to contribute to the MiniSora open-source community and help us make it even better than it is now!
See the contribution guide for details.