tin2tin / Pallaidium

Generative AI for the Blender VSE: Text, video or image to video, image and audio in Blender Video Sequence Editor.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

PALLAIDIUM - Generative AI for the Blender VSE

AI-generate video, image, and audio from text prompts or video, image, or text strips.

PallAIdium

Discord

https://discord.gg/csBJhBtE

Features

Text to video Text to audio
Text to speech Text to image
Image to image Image to video
Video to video Image to text
ControlNet OpenPose
Canny Illusion
Multiple LoRAs Segmind distilled SDXL
Seed Quality steps
Frames Word power
Style selector Strip power
Batch conversion Batch refinement of images.
Batch upscale & refinement of movies. Model card selector.
Render-to-path selector. Render finished notification.
Model Cards One-click install and uninstall dependencies.
User-defined file path for generated files. Define the location for storing generated files.
Seed and prompt added to strip name. Include seed and prompt information in the strip name.

image

Requirements

  • Windows (Unsupported: Linux and MacOS).
  • A CUDA-supported Nvidia card with at least 6 GB VRAM.
  • CUDA: 12.4
  • 20+ GB HDD. (Each model is 6+ GB).

For Mac and Linux, we'll have to rely on contributor support. So, post your issues here for Mac: #106 and here for Linux: #105, and hope some contributor wants to help you out.

How to install

  • First you must download and install git for your platform(must be on PATH(or Bark will fail)): https://git-scm.com/downloads

  • Download the add-on: https://github.com/tin2tin/text_to_video/archive/refs/heads/main.zip

  • On Windows, right-click on the Blender icon and "Run Blender as Administrator"(or you'll get write permission errors).

  • Install the add-on as usual: Preferences > Add-ons > Install > select file > enable the add-on.

  • In the Generative AI add-on preferences, hit the "Install Dependencies" button.

  • Note that you can change what model cards are used in the various modes here(video, image, and audio).

  • Then it writes that it is finished(if any vital errors, let me know).

  • Restart Blender.

  • Open the add-on UI in the Sequencer > Sidebar > Generative AI.

  • The first time any model is executed many GB will have to be downloaded, so go grab lots of coffee.

  • If it says: "ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers", then try to restart Blender.

Tip
If any Python modules are missing, use this add-on to manually install them:
https://github.com/amb/blender_pip

Change Log

  • 2024-4-29: Add: PixArt Sigma 2k, PixArt 1024 and RealViz V4
  • 2024-2-23: Add: Proteus Lightning and Dreamshaper XL Lightning
  • 2024-2-21: Add: SDXL-Lightning 2 Step & Proteus v. 0.3
  • 2024-1-02: Add: WhisperSpeech
  • 2024-01-01: Fix installation and Bark bugs.
  • 2024-01-31: Add OpenDalle, Speed option, SDXL and LoRA support for Canny and OpenPose, include OpenPose rig images. Prune old models including SD.
  • 2023-12-18: Add: Bark audio enhance, Segmind Vega.
  • 2023-12-1: Add SD Turbo & MusicGen Medium, MPS device for MacOS.
  • 2023-11-30: Add: SVD, SVD-XT, SDXL Turbo

Location

Install Dependencies, set Movie Model Card, and set Sound Notification in the add-on preferences:

image

Video Sequence Editor > Sidebar > Generative AI:

image

Styles:

image

See SDXL handling most of the styles here: https://stable-diffusion-art.com/sdxl-styles/

Updates

Read about the updates here:

https://github.com/tin2tin/Pallaidium/discussions/categories/announcements

Text to Video

The Animov models have been trained on Anime material, so adding "anime" to the prompt is necessary, especially for the Animov-512x model.

Text to Image

The Stable Diffusion models for generating images have been used a lot, so there are plenty of prompt suggestions out there if you google for them.

Artists

https://stablediffusion.fr/artists

Prompting:

https://github.com/invoke-ai/InvokeAI/blob/main/docs/features/PROMPTS.md

https://stablediffusion.fr/prompts

https://blog.segmind.com/generating-photographic-images-with-stable-diffusion/

Tip
If the image of your renders breaks, then use the resolution from the Model Card in the Preferences.
Tip
If the image of your playback stutters, then select a strip > Menu > Strip > Movie Strip > Set Render Size.
Tip
If you get the message that CUDA is out of memory, then restart Blender to free up memory and make it stable again.

Batch Processing

Select multiple strips and hit Generate. When doing this, the file name, and if found the seed value, are automatically inserted into the prompt and seed value. However, in the add-on preferences, this behavior can be switched off.

ai_batch_ex2_0000-0574.mp4

Text to Audio

Bark

Find Bark documentation here: https://github.com/suno-ai/bark

  • [laughter]
  • [laughs]
  • [sighs]
  • [music]
  • [gasps]
  • [clears throat]
  • — or ... for hesitations
  • ♪ for song lyrics
  • capitalization for emphasis on a word
  • MAN/WOMAN: for bias towards the speaker

Speaker Library: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c

Tip
If the audio breaks up, try processing longer sentences.

AudioLDM2

Find AudioLDM documentation here: https://github.com/haoheliu/AudioLDM Try prompts like: Bag pipes playing a funeral dirge, punk rock band playing hardcore song, techno dj playing deep bass house music, and acid house loop with jazz. Or: Voice of God judging mankind, woman talking about celestial beings, hammer on wood.

Performance

The performance can be improved by following this guide: https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion

New to Blender?

Watch this tutorial: https://youtu.be/4_MIaxzjh5Y?feature=shared

AI Modules

Diffusers: https://github.com/huggingface/diffusers

Zeroscope XL: https://huggingface.co/cerspense/zeroscope_v2_XL

AudioLDM2 Music: https://huggingface.co/cvssp/audioldm-s-full-v2 https://github.com/haoheliu/AudioLDM

MusicGen Stereo: https://huggingface.co/facebook/musicgen-stereo-medium

Bark: https://github.com/suno-ai/bark

Stable Diffusion XL: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

Segmind SDXL: https://huggingface.co/segmind/SSD-1B https://blog.segmind.com/generating-photographic-images-with-stable-diffusion/

OpenDalle: https://huggingface.co/dataautogpt3/OpenDalleV1.1

Uninstall

Hugging Face Diffusers models are downloaded from the hub and saved to a local cache directory. By default, the cache directory is located at:

On Linux and macOS: ~/.cache/huggingface/hub

On Windows: %userprofile%\.cache\huggingface\hub

Here you can locate and delete the individual models.

Useful add-ons

Add Rendered Strips

Since the Generative AI add-on only can input image or movie strips, you'll need to convert other strip types to movie-strip. For this purpose, this add-on can be used:

https://github.com/tin2tin/Add_Rendered_Strips

image

VSE Masking Tools

For creating a mask on top of a clip in the Sequencer, this add-on can be used to input the clip as background in the Blender Image Editor. The created mask can then be added to the VSE as a strip, and converted to video with the above add-on:

https://github.com/tin2tin/vse_masking_tools

image

Subtitle Editor

Edit and navigate in the generated text strips.

https://github.com/tin2tin/Subtitle_Editor

Screenwriter Assistant

Get chatGPT to generate stories, which can be used as prompts.

https://github.com/tin2tin/Blender_Screenwriter_Assistant_chat_GPT

Text to Strip

Convert text from the Text Editor to strips which can be used as prompts for batch generation.

https://github.com/tin2tin/text_to_strip

Useful Projects

Trainer for LoRAs: https://github.com/johnman3032/simple-lora-dreambooth-trainer

HD Horizon(LoRA for making SD 1.5 work at higher resolutions): https://civitai.com/models/238891/hd-horizon-the-resolution-frontier-multi-resolution-high-resolution-native-inferencing

Video Examples

Image to Text

derush20001-0571.mp4

Illusion Diffusion

Illusion_silent_0001-0366.mp4

Scribble

scribble_0001-0156.mp4

Styles

TEXTs_010000-0495.mp4

Canny

Controlnet_final_0001-0603.mp4

OpenPose

OpenPose10000-0320.mp4

Zeroscope

Watch the video

Würstchen

Watch the video

Bark

Watch the video

Batch from Text Strips

Watch the video

Video to video:

bagel.mp4
Burger4.mp4

Img2img:

3160-3714.mp4

Painting

Hammershoi.mp4

Enhancement Info

LCM

https://huggingface.co/blog/lcm_lora

FreeU

https://github.com/ChenyangSi/FreeU

Restrictions for using the AI models:

  • The models can only be used for non-commercial purposes. The models are meant for research purposes.
  • The models was not trained to realistically represent people or events, so using it to generate such content is beyond the model's capabilities.
  • It is prohibited to generate content that is demeaning or harmful to people or their environment, culture, religion, etc.
  • Prohibited for pornographic, violent, and bloody content generation.
  • Prohibited for error and false information generation.

About

Generative AI for the Blender VSE: Text, video or image to video, image and audio in Blender Video Sequence Editor.

License:GNU General Public License v3.0


Languages

Language:Python 100.0%