
StableIdentity: Inserting Anybody into Anywhere at First Sight

πŸ€— Paper: https://arxiv.org/abs/2401.15975   πŸ”₯ Project Page: https://qinghew.github.io/StableIdentity/

News

  • [2024.03.22]: Important Note: The base model is Stable Diffusion v2-1-base-512.
  • [2024.03.01]: Release codes for StableIdentity & ModelScopeT2V (Identity-Driven Video Generation)!
  • [2024.03.01]: Release codes for StableIdentity & LucidDreamer (Identity-Driven 3D Generation)!
  • [2024.02.29]: Release codes for StableIdentity & ControlNet!
  • [2024.02.25]: Release training and inference codes!

Click the GIF to access the high-resolution videos.

More results can be found in our Project Page and Paper.


Getting Started

Installation

  • Requirements (only 9GB of VRAM is needed for training): If you want to run StableIdentity & LucidDreamer, clone this repo with git clone https://github.com/qinghew/StableIdentity.git --recursive to download the submodules in LucidDreamer/submodules/.

    conda create -n stableid python=3.8.5
    conda activate stableid
    pip install -r requirements_StableIdentity.txt
  • Download the pretrained models: Stable Diffusion v2-1_512 and the face recognition ViT (see the sketch after this list).

  • Set the paths of the pretrained models as the defaults at Line 94 of train.py, or pass them on the command line with

    --pretrained_model_name_or_path **sd2.1_path** --vit_face_recognition_model_path **face_vit_path**
  • Download the face parsing model into models/face_parsing/res/cp.
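
A hedged sketch of the download step, in case you prefer scripting it: it assumes that "Stable Diffusion v2-1_512" corresponds to the stabilityai/stable-diffusion-2-1-base repo on Hugging Face, and the local target directory and the face ViT path below are placeholders, not paths the repo expects.

    # Hedged sketch: fetch the SD 2.1-base weights with huggingface_hub,
    # assuming "Stable Diffusion v2-1_512" points to stabilityai/stable-diffusion-2-1-base.
    from huggingface_hub import snapshot_download

    sd21_path = snapshot_download(
        repo_id="stabilityai/stable-diffusion-2-1-base",
        local_dir="pretrained_models/stable-diffusion-2-1-base",  # placeholder target dir
    )

    # The face recognition ViT is downloaded separately (see the link above);
    # point --vit_face_recognition_model_path at wherever you put it.
    face_vit_path = "pretrained_models/face_vit"  # placeholder path, not a repo id

    print(f"--pretrained_model_name_or_path {sd21_path} --vit_face_recognition_model_path {face_vit_path}")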

Train

  • Train for a single test image:

    CUDA_VISIBLE_DEVICES=0 accelerate launch --machine_rank 0 --num_machines 1 \
        --main_process_port 11135 --num_processes 1 --gpu_ids 0 train.py \
        --face_img_path=datasets_face/test_data_demo/00059.png \
        --output_dir="experiments512/save_00059" \
        --resolution=512 --train_batch_size=1 --checkpointing_steps=50 \
        --gradient_accumulation_steps=1 --seed=42 --learning_rate=5e-5 \
        --l_hair_diff_lambda=0.1
  • Train for your test dataset (preprocess with FFHQ-Alignment or crop the headshots):

    bash train_for_testset.sh

Test

  • Test StableIdentity: We provide three test modes in test.ipynb: "test a single image with a single prompt", "test a single image with prompts", and "test all images with prompts". The results will be generated in results/{index}/.
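
    For reference, here is a minimal standalone sketch of the "single image with a single prompt" mode, assuming the training run saved a dict of learned placeholder-token embeddings; the file name learned_embeds.bin and the token names are illustrative, not the repo's actual format, and the notebook remains the authoritative flow.

    # Hedged sketch: inject learned identity embeddings into SD 2.1-base and sample one prompt.
    import os
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
    ).to("cuda")

    # Assumption: a {token: embedding} dict saved by training, e.g. {"<v1*>": ..., "<v2*>": ...}.
    learned = torch.load("experiments512/save_00059/learned_embeds.bin")  # hypothetical file name
    tokens = list(learned.keys())
    pipe.tokenizer.add_tokens(tokens)
    pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
    embeds = pipe.text_encoder.get_input_embeddings().weight.data
    for tok in tokens:
        tok_id = pipe.tokenizer.convert_tokens_to_ids(tok)
        embeds[tok_id] = learned[tok].reshape(-1).to(dtype=embeds.dtype, device=embeds.device)

    image = pipe(f"a photo of {' '.join(tokens)} wearing a suit in front of the Eiffel Tower").images[0]
    os.makedirs("results/0", exist_ok=True)
    image.save("results/0/demo.png")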

  • Test StableIdentity & ControlNet: Download OpenPose's facenet.pth, body_pose_model.pth, and hand_pose_model.pth from ControlNet's Annotators into models/openpose_models, and download the ControlNet-SD21 model.

    # Requirements for ControlNet:
    pip install controlnet_aux

    The test code is test_with_controlnet_openpose.ipynb. The results will be generated in results/{index}/with_controlnet/.
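
    As a rough illustration of the conditioning step inside that notebook, here is a hedged sketch that extracts an OpenPose condition image with controlnet_aux; the local annotator directory and the output file name are assumptions.

    # Hedged sketch: build an OpenPose pose map to condition the SD21 ControlNet on.
    # Assumption: models/openpose_models contains the annotator weights listed above
    # (a Hugging Face repo id such as "lllyasviel/Annotators" also works here).
    from PIL import Image
    from controlnet_aux import OpenposeDetector

    openpose = OpenposeDetector.from_pretrained("models/openpose_models")
    source = Image.open("datasets_face/test_data_demo/00059.png")
    pose_map = openpose(source)           # PIL image with the detected keypoints
    pose_map.save("pose_condition.png")   # hypothetical output path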

  • Test StableIdentity & LucidDreamer:

    # Requirement for LucidDreamer:
    # Clone this repo by: `git clone https://github.com/qinghew/StableIdentity.git --recursive` to download submodules in `LucidDreamer/submodules/`.
    pip install -r requirements_LucidDreamer.txt
    pip install LucidDreamer/submodules/diff-gaussian-rasterization/
    pip install LucidDreamer/submodules/simple-knn/
    
    # test 
    python LucidDreamer/train.py --opt 'LucidDreamer/configs/stableid.yaml'

    You can also refer to LucidDreamer's preparation instructions. We only edit the code at Line 130 in LucidDreamer/train.py and set the SD2.1 path and prompts in LucidDreamer/configs/stableid.yaml to insert the learned identity into 3D (LucidDreamer). The 3D videos will be generated in LucidDreamer/output/stableid_{index}/videos/.

  • Test StableIdentity & ModelScopeT2V: Download ModelScopeT2V's pretrained models from ModelScopeT2V into modelscope_t2v_files/.

    # Requirement for ModelScopeT2V:
    pip install -r requirements_modelscope.txt

    The test code is test_with_modelscope.ipynb. Since the ModelScope library lacks some functions for the tokenizer and the embedding layer, you need to replace anaconda3/envs/**your_envs**/lib/python3.8/site-packages/modelscope/models/multi_modal/video_synthesis/text_to_video_synthesis_model.py with modelscope_t2v_files/text_to_video_synthesis_model.py. The videos will be generated in results/{index}/with_modelscope/.
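
    A small hedged helper for that replacement step, which locates the installed modelscope package programmatically instead of hard-coding the anaconda3 path; the destination sub-path is the one given above.

    # Hedged sketch: overwrite ModelScope's text_to_video_synthesis_model.py with the patched copy.
    import os
    import shutil

    import modelscope

    dst = os.path.join(
        os.path.dirname(modelscope.__file__),
        "models", "multi_modal", "video_synthesis", "text_to_video_synthesis_model.py",
    )
    shutil.copy("modelscope_t2v_files/text_to_video_synthesis_model.py", dst)
    print(f"patched {dst}")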

TODOs

  • Release training and inference codes
  • Release codes for StableIdentity & ControlNet
  • Release codes for StableIdentity & LucidDreamer for Identity-Driven 3D Generation
  • Release codes for StableIdentity & ModelScopeT2V for Identity-Driven Video Generation

Acknowledgements

❀️ Thanks to all the authors of the repos and pretrained models we build on. Let's push AIGC forward together!

Citation

@article{wang2024stableidentity,
  title={StableIdentity: Inserting Anybody into Anywhere at First Sight},
  author={Wang, Qinghe and Jia, Xu and Li, Xiaomin and Li, Taiqing and Ma, Liqian and Zhuge, Yunzhi and Lu, Huchuan},
  journal={arXiv preprint arXiv:2401.15975},
  year={2024}
}
