KingJamesSong / HouseholderGAN

ICCV23 "Householder Projector for Unsupervised Latent Semantics Discovery"

Home Page:https://arxiv.org/abs/2307.08012


HouseholderGAN

ICCV23 paper Householder Projector for Unsupervised Latent Semantics Discovery


Some identified attributes in StyleGAN2/StyleGAN3.

This paper proposes Householder Projector, a flexible and general low-rank orthogonal matrix representation based on Householder transformations, to parameterize the projection matrix of StyleGANs. Orthogonality guarantees that the eigenvectors correspond to disentangled, interpretable semantics, while the low-rank property encourages each identified direction to produce meaningful variations. We integrate our projector into pre-trained StyleGAN2/StyleGAN3 and evaluate the models on several benchmarks. Fine-tuned for only 1% of the original training steps, our projector helps StyleGANs discover more disentangled and precise semantic attributes without sacrificing image fidelity.
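To make the idea concrete, here is a minimal NumPy sketch of a low-rank orthogonal parameterization built from Householder reflectors. It is an illustration only, not the repository's implementation: it assumes the projection matrix is written as `A = U diag(s) V^T`, and the function names, the dimension of 16, and the rank of 4 are all hypothetical choices made for the example.

```python
import numpy as np


def householder_orthogonal(vs):
    """Multiply Householder reflectors H_i = I - 2 v_i v_i^T / (v_i^T v_i).

    vs has shape (num_reflectors, n); the result is an n x n orthogonal matrix.
    """
    n = vs.shape[1]
    q = np.eye(n)
    for v in vs:
        v = v / np.linalg.norm(v)
        # q @ (I - 2 v v^T) computed without forming the reflector explicitly
        q = q - 2.0 * np.outer(q @ v, v)
    return q


def low_rank_projector(u_vecs, v_vecs, singular_values):
    """Build A = U diag(s) V^T with orthogonal U, V and few nonzero entries in s."""
    u = householder_orthogonal(u_vecs)
    v = householder_orthogonal(v_vecs)
    s = np.zeros(u.shape[0])
    s[: len(singular_values)] = singular_values
    return u @ np.diag(s) @ v.T, u, v


# Illustrative sizes only (assumptions, not values taken from the paper or code).
rng = np.random.default_rng(0)
dim, rank = 16, 4
a, u, v = low_rank_projector(
    rng.normal(size=(dim, dim)),
    rng.normal(size=(dim, dim)),
    np.array([4.0, 3.0, 2.0, 1.0]),
)

print("U orthogonal:", np.allclose(u.T @ u, np.eye(dim)))
print("rank of A:", np.linalg.matrix_rank(a))
```

Under this assumption, `A^T A = V diag(s^2) V^T`, so the columns of `V` are the eigenvectors of `A^T A` by construction, and only the first `rank` of them carry nonzero variation.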

Environment

conda env create -f householdergan.yml
conda activate householdergan

Pre-process Datasets

All datasets can be downloaded from their official websites. For StyleGAN2 pre-processing, check and run prepare_data.py. For StyleGAN3 pre-processing, check and run dataset_tool.py.

Usage of StyleGAN2

Training on FFHQ:

python -m torch.distributed.launch \
      --nproc_per_node=4 --master_port=9032 \
      train_1024.py --batch 8 [dataset_path] \
      --ckpt [pretrained_model] --size 1024 --ortho_id -2 --iter 10000000 \
      --checkpoints_dir [save_model_path] \
      --sample_dir [save_sample_path] --loadd --training_FULL --diag_size 10 &

Test on FFHQ:

python closed_form_factorization.py --out [factor_path] [save_model_path] --is_ortho &

wait

python apply_factor.py --output_dir [save_results_path] \
    --ckpt [save_model_path] \
    --factor [factor_path] --ortho_id -2 --size 1024 &

wait
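For readers unfamiliar with closed-form factorization, the sketch below shows the general idea in NumPy: semantic directions are taken as the top eigenvectors of `W^T W` for a projection weight `W`, and an edit moves a latent code along one of them. This is a simplified illustration and does not reproduce `closed_form_factorization.py`, `apply_factor.py`, or the effect of `--is_ortho`; the function names and the toy weight are hypothetical.

```python
import numpy as np


def closed_form_directions(weight, k):
    """Return the top-k eigenvectors of weight^T weight as rows, with eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(weight.T @ weight)  # ascending order
    order = np.argsort(eigvals)[::-1][:k]                 # largest first
    return eigvecs[:, order].T, eigvals[order]


def edit_latent(latent, direction, degree):
    """Shift a latent code along a unit direction by the given degree."""
    return latent + degree * direction


# Toy weight (an assumption for illustration, not a StyleGAN layer).
rng = np.random.default_rng(0)
weight = rng.normal(size=(32, 8))
directions, strengths = closed_form_directions(weight, k=3)

latent = rng.normal(size=8)
edited = edit_latent(latent, directions[0], degree=3.0)
print("directions shape:", directions.shape)
print("shift norm:", np.linalg.norm(edited - latent))
```

Because `W^T W` is symmetric, the returned directions are mutually orthogonal, which is the property the editing step relies on.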

Evaluation (FID, PPL, PIPL)

python fid.py [save_model_path] \
        --ortho_id -2 \
        --inception inception_ffhq.pkl \
        --size 1024

wait

python closed_form_factorization.py --out [factor_path] \
    [save_model_path] --is_ortho --diag_size 10

wait

python ppl_sefa.py [save_model_path] \
    --factor [factor_path] --ortho_id -2 \
    --sampling full --eps 1.0 --size 1024

wait

python ppl.py [save_model_path] --ortho_id -2 --sampling full --size 1024 &
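As background for the FID number reported by the evaluation, the following sketch computes the Fréchet distance between two sets of feature vectors: `||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})`. It assumes the features have already been extracted (in practice, from an Inception network) and does not reproduce `fid.py`; the function name and the random features are hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (num_samples, dim) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^{1/2}) equals the sum of square roots of the eigenvalues
    # of C_a @ C_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Random stand-in features (an assumption; real use needs Inception features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 6))
fake_feats = real_feats + 2.0  # same covariance, shifted mean

print("distance:", frechet_distance(real_feats, fake_feats))
```

Shifting every feature by a constant leaves the covariance unchanged, so in this toy case the distance reduces to the squared norm of the mean shift.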

Usage of StyleGAN3

For hyper-parameters of each dataset like gamma, please refer to the original StyleGAN3 training configuration for details. Here we only show the training script on AFHQv2:

python train.py --outdir=[save_sample_path] --cfg=stylegan3-r \
    --data=[dataset_path] \
    --gpus=3 --batch-gpu=2 --batch=6 --gamma=16.4 --mbstd-group 2 \
    --resume=[pretrained_model] \
    --diag_size 10 --is_ortho True --snap 5

Test on AFHQv2:

python closed_form_factorization.py --out [factor_path] \
        --resume_pkl [save_model_path] \
        --is_ortho &

wait

python apply_factor.py --outdir=[save_results_path] --cfg=stylegan3-r \
    --data=[dataset_path] \
    --gpus=1 --batch-gpu=1 --batch=1 --gamma=16.4 --mbstd-group 2 \
    --resume=[save_model_path] \
    --diag_size 10 --is_ortho True --factor [factor_path]

wait

Usage of StyleGANHuman

cd StyleGANHuman/training_scripts/sg3/

Training on SHHQv1:

python train.py --outdir=[save_results_path] --cfg=stylegan3-r --gpus=4 --batch=16 --gamma=12.4 --mbstd-group 4 \
    --mirror=1 --aug=noaug --data=[dataset_path]  --square=False --snap=5 \
    --resume=[pretrained_model] --diag_size 10  --is_ortho True

Test on SHHQv1:

python closed_form_factorization.py --out [factor_path] \
        --resume_pkl [save_model_path] \
        --is_ortho &
wait

python apply_factor.py --outdir=[save_results_path] --cfg=stylegan3-r \
    --data=[dataset_path] \
    --gpus=1 --batch-gpu=1 --batch=1 --gamma=16.4 --mbstd-group 1 \
    --resume=[save_model_path] \
    --diag_size 10 --is_ortho True --factor [factor_path]

wait

Fine-tuned and Pre-trained Models

We release the pre-trained StyleGANs and our fine-tuned models on different resolutions.

| Dataset | Backbone | Resolution | Fine-tuned Model | Pre-trained Model |
| --- | --- | --- | --- | --- |
| FFHQ | StyleGAN2 | 256x256 | 🔗 | 🔗 |
| FFHQ | StyleGAN2 | 1024x1024 | 🔗 | 🔗 |
| LSUN Church | StyleGAN2 | 256x256 | 🔗 | 🔗 |
| LSUN Cat | StyleGAN2 | 256x256 | 🔗 | 🔗 |
| AFHQv2 | StyleGAN3 | 512x512 | 🔗 | 🔗 |
| MetFaces | StyleGAN3 | 1024x1024 | 🔗 | 🔗 |
| SHHQv1 | StyleGAN3 | 512x256 | 🔗 | 🔗 |

Citation

If you find the code helpful to your research, please consider citing our paper:

@inproceedings{song2023householder,
  title={Householder Projector for Unsupervised Latent Semantics Discovery},
  author={Song, Yue and Zhang, Jichao and Sebe, Nicu and Wang, Wei},
  booktitle={ICCV},
  year={2023}
}

Contact

If you have any questions or suggestions, please feel free to contact us at yue.song@unitn.it or jichao.zhang@unitn.it.
