
StyleDubber

This package contains the accompanying code for the following paper:

"StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing", which has appeared as long paper in the Findings of the ACL, 2024.


📣 News

🗒 TODOs

  • Release StyleDubber's training and inference code.
  • Release pretrained weights.
  • Release the raw data and preprocessed data features of the GRID dataset.
  • Metrics Testing Scripts (SECS, WER_Whisper).
  • Update README.md (How to use).
  • Release the preprocessed data features of the V2C-Animation dataset (chenqi-Denoise2).

📊 Dataset

  • GRID (BaiduDrive (code: GRID) / GoogleDrive)
  • V2C-Animation dataset (chenqi-Denoise2)

💡 Checkpoints

We provide pre-trained checkpoints for the GRID and V2C-Animation datasets.
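Once a checkpoint is downloaded, it can be inspected with standard PyTorch loading before wiring it into the training or evaluation scripts (a generic sketch; the file name below is a hypothetical placeholder, and the stored keys depend on how the training code saves state):

import torch

ckpt_path = "path/to/StyleDubber_checkpoint.pth.tar"  # placeholder; use the downloaded file
ckpt = torch.load(ckpt_path, map_location="cpu")      # load on CPU just to inspect
print(list(ckpt.keys()))                              # see which states the file stores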

βš’οΈ Environment

Our Python version is 3.8.18 and our CUDA version is 11.5; other compatible versions should also work. Both training and inference are implemented with PyTorch and run on a GeForce RTX 4090 GPU.

conda create -n style_dubber python=3.8.18
conda activate style_dubber
pip install -r requirements.txt
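After installing, a quick check that PyTorch can see the GPU may save debugging time later (a minimal sanity-check sketch, not a script from this repository):

import torch
print(torch.__version__)          # expect a build compatible with CUDA 11.x
print(torch.cuda.is_available())  # True if a usable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. a GeForce RTX 4090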

🔥 Train Your Own Model

You need to replace the paths in preprocess_config (see "./ModelConfig_V2C/model_config/MovieAnimation/config_all.txt") with your own paths. To train on the V2C-Animation dataset (153 cartoon speakers), run:

python train_StyleDubber_V2C.py

Likewise, replace the paths in preprocess_config (see "./ModelConfig_GRID/model_config/GRID/config_all.txt") with your own paths. To train on the GRID dataset (33 real-world speakers), run:

python train_StyleDubber_GRID.py
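Both config files follow the same pattern, so a small one-off script can rewrite the dataset paths in place (an illustrative sketch; "/old/data/root" is a hypothetical placeholder, so open config_all.txt to see the actual entries):

from pathlib import Path

cfg = Path("./ModelConfig_V2C/model_config/MovieAnimation/config_all.txt")
text = cfg.read_text()
# Swap the placeholder below for the path string that actually appears in the file.
cfg.write_text(text.replace("/old/data/root", "/your/data/root"))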

⭕ Inference

To run inference on the V2C-Animation benchmark under its three evaluation settings (here with the checkpoint from step 47000):

python 0_evaluate_V2C_Setting1.py --restore_step 47000
python 0_evaluate_V2C_Setting2.py --restore_step 47000
python 0_evaluate_V2C_Setting3.py --restore_step 47000
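To run all three settings back to back, a small wrapper like the following works (a convenience sketch; the repository itself only documents the individual commands above):

import subprocess

RESTORE_STEP = "47000"  # checkpoint step used in the examples above
for setting in (1, 2, 3):
    script = f"0_evaluate_V2C_Setting{setting}.py"
    subprocess.run(["python", script, "--restore_step", RESTORE_STEP], check=True)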

✏️ Citing

If you find our work useful, please consider citing:

@article{cong2024styledubber,
  title={StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing},
  author={Cong, Gaoxiang and Qi, Yuankai and Li, Liang and Beheshti, Amin and Zhang, Zhedong and Hengel, Anton van den and Yang, Ming-Hsuan and Yan, Chenggang and Huang, Qingming},
  journal={arXiv preprint arXiv:2402.12636},
  year={2024}
}

πŸ™ Acknowledgments

We would like to thank the authors of previous related projects for generously sharing their code and insights: CDFSE_FastSpeech2, Multimodal Transformer, SMA, Meta-StyleSpeech, and FastSpeech2.
