Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang
pip install -r requirements.txt
See Details
bash download_ckpts.sh
You should see "Done downloading all checkpoints"
after the script finishes.
Note that training requires 2 GPUs for base models and 4 GPUs for large models.
Remember to check that dataset_root is set correctly before running.
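Since training expects a fixed number of GPUs, a quick preflight check can save a failed launch. The following is a minimal sketch and not part of this repository: the helper names (count_visible_gpus, has_enough_gpus) and the "base"/"large" keys are made up for illustration, and it assumes NVIDIA GPUs with nvidia-smi on the PATH.

```python
import shutil
import subprocess

# GPU counts taken from the note above; the keys are labels chosen for this sketch.
REQUIRED_GPUS = {"base": 2, "large": 4}


def count_visible_gpus() -> int:
    """Count GPUs listed by `nvidia-smi -L`; return 0 if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return 0
    # Each physical GPU is reported on a line beginning with "GPU <index>:".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


def has_enough_gpus(model_size: str, available: int) -> bool:
    """Return True if `available` GPUs meet the requirement for `model_size`."""
    return available >= REQUIRED_GPUS[model_size]


if __name__ == "__main__":
    available = count_visible_gpus()
    for size, needed in REQUIRED_GPUS.items():
        status = "ok" if has_enough_gpus(size, available) else "not enough GPUs"
        print(f"{size}: needs {needed}, found {available} -> {status}")
```

Running the script before launching a training recipe prints one line per model size, so it is easy to see which configurations the current machine can handle.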
Example: train Parallel SpeechCLIP base:
bash egs/model_base/parallel/train.sh
Example: test Parallel SpeechCLIP base (using the pretrained checkpoint):
bash egs/model_base/parallel/test.sh
For more settings, please see the folders in ./egs/.
See example.py
@article{speechclip2022,
title={SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model},
author={Yi-Jen Shih and Hsuan-Fu Wang and Heng-Jui Chang and Layne Berry and Hung-yi Lee and David Harwath},
journal={IEEE SLT},
year={2022},
publisher={IEEE}
}
Please run the autoformatter before opening a PR!
Autoformat ./dev-support/