This is an implementation of the papers:
(1) Language-independent speaker anonymization approach using self-supervised pre-trained models
(2) Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions
The authors are Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko.
Audio samples can be found here: https://nii-yamagishilab.github.io/SAS-audio-samples/
Please cite these papers if you use this code.
git clone https://github.com/nii-yamagishilab/SSL-SAS.git
cd SSL-SAS
bash scripts/install.sh
Make sure sox and parallel are installed.
If they are not, activate the environment and install them with conda:
source env.sh
conda install -c conda-forge sox
conda install -c conda-forge parallel
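To see whether the two conda installs above are needed, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the helper name `missing_tools` is hypothetical, and it relies only on the POSIX `command -v` builtin to look each tool up on the PATH.

```shell
# Hypothetical helper (not part of SSL-SAS): print the name of every
# given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the two tools this README asks for.
missing=$(missing_tools sox parallel)
if [ -n "$missing" ]; then
    echo "Not installed:" $missing
else
    echo "sox and parallel are both available"
fi
```

Run the check after `source env.sh`, so that tools installed inside the conda environment are visible on the PATH.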
- Try the pre-trained model
  - Download the English development and evaluation data provided by the VoicePrivacy2020 Challenge: the VCTK subsets (vctk_dev and vctk_test) and the LibriSpeech subsets (libri_dev and libri_test). Run
    bash adapted_from_vpc/00_download_testdata.sh
    You will be prompted for a password; to obtain it, please contact the VoicePrivacy2020 Challenge organizers.
  - Generate anonymized speech:
    bash scripts/engl_scripts/01_demo.sh
  - Compute the performance by following the VoicePrivacy2020 Challenge evaluation procedure.
- Train a HiFi-GAN on LibriTTS-100h yourself:
  bash scripts/engl_scripts/02_train.sh
Mandarin models and speaker vectors are available for internal academic and research use only. To reproduce the Mandarin anonymization experiments, please contact xiaoxiaomiao@nii.ac.jp.
This study is supported by JST CREST Grants (JPMJCR18A6 and JPMJCR20D3), MEXT KAKENHI Grants (21K17775, 21H04906, 21K11951, 18H04112), and the VoicePersonal project (ANR-18-JSTS-0001).
The adapted_from_facebookreaserch subfolder is under the Attribution-NonCommercial 4.0 International License, and the adapted_from_speechbrain subfolder is under the Apache License. They were created by the facebookresearch and speechbrain organizations, respectively. The scripts subfolder is under the MIT License.
Because this source code was adapted from facebookresearch and speechbrain, the whole project follows the Attribution-NonCommercial 4.0 International License.
Copyright (c) 2022, Yamagishi Laboratory, National Institute of Informatics. All rights reserved.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.