Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Unofficial PyTorch implementation of Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing. This repository is based on the iSTFTNet GitHub repository (Paper).

Disclaimer : This repo was built for testing purposes. Contributions are welcome.

Training :

python train.py --config config.json

In config.json, set latent_dim to select the AV128, AV192, or AV256 (Default) variant.
Following Section 3.3 of the paper, you can set dec_istft_input to cartesian (Default), polar, or both.
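
A minimal sketch of how a variant config could be derived from the default one, so that config.json itself stays untouched. It assumes that latent_dim and dec_istft_input are top-level keys in config.json, that the variant names map to the integers 128, 192, and 256, and that the option values are the plain strings "cartesian", "polar", and "both". The helper names make_variant_config and write_variant_config are hypothetical and not part of this repo.

```python
import json
from pathlib import Path

# Assumption: each variant name corresponds to this latent_dim value.
VARIANT_LATENT_DIM = {"AV128": 128, "AV192": 192, "AV256": 256}
# Assumption: these are the accepted string values for dec_istft_input.
ISTFT_INPUTS = ("cartesian", "polar", "both")


def make_variant_config(base_cfg, variant="AV256", dec_istft_input="cartesian"):
    """Return a copy of base_cfg with latent_dim and dec_istft_input set."""
    if variant not in VARIANT_LATENT_DIM:
        raise ValueError(f"unknown variant: {variant!r}")
    if dec_istft_input not in ISTFT_INPUTS:
        raise ValueError(f"unknown dec_istft_input: {dec_istft_input!r}")
    cfg = dict(base_cfg)  # shallow copy; the caller's dict is not modified
    cfg["latent_dim"] = VARIANT_LATENT_DIM[variant]
    cfg["dec_istft_input"] = dec_istft_input
    return cfg


def write_variant_config(src, dst, variant="AV256", dec_istft_input="cartesian"):
    """Read the config at src, apply the two options, and write it to dst."""
    base = json.loads(Path(src).read_text())
    cfg = make_variant_config(base, variant, dec_istft_input)
    Path(dst).write_text(json.dumps(cfg, indent=4))
    return cfg
```

For example, write_variant_config("config.json", "config_av128.json", variant="AV128", dec_istft_input="polar") would produce a file that can then be passed to training with python train.py --config config_av128.json.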

Note:

Citations :

@article{Webber2022AutovocoderFW,
  title={Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing},
  author={Jacob J. Webber and Cassia Valentini-Botinhao and Evelyn Williams and Gustav Eje Henter and Simon King},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.06989}
}

References:

License :

Apache License 2.0