This software provides a framework and example experiments for investigation into probabilistic modelling of speech for statistical speech synthesis. There is a particular focus on autoregressive models.
It grew out of experiments with autoregressive acoustic models for the author's PhD thesis, with the goal of allowing rapid prototyping of different models. As such it has been designed with productivity and flexibility in mind rather than runtime speed. It is very much a work in progress.
armspeech is hosted on GitHub. To obtain the latest source code using git:
git clone git://github.com/MattShannon/armspeech.git
Many of the formats used in armspeech are similar to those used in HTS. In particular armspeech expects HTS-style speech parameter and label files, for example as produced by the HTS demo. The default method for generating audio from the generated speech parameters is to use the STRAIGHT vocoder. By default the experiments use the CMU ARCTIC corpus, speaker slt.
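As a rough illustration of the HTS-style label files mentioned above, the sketch below parses alignment lines of the form "start end label". It is not part of armspeech and does not use the htk_io API. It assumes that times are integers in HTK's 100 ns units and that state-level alignments mark the state with a trailing "[k]" on the label. The helper names parse_alignment_line and is_state_level are hypothetical.

```python
import re

# Assumption: state-level alignments append "[k]" (the state index) to the label.
_STATE_SUFFIX = re.compile(r'^(.*)\[(\d+)\]$')
# Assumption: times are given in HTK's 100 ns units.
HTK_UNITS_PER_SECOND = 10000000


def parse_alignment_line(line):
    """Parse one 'start end label' line into a dict (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError('expected "start end label", got %r' % line)
    start, end, label = int(fields[0]), int(fields[1]), fields[2]
    state = None
    match = _STATE_SUFFIX.match(label)
    if match:
        # Strip the state suffix and keep the index separately.
        label, state = match.group(1), int(match.group(2))
    return {
        'start_sec': start / float(HTK_UNITS_PER_SECOND),
        'end_sec': end / float(HTK_UNITS_PER_SECOND),
        'label': label,
        'state': state,
    }


def is_state_level(lines):
    """Return True if every non-empty line carries a state index."""
    parsed = [parse_alignment_line(l) for l in lines if l.strip()]
    return bool(parsed) and all(p['state'] is not None for p in parsed)
```

A check like is_state_level could be used to confirm that a label file is suitable for those example experiments that need state-level alignments.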
armspeech has the following dependencies:
- CMU ARCTIC corpus, processed into HTS-style speech parameter and label files (for example, by the HTS demo)
- if you want to generate audio, STRAIGHT vocoder (which requires MATLAB)
- if you want to generate audio, an appropriate HTS demo-style Config.pm file
- the codedep python package for code-level dependency tracking
- the htk_io python package for reading and writing HTK and HTS files from python
- python (>= 2.7) with recent numpy, scipy and matplotlib
- if using the HTS demo to generate the required files above (recommended), you should use the STRAIGHT version of the English speaker dependent training demo (which requires HTS, which in turn requires HTK). HTS 2.1 (for HTK 3.4) was used for testing.
To set up this directory:
- add paths to an appropriate data directory and label directory in expt_hts_demo/experiment.py (by editing the strings starting '## TBA'). The data directory should contain .mgc, .lf0 and .bap files. The label directory should contain .lab files, each of which is an alignment with full-context labels. Either phone-level or state-level alignments may be used (but note that some of the example experiments require state-level alignments).
- update mgcOrder (two places) and subLabels (one place) in expt_hts_demo/experiment.py (where the corpus objects are created) to have values appropriate for your corpus.
- if you want to generate audio, add an appropriate scripts/Config.pm file (e.g. copied from the HTS demo)
- if necessary make the files in bin executable (chmod u+x bin/*)
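Before running the experiments it may be worth checking that the data and label directories are complete. The following is a minimal sketch, not part of armspeech, that reports utterances missing any of the .mgc, .lf0, .bap or .lab files. It assumes that both directories are flat and that all files for one utterance share a file name stem. The function names are hypothetical.

```python
import os

# Extensions named in the setup steps above.
DATA_EXTS = ('.mgc', '.lf0', '.bap')
LAB_EXT = '.lab'


def stems_with_ext(directory, ext):
    """Return the set of file name stems in `directory` that end with `ext`."""
    return set(
        name[:-len(ext)]
        for name in os.listdir(directory)
        if name.endswith(ext)
    )


def find_incomplete_utterances(data_dir, lab_dir):
    """Map each utterance stem to the sorted list of extensions it is missing.

    Assumption: files for one utterance share a stem (e.g. utt1.mgc, utt1.lab)
    and sit directly inside data_dir or lab_dir.
    """
    present = dict((ext, stems_with_ext(data_dir, ext)) for ext in DATA_EXTS)
    present[LAB_EXT] = stems_with_ext(lab_dir, LAB_EXT)
    all_stems = set().union(*present.values())
    missing = {}
    for stem in sorted(all_stems):
        absent = sorted(ext for ext, stems in present.items() if stem not in stems)
        if absent:
            missing[stem] = absent
    return missing
```

An empty result from find_incomplete_utterances(data_dir, lab_dir) suggests that every utterance has all four file types. It does not check the contents of the files.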
You can then run example experiments using:
bin/run_expt_hts_demo.sh
Currently expt_hts_demo uses the armspeech python package as a library, but the latter is not intended to be a fully-fledged package suitable for separate distribution. This may change as the code matures.
Please see the file License for details of the license and warranty for armspeech.
Parts of the code in this directory are based on the following software packages:
- GPML toolbox v3.0
- HTS demo (STRAIGHT version of the English speaker dependent training demo for HTS 2.1)
Please use the issue tracker to submit bug reports.
The author of armspeech is Matt Shannon.