armspeech

This software provides a framework and example experiments for investigation into probabilistic modelling of speech for statistical speech synthesis. There is a particular focus on autoregressive models.

It grew out of experiments with autoregressive acoustic models for the author's PhD thesis, with the goal of allowing rapid prototyping of different models. As such it has been designed with productivity and flexibility in mind rather than runtime speed. It is very much a work in progress.

Set-up

armspeech is hosted on GitHub. To obtain the latest source code using git:

git clone git://github.com/MattShannon/armspeech.git

Many of the formats used in armspeech are similar to those used in HTS. In particular armspeech expects HTS-style speech parameter and label files, for example as produced by the HTS demo. The default method for generating audio from the generated speech parameters is to use the STRAIGHT vocoder. By default the experiments use the CMU ARCTIC corpus, speaker slt.

armspeech has the following dependencies:

CMU ARCTIC corpus, processed into HTS-style speech parameter and label files (for example, by the HTS demo)
if you want to generate audio, STRAIGHT vocoder (which requires MATLAB)
if you want to generate audio, an appropriate HTS demo-style Config.pm file
the codedep python package for code-level dependency tracking
the htk_io python package for reading and writing HTK and HTS files from python
python (>= 2.7) with recent numpy, scipy and matplotlib
if using the HTS demo to generate the required files above (recommended), you should use the STRAIGHT version of the English speaker dependent training demo (which requires HTS, which in turn requires HTK). HTS 2.1 (for HTK 3.4) was used for testing.
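
Before running anything, it can help to confirm that the python-level dependencies listed above are importable. The following is a minimal sketch and not part of armspeech. It assumes that the importable module names match the package names given above (in particular codedep and htk_io). The function name missing_modules is hypothetical.

```python
import importlib

# Module names taken from the dependency list above. That each package is
# importable under exactly this name is an assumption.
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "codedep", "htk_io"]


def missing_modules(names):
    """Return the names that cannot be imported, in the order given."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing python modules: " + ", ".join(missing))
    else:
        print("All required python modules are importable.")
```

This only checks the python packages. The corpus, the STRAIGHT vocoder and the HTS demo files still need to be checked by hand.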

To set up this directory:

add paths to an appropriate data directory and label directory in expt_hts_demo/experiment.py (by editing the strings starting '## TBA'). The data directory should contain .mgc, .lf0 and .bap files. The label directory should contain .lab files, each of which is an alignment with full-context labels. Either phone-level or state-level alignments may be used (but note that some of the example experiments require state-level alignments).
update mgcOrder (two places) and subLabels (one place) in expt_hts_demo/experiment.py (where the corpus objects are created) to have values appropriate for your corpus.
if you want to generate audio, add an appropriate scripts/Config.pm file (e.g. copied from the HTS demo)
if necessary make the files in bin executable (chmod u+x bin/*)
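
As a rough sanity check on the data and label directories described above, the sketch below lists utterances that lack one of the expected files. It is not part of armspeech. It assumes that the files for one utterance share a basename (for example utt1.mgc, utt1.lf0, utt1.bap and utt1.lab) and sit directly inside the two directories. The function names are hypothetical.

```python
import os

# File extensions named in the set-up steps above.
PARAM_EXTENSIONS = (".mgc", ".lf0", ".bap")
LABEL_EXTENSION = ".lab"


def utterance_ids(directory, extension):
    """Return the set of file basenames in directory that end with extension."""
    return set(
        name[:-len(extension)]
        for name in os.listdir(directory)
        if name.endswith(extension)
    )


def find_incomplete_utterances(data_dir, label_dir):
    """Map each utterance id to the list of extensions it is missing.

    Assumes one file per utterance and extension, with a shared basename.
    """
    found = dict((ext, utterance_ids(data_dir, ext)) for ext in PARAM_EXTENSIONS)
    found[LABEL_EXTENSION] = utterance_ids(label_dir, LABEL_EXTENSION)
    all_ids = set().union(*found.values())
    incomplete = {}
    for utt in sorted(all_ids):
        missing = [
            ext
            for ext in PARAM_EXTENSIONS + (LABEL_EXTENSION,)
            if utt not in found[ext]
        ]
        if missing:
            incomplete[utt] = missing
    return incomplete
```

Calling find_incomplete_utterances with the same two paths entered in expt_hts_demo/experiment.py should return an empty dictionary if every utterance has all four files. It does not check the contents of the files, such as whether the alignments are phone-level or state-level.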

You can then run example experiments using:

bin/run_expt_hts_demo.sh

Currently expt_hts_demo uses the armspeech python package as a library, but the latter is not intended to be a fully-fledged package suitable for separate distribution. This may change as the code matures.

License

Please see the file License for details of the license and warranty for armspeech.

Parts of the code in this directory are based on the following software packages:

GPML toolbox v3.0
HTS demo (STRAIGHT version of the English speaker dependent training demo for HTS 2.1)

Bugs

Please use the issue tracker to submit bug reports.

Contact

The author of armspeech is Matt Shannon.
