qgzang/wavelet_prosody_toolkit

Wavelet prosody analyzer

UPDATE 3.2.2020: additional command-line tools for batch processing, global spectrum and analysis-synthesis are described in tools.rst.

Description

The program calculates f0, energy and duration features from a speech wav file, performs continuous wavelet analysis on the combined features, finds prosodic events (prominences, boundaries) in the wavelet scalogram and aligns the events with transcribed units.

Requirements

The wavelet prosody analysis depends on several packages, which are installed automatically if you use the procedure described in ./INSTALL.rst.

Here are the main dependencies:

pyqt5 for the gui (see https://www.riverbankcomputing.com/commercial/pyqt )
pycwt for the wavelet analysis (see https://github.com/regeirk/pycwt/LICENSE.txt )
pyyaml for the configuration (see https://github.com/yaml/pyyaml/blob/master/LICENSE )
matplotlib for the plot rendering (see https://github.com/matplotlib/matplotlib/blob/master/LICENSE/LICENSE )
soundfile for playing waves (see https://github.com/bastibe/SoundFile/blob/master/LICENSE )
wavio for reading/writing wav (see https://github.com/WarrenWeckesser/wavio/blob/master/README.rst )
tgt for reading/writing textgrid (see https://github.com/hbuschme/TextGridTools/blob/master/LICENSE )

Here are the optional dependencies:

pyreaper for the f0 extraction (see https://github.com/r9y9/pyreaper/blob/master/LICENSE.md ).

The user is invited to have a look at the license of the dependencies.

Installation

see ./INSTALL.rst

Input information

audio files in wav format
transcriptions in either htk .lab format or Praat textgrids
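
Since each wav file is matched to its transcription by file root (for example file1.wav with file1.lab, or file2.wav with file2.TextGrid), it can help to check a directory before loading it in the GUI. The following is a minimal sketch of such a check; `find_pairs` is a hypothetical helper and not part of the toolkit, and the exact matching rule it uses is an assumption.

```python
from pathlib import Path

# Transcription extensions named in this README. How the toolkit itself
# matches files is not shown here, so this lookup rule is an assumption.
TRANSCRIPTION_EXTS = (".lab", ".TextGrid")


def find_pairs(speech_dir):
    """Return ({root: (wav_path, transcription_path)}, [roots without transcription])."""
    speech_dir = Path(speech_dir)
    pairs, missing = {}, []
    for wav in sorted(speech_dir.glob("*.wav")):
        # Look for a transcription that shares the wav file's root name.
        candidates = [wav.with_suffix(ext) for ext in TRANSCRIPTION_EXTS]
        found = next((c for c in candidates if c.exists()), None)
        if found is None:
            missing.append(wav.stem)
        else:
            pairs[wav.stem] = (wav, found)
    return pairs, missing
```

Calling `find_pairs("samples")` would return the matched files and a list of wav roots that have no transcription next to them.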

Usage:

Assuming the installation process is done in global mode, just do

wavelet_gui

Otherwise, go to the root directory of the program in the terminal, and start by

python3 wavelet_prosody_toolkit/wavelet_gui.py

Select the directory with speech and transcription files: Select Speech Directory... Some examples are provided in the samples/ directory. Files should have the same root, for example file1.wav and file1.lab, or file2.wav and file2.TextGrid.
Select the features to use in the analysis: Prosodic Feats for CWT..
Adjust the pitch tracking parameters for the speaker and environment, then press Reprocess to see the changes. Set the range of possible pitch values, typically ~50-350 Hz for males and ~100-400 Hz for females. If the estimated track skips obviously voiced portions, move the voicing threshold slider to the left.

Alternatively, pre-estimated f0 analyses can be used: a .f0 file must exist, either in Praat matrix format or as a list file with one f0 value per line, and the frame shift must be a constant 5 ms. To get a suitable format from Praat, select the wav and do:
- To Pitch: 0.005, 120, 400
- To Matrix
- Save as matrix text file: “/.f0”
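
For the list-file variant, the format described above (one f0 value per line, constant 5 ms frame shift) can be sketched as follows. `write_f0_list` and `read_f0_list` are hypothetical helpers, not toolkit functions, and representing unvoiced frames as 0 is an assumption that this README does not confirm.

```python
FRAME_SHIFT_S = 0.005  # constant 5 ms frame shift required for .f0 list files


def write_f0_list(path, f0_values):
    """Write one f0 value (Hz) per line; frame i is assumed to start at i * 5 ms.

    Assumption: unvoiced frames are passed in (and written) as 0.
    """
    with open(path, "w", encoding="utf-8") as handle:
        for value in f0_values:
            handle.write(f"{float(value):.3f}\n")


def read_f0_list(path):
    """Read a list file back as (time_in_seconds, f0_in_hz) tuples."""
    with open(path, encoding="utf-8") as handle:
        values = [float(line) for line in handle if line.strip()]
    return [(i * FRAME_SHIFT_S, f0) for i, f0 in enumerate(values)]
```

Reading the file back with its implied time stamps is a quick way to confirm that the number of frames roughly matches the duration of the wav file divided by 5 ms.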

Adjust the weights of the prosodic features and choose whether the final signal is combined by summing or multiplying the features
Select which tiers to use for duration signal generation, or use the duration estimated from the signal
Select the transcription level of interest: Select Tier
You can interactively zoom and move around with the button on top, and play the visible section
When everything is good, use Process all, which analyzes all utterances in the directory with the current settings and saves the prosodic labels in the speech directory as <wav_file_name>.prom
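
To make the feature-combination step above more concrete, here is a minimal sketch of one way weighted features could be summed or multiplied. It is an illustration under stated assumptions, not the toolkit's implementation: the min-max normalization, the use of weights as exponents in the product case, and the names `normalize_minmax` and `combine_features` are all assumptions.

```python
def normalize_minmax(values):
    """Scale a sequence to the range 0..1 (constant input maps to all zeros)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_features(features, weights, method="sum"):
    """Combine equally long feature tracks into one signal.

    features: {name: list of floats}; weights: {name: non-negative float}.
    "sum" adds weight * feature; "product" multiplies feature ** weight.
    Both rules are assumptions made for illustration.
    """
    names = list(features)
    length = len(features[names[0]])
    if any(len(features[n]) != length for n in names):
        raise ValueError("all feature tracks must have the same length")
    normalized = {n: normalize_minmax(features[n]) for n in names}

    combined = []
    for i in range(length):
        if method == "sum":
            value = sum(weights[n] * normalized[n][i] for n in names)
        elif method == "product":
            value = 1.0
            for n in names:
                value *= normalized[n][i] ** weights[n]
        else:
            raise ValueError("method must be 'sum' or 'product'")
        combined.append(value)
    return combined
```

With summation, a frame stays prominent if any single feature is high; with multiplication, a frame only stays high when all features are high at once, which is why the choice changes the resulting prominence estimates.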

Prosodic labels are saved in a tab separated form with the following columns:

<file_name> <start_time> <end_time> <unit> <prominence strength> <boundary strength>
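
The column layout above can be read back with Python's standard `csv` module. This is a minimal sketch; `ProsodicLabel` and `read_prom` are hypothetical names, and treating the time and strength columns as floats is an assumption based on the column names.

```python
import csv
from typing import NamedTuple


class ProsodicLabel(NamedTuple):
    file_name: str
    start_time: float
    end_time: float
    unit: str
    prominence: float
    boundary: float


def read_prom(path):
    """Parse a tab-separated .prom file into a list of ProsodicLabel rows."""
    labels = []
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside units untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 6:
                continue  # skip blank or malformed lines
            name, start, end, unit, prominence, boundary = row
            labels.append(
                ProsodicLabel(name, float(start), float(end), unit,
                              float(prominence), float(boundary))
            )
    return labels
```

For example, `[l.unit for l in read_prom("file1.prom") if l.prominence > 1.0]` would list the units whose prominence strength exceeds an arbitrary threshold of 1.0.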

Advanced Usage:

Additional customization of the input signals and wavelet analysis is possible by modifying the configuration file. The default configuration is located in:

wavelet_prosody_toolkit/configs/default.yaml

You can view an online version here: https://github.com/asuni/wavelet_prosody_toolkit/blob/master/wavelet_prosody_toolkit/configs/default.yaml

We recommend making a copy of the default.yaml file (e.g. myconfig.yaml) and modifying the copy. To apply the modified configuration, start the program with

wavelet_gui --config path/to/myconfig.yaml
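
The copy-then-launch workflow can also be scripted. The sketch below copies the default configuration only if the custom file does not exist yet and returns the command shown above as an argument list; `prepare_config` is a hypothetical helper, not part of the toolkit.

```python
import shutil
from pathlib import Path


def prepare_config(default_config, custom_config):
    """Copy the default configuration once and return the launch command."""
    default_config, custom_config = Path(default_config), Path(custom_config)
    if not custom_config.exists():  # never overwrite existing edits
        shutil.copyfile(default_config, custom_config)
    return ["wavelet_gui", "--config", str(custom_config)]
```

The returned list can be passed to `subprocess.run` once the copy has been edited.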

Some helpful shortcuts

Here is a list of shortcuts available in the GUI:

CTRL+q to quit
F11 to switch between fullscreen and normal mode
