Acoustic-based distance calculation

This repository provides an acoustic-based method for calculating the distance between pronunciations.

Requirements

Python 3.6
PRAAT 6.1.08
Hidden Markov Toolkit (HTK) 3.4
FAVE
R

Getting Started

git clone https://github.com/Bartelds/acoustic-distance-measure.git

Data

The identifiers of the audio samples used can be found in Audio.

Data source: http://accent.gmu.edu/browse_language.php

Usage

Before the distances can be computed, the input data must be preprocessed once (steps 1 and 2 below). Step 3 then computes the distances. Follow the procedure below:

1: Forced-alignment

Input: audio files

Output: aligned .TextGrid files

Forced alignment is used to locate the words spoken in the audio files. It is performed with the Penn Phonetics Lab Forced Aligner.

Resample all audio files to 16 kHz mono PCM.
Create a transcript file that contains all the words spoken in the audio samples.
Run alignment: fa.sh
Extract start and end of words: extract_fa.praat
Segment paragraphs into words: wavsplitter.py
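
The last step above, cutting a paragraph recording into word-level files, can be illustrated with a minimal sketch. This is not the repository's wavsplitter.py: the function name `split_wav`, the `(label, start, end)` interval format, and the output file naming are all assumptions made for illustration. It only relies on Python's standard `wave` module and expects PCM WAV input.

```python
import os
import wave


def split_wav(wav_path, intervals, out_dir):
    """Write one WAV file per (label, start_s, end_s) interval.

    `intervals` is a hypothetical list of word boundaries in seconds,
    for example as extracted from an aligned TextGrid.
    Returns the list of paths that were written.
    """
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (label, start_s, end_s) in enumerate(intervals):
            # Convert times in seconds to frame offsets.
            first = int(round(start_s * rate))
            count = int(round(end_s * rate)) - first
            src.setpos(first)
            frames = src.readframes(count)
            # Hypothetical naming scheme: running index plus word label.
            out_path = os.path.join(out_dir, f"{index:03d}_{label}.wav")
            with wave.open(out_path, "wb") as dst:
                # Reuse the source format; the frame count in the
                # header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling `split_wav("paragraph.wav", [("please", 0.0, 0.25)], "words")` would write `words/000_please.wav`, assuming the `words` directory already exists.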

2: MFCC generation

Generate Mel-frequency cepstral coefficients (MFCCs) with HTK.

Generate an .scp listing that suits your data (see example_hcopy.scp).
Use config.txt to set the HTK parameters.
Generate the MFCCs: HCopy -T 1 -C config -S example_hcopy.scp
Export the HTK compressed format: ./exporthtk.sh
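
An HCopy script file lists one source file and one target file per line, separated by whitespace. The sketch below shows one way such a listing could be produced for a set of word-level WAV files. The helper name `build_scp_lines`, the `.mfc` extension, and the directory names are assumptions, not part of this repository.

```python
import os


def build_scp_lines(wav_paths, mfcc_dir):
    """Return HCopy script lines of the form '<source> <target>'.

    Each target keeps the stem of its source file and gets a
    hypothetical '.mfc' extension inside `mfcc_dir`.
    """
    lines = []
    for wav_path in wav_paths:
        stem = os.path.splitext(os.path.basename(wav_path))[0]
        target = os.path.join(mfcc_dir, stem + ".mfc")
        lines.append(f"{wav_path} {target}")
    return lines


def write_scp(wav_paths, mfcc_dir, scp_path):
    """Write the listing to `scp_path`, one source/target pair per line."""
    with open(scp_path, "w") as handle:
        handle.write("\n".join(build_scp_lines(wav_paths, mfcc_dir)) + "\n")
```

The resulting file can then be passed to HCopy with the `-S` option, as in the command above.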

3: Acoustic-based distance calculation

Distances are calculated using Dynamic Time Warping (DTW).

dtw.R computes the distances (includes normalization).
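
To make the idea concrete, here is a minimal DTW sketch in plain Python. It is not a port of dtw.R: the Euclidean frame distance, the three-way step pattern, and the normalization by the summed sequence lengths are assumptions chosen for illustration, and the normalization in dtw.R may differ.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two feature sequences.

    `a` and `b` are lists of frames, each frame a list of floats of
    the same dimensionality (for example MFCC vectors).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (assumption).
            local = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            )
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    # Assumed normalization: divide by the summed sequence lengths.
    return cost[n][m] / (n + m)
```

Two identical sequences give a distance of 0.0, and the length normalization keeps longer words from receiving larger distances simply because they contain more frames.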
