tjysdsg/tone_classifier

Place your own phone_ctm.txt file in the project root directory, or use the default one, which was generated with https://github.com/tjysdsg/aidatatang_force_align on AISHELL-3 data.

Run

  python feature_extration.py

to collect the required statistics (phone start time, duration, tones, etc.). Results are saved to utt2tones.json.
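
As a rough illustration of the kind of statistics collected in this step, the sketch below parses CTM-style rows into per-utterance phone records. It is not the code in the repository. It assumes the common Kaldi CTM column order (utterance ID, channel, start time, duration, phone) and assumes that the tone is a trailing digit on the phone label; both are assumptions about phone_ctm.txt, and `parse_ctm_lines` is a hypothetical helper name.

```python
import re
from collections import defaultdict


def parse_ctm_lines(lines):
    """Group CTM rows by utterance as (phone, tone, start, duration) tuples.

    Assumed row layout (Kaldi-style CTM): utt_id channel start duration phone
    Assumed tone encoding: trailing digit 0-5 on the phone label, e.g. "ao3".
    """
    utt2phones = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        utt, _channel, start, dur, phone = fields[:5]
        match = re.fullmatch(r"(.+?)([0-5])", phone)
        base = match.group(1) if match else phone
        tone = int(match.group(2)) if match else None  # None: no tone digit
        utt2phones[utt].append((base, tone, float(start), float(dur)))
    return dict(utt2phones)


# Made-up example rows, for illustration only
example = [
    "utt1 1 0.00 0.12 n",
    "utt1 1 0.12 0.20 i3",
    "utt1 1 0.32 0.25 ao3",
]
stats = parse_ctm_lines(example)
```

The actual layout of utt2tones.json may differ from this in-memory structure; check the output of the script before relying on any particular schema.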

  python train/embedding/split_wavs.py

to split the data into training, test, and validation sets for embedding model training.

The test utterances used in the paper are listed in test_utts.json
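
A split that keeps a fixed test list could look roughly like the sketch below. This is only an illustration of the idea, not what split_wavs.py does. The function name `split_utts`, the `val_fraction` and `seed` parameters, and the assumption that the test list is a flat list of utterance IDs are all hypothetical.

```python
import random


def split_utts(utts, test_utts, val_fraction=0.1, seed=0):
    """Hold out a fixed test list, then split the rest into train and val."""
    test_set = set(test_utts)
    test = [u for u in utts if u in test_set]
    # Sort before shuffling so the result depends only on the seed
    rest = sorted(u for u in utts if u not in test_set)
    random.Random(seed).shuffle(rest)
    n_val = int(len(rest) * val_fraction)
    return {"train": rest[n_val:], "val": rest[:n_val], "test": test}


# Made-up utterance IDs, for illustration only
utts = [f"utt{i}" for i in range(20)]
splits = split_utts(utts, test_utts=["utt3", "utt7"])
```

Fixing the seed and the test list keeps the evaluation set identical across runs, which matters when comparing against reported results.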

  python train/train_embedding.py

to train the embedding model. Results are written to exp/.

The mel-spectrogram cache is written to exp/cache/spectro/wav.scp and exp/cache/spectro/*.npy.
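
To inspect the cache, something like the sketch below may help. It assumes that wav.scp follows the Kaldi convention of one `key path` pair per line and that each path points to a .npy array; neither is confirmed by this README. `read_scp` and `load_spectrogram` are hypothetical helpers, and the example paths are made up.

```python
def read_scp(lines):
    """Parse Kaldi-style scp rows ("key path") into a dict.

    Assumption: each row is a key, whitespace, then the rest of the
    line as the path. Blank or incomplete rows are skipped.
    """
    index = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            index[parts[0]] = parts[1]
    return index


def load_spectrogram(index, key):
    """Load one cached array; requires NumPy (imported lazily)."""
    import numpy as np  # third-party dependency

    return np.load(index[key])


# Made-up rows, for illustration only
rows = [
    "utt1 exp/cache/spectro/utt1.npy",
    "",
    "utt2 exp/cache/spectro/utt2.npy",
]
index = read_scp(rows)
```

With a real cache, `read_scp(open("exp/cache/spectro/wav.scp"))` would build the index, provided the file actually uses this layout.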

[Optional] Train an end-to-end tone recognizer

After step 3, run

  ./run.sh

Mandarin Tone Classifier
