coqui-ai / open-bible-scripts

Scripts for working with open.bible data

Align Open.Bible data

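| Language | Passing | Failing | Unknown | Notes | Aligned Sample |
|---|---|---|---|---|---|
| Yoruba | 💚 | | | | Psalm 119 |
| Ewe | 💚 | | | | Psalm 119 |
| Lingala | 💚 | | | | Psalm 119 |
| Asante Twi | 💚 | | | | |
| Akuapem Twi | 💚 | | | | |
| Chichewa | ❤️‍🩹 | | | Passing with bad alignments | Psalm 119 |
| Hausa | | 💔 | | | |
| Luo | | 💔 | | | |
| Luganda | | 💔 | | | |
| Kikuyu | | 💔 | | | |
| Arabic | | | ❓ | | |
| Kurdi Sorani | | | ❓ | | |
| Polish | | | ❓ | | |
| Vietnamese | | | ❓ | | |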

Clone this repo

$ git clone https://github.com/coqui-ai/open-bible-scripts.git

Alignment Approach 1: Use the Montreal Forced Aligner

The first alignment approach uses the Montreal Forced Aligner (MFA) to train a new acoustic model from scratch and align the data with it.

Dependencies

You need to install a couple of things on your own:

gnu-parallel
covo
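Before running the scripts, it can help to confirm that both dependencies are reachable from your shell. The snippet below is a minimal sketch, not part of this repo: it assumes GNU parallel is installed under its usual executable name `parallel` and that covo provides a `covo` command on your `PATH` (the latter is an assumption; adjust the names to match your installation).

```python
import shutil


def missing_tools(names):
    """Return the subset of executable names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Executable names are assumptions: "parallel" for gnu-parallel, "covo" for covo.
    missing = missing_tools(["parallel", "covo"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```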

Start with the run script for pre-processing

Use the language name as defined in open-bible-scripts/data/*.txt. Use the language code as expected by covo.

E.g., for Yoruba use yoruba and yo, for Ewe use ewe and ee, for Luganda luganda and lg, and so on.

$ cd open-bible-scripts
open-bible-scripts$ ./run-pre-alignment.sh yoruba yo
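If you process several languages, keeping the name/code pairs in one place avoids mismatched arguments. The following is a hypothetical helper (not part of this repo) that holds only the three pairs mentioned above and builds the argument list for `run-pre-alignment.sh`; add further pairs yourself after checking them against `data/*.txt` and covo.

```python
import shlex

# Name/code pairs taken from the examples above; extend as needed.
LANGUAGE_CODES = {"yoruba": "yo", "ewe": "ee", "luganda": "lg"}


def pre_alignment_command(language):
    """Return the argument list for run-pre-alignment.sh for a known language name."""
    name = language.strip().lower()
    if name not in LANGUAGE_CODES:
        raise KeyError(f"no language code recorded for {name!r}")
    return ["./run-pre-alignment.sh", name, LANGUAGE_CODES[name]]


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it from the repo root.
    print(shlex.join(pre_alignment_command("yoruba")))
```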

Generate alignments with mfa train

$ docker run -it --mount "type=bind,src=/home/ubuntu/open-bible-scripts,dst=/mnt" mmcauliffe/montreal-forced-aligner
(base) root@d8095c794d5f:/# conda activate aligner
(aligner) root@d8095c794d5f:/# mfa train --clean --num_jobs `nproc` --temp_directory /mnt/yoruba/data/mfa-tmp-dir --config_path /mnt/MFA_CONFIG /mnt/yoruba/data /mnt/yoruba/dict.txt /mnt/yoruba/data/mfa-output &> /mnt/yoruba/data/LOG &

# At this point, alignment will take a while,
# so you might want to detach from the docker container 
# with `Ctrl-P followed by Ctrl-Q`
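The `mfa train` invocation above hard-codes `yoruba` in five places. As a rough sketch, the same argument list can be generated for any language name. The function name and its parameters are hypothetical; the flags and path layout are copied from the command above. Note that `os.cpu_count()` stands in for `nproc` here and may report a different number inside a container with CPU limits.

```python
import os
import shlex


def mfa_train_command(language, mount="/mnt", num_jobs=None):
    """Build the `mfa train` argument list used above for a given language."""
    data_dir = f"{mount}/{language}/data"
    jobs = num_jobs if num_jobs is not None else (os.cpu_count() or 1)
    return [
        "mfa", "train", "--clean",
        "--num_jobs", str(jobs),
        "--temp_directory", f"{data_dir}/mfa-tmp-dir",
        "--config_path", f"{mount}/MFA_CONFIG",
        data_dir,                          # corpus directory
        f"{mount}/{language}/dict.txt",    # pronunciation dictionary
        f"{data_dir}/mfa-output",          # output directory
    ]


if __name__ == "__main__":
    # Redirect output to a log file yourself, as in the command above.
    print(shlex.join(mfa_train_command("yoruba")))
```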

Finish with the run script for post-processing

Use the language name as defined in open-bible-scripts/data/*.txt. The example command below also passes the language code, as in the pre-processing step.

E.g., for Yoruba use yoruba, for Ewe use ewe, for Luganda luganda, and so on.

$ cd open-bible-scripts
open-bible-scripts$ ./run-post-alignment.sh yoruba yo

Alignment Approach 2: Use timing files from Biblica

This approach works only for Lingala, Akuapem Twi, and Asante Twi.

Split using timing file

Install sox on your OS, then create a Python virtual environment with pandas. On Debian/Ubuntu Linux:

sudo apt-get install sox
sudo apt-get install libsox-fmt-mp3
sox --version
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install pandas

Execute the run-biblica-splits-*.sh script from the root dir, for example with Lingala:

./run-biblica-splits-lingala.sh
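To illustrate the general idea of splitting by timings, here is a minimal sketch that turns a list of segments into `sox ... trim START DURATION` commands. It is not the logic of the `run-biblica-splits-*.sh` scripts: the segment tuples, file names, and output naming are all assumptions for illustration, and the actual format of the Biblica timing files is not described here.

```python
def sox_split_commands(audio_path, segments, out_dir="."):
    """Build one `sox ... trim START DURATION` command per segment.

    segments: iterable of (label, start_seconds, end_seconds) tuples.
    This tuple layout is an assumption, not the Biblica timing-file format.
    """
    commands = []
    for label, start, end in segments:
        if end <= start:
            raise ValueError(f"segment {label!r} ends before it starts")
        duration = end - start
        commands.append([
            "sox", audio_path, f"{out_dir}/{label}.wav",
            "trim", f"{start:.3f}", f"{duration:.3f}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file name and timings, for illustration only.
    for cmd in sox_split_commands("chapter.mp3", [("verse_001", 0.0, 4.25)], "out"):
        print(" ".join(cmd))
```

Each command could then be run with `subprocess.run(cmd, check=True)` once the output directory exists.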

License: Apache License 2.0