Kabongosalomon / LiSTra

This repository contains the "Lingala Speech Translation (LiSTra)" dataset

LiSTra

This repository contains the "Lingala Speech Translation (LiSTra)" dataset presented in the paper "LiSTra, Automatic Speech Translation: English to Lingala Case Study".

Data

For copyright reasons, we are not able to share the audio files; however, we provide the extraction pipeline below. The same pipeline can also be applied to new languages of interest.

Pipeline

Once you have your dataset, you may need to run one of the following scripts to check for specific missing files:
- If both folders contain text files: bash check_diff.sh english/ lingala/ false
- If the second folder contains the wav files: bash check_diff.sh english/ wav_verse/ true
- To compare raw_txt with TextGrid: bash check_diff_TextGrid.sh english/raw_txt/ english/maus_textgrid/ true

Note: Please make sure the first parameter is the txt folder and the second is the wav folder. If both folders contain txt files, set the last parameter to false.
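
For readers who prefer to run this kind of check from a notebook, the idea can be sketched in Python. This is only an illustration, not the logic of check_diff.sh itself: the function name missing_pairs, its parameters, and the assumption that paired files share the same base name and differ only in extension are all hypothetical.

```python
from pathlib import Path


def missing_pairs(first_dir, second_dir, first_ext=".txt", second_ext=".wav"):
    """Compare two folders by base file name (hypothetical helper).

    Assumes paired files share a base name and differ only in extension,
    e.g. english/verse_1.txt and wav_verse/verse_1.wav.
    Returns two sorted lists: names found only in the first folder and
    names found only in the second folder.
    """
    # Collect the base names (file name without extension) in each folder.
    first = {p.stem for p in Path(first_dir).glob(f"*{first_ext}")}
    second = {p.stem for p in Path(second_dir).glob(f"*{second_ext}")}
    return sorted(first - second), sorted(second - first)


if __name__ == "__main__":
    # Folder names taken from the commands above; adjust to your layout.
    only_txt, only_wav = missing_pairs("english/", "wav_verse/")
    print("text without audio:", only_txt)
    print("audio without text:", only_wav)
```

To compare two text folders, pass second_ext=".txt". The TextGrid comparison could be approximated the same way by passing the extension your alignment files actually use.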

Paper Experiments

The speech-to-speech retrieval baseline model proposed in the paper is available here.

Credit

Contact

You can contact me at skabenamualu@aimsammi.org

License:MIT License


Languages

Language:Jupyter Notebook 100.0%