marolAI / Malagasy-speech-data

Speech Recognition project: data collection for the Malagasy language

This repository holds a data collection project for Malagasy, the language spoken by the people of Madagascar. The data consists of one file containing the encoded text of all sessions and 101 folders, each holding about 20 WAV files with their corresponding JSON metadata and a text file. The audio recordings total approximately 3 hours and were recorded from this source using an Android app called lig-aikuma.

NOTE: the audio recordings sometimes contain background noise because they were made in a noisy environment. The data may also contain some corrupted WAV files, which result from an error in the app.
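Because some WAV files may be corrupted, it can help to scan the data before using it. The following is a minimal sketch, not part of this project: it assumes the session folders sit under a single root directory, and the function name `scan_recordings` is hypothetical. It uses only Python's standard `wave` module, so it catches files with an unreadable header or no audio frames, but it will not detect noise or a file whose audio data was cut short after a valid header.

```python
import wave
from pathlib import Path


def scan_recordings(root):
    """Return (total_seconds, corrupted_paths) for all .wav files under root.

    Assumption: every recording is a PCM WAV file somewhere below `root`.
    """
    total_seconds = 0.0
    corrupted = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        try:
            with wave.open(str(wav_path), "rb") as wav_file:
                frames = wav_file.getnframes()
                rate = wav_file.getframerate()
                if frames == 0 or rate == 0:
                    # A readable header with no audio is treated as corrupted.
                    raise wave.Error("empty audio stream")
                total_seconds += frames / rate
        except (wave.Error, EOFError):
            # Unreadable header or empty file: record it and keep scanning.
            corrupted.append(wav_path)
    return total_seconds, corrupted
```

Calling `scan_recordings("path/to/data")` (the path is a placeholder) returns the total duration in seconds of the readable files together with the list of files that could not be read, which can then be excluded from training.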

Problems encountered

  • Installation of the app: the version of the app must match the Android version of the phone.
  • Recordings: each sentence should be recorded only once, because recording the same sentence more than once causes an app error.

Comment

Preparing the text to record takes time, because "##" must be placed after each sentence to give it the right format. It would be better if the app treated each line as one sentence to record.
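Adding the "##" markers by hand can be automated. The sketch below is an illustration only: it assumes the source text has one sentence per line and that a single space before "##" is acceptable to the app, which is not confirmed here. The function name `add_sentence_markers` is hypothetical.

```python
def add_sentence_markers(text, marker="##"):
    """Append `marker` to every non-empty line, treating each line as one sentence.

    Assumption: one sentence per line; lines already ending with the
    marker are left unchanged so the function can be run twice safely.
    """
    marked = []
    for line in text.splitlines():
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines
        if not sentence.endswith(marker):
            sentence = f"{sentence} {marker}"
        marked.append(sentence)
    return "\n".join(marked)
```

For example, passing a two-line text such as "Manao ahoana" followed by "Misaotra" yields the same two lines, each ending in " ##".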

Contribution

This project is contributed by:

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
