deep-neural-networks speech-recognition swahili python

Speech Recognition

Live Transcription of Swahili Audio to Swahili Text

Introduction

The World Food Programme wants to collect nutritional information about food bought and sold in Kenya. In this project, selected people install an app on their mobile phones and, whenever they buy food, activate it by voice to register the list of items they have bought, in Swahili. The app is expected to transcribe their speech to text live and store the information in a database in an easy-to-process form.

Objective

This project builds, trains and deploys a deep learning model that transcribes Swahili audio to Swahili text.

How to start

Machine Setup:

First, you need to have Python 3 installed.

Next, clone the GitHub repository:

git clone https://github.com/10Academy-Group-4/Week-4

Finally, install the requirements. The command below works as written for Anaconda users; otherwise replace pip with pip3 and python with python3.

pip install -r requirements.txt

Docker:

This is a containerized Flask application, and a Docker image with all prerequisites installed is available on Docker Hub. Here is how to use it:

Pull docker image

docker pull nebasam/stt-swahili

Run docker image

docker run --rm -it -p 33507:33507/tcp nebasam/stt-swahili:latest
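Once the container is running, a quick way to confirm that the published port is reachable is a plain TCP connection check. The following is a minimal sketch, assuming the container is running on the same machine and that port 33507 from the command above is the one the app listens on; the function name is_port_open is made up for this example and is not part of the project.

```python
import socket


def is_port_open(host="127.0.0.1", port=33507, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the container was started with -p 33507:33507 on this machine.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open() with the defaults checks localhost on port 33507. A True result only shows that something accepts connections on that port, not that the model itself has loaded correctly.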

Data

Dataset for Swahili: https://github.com/getalp/ALFFA_PUBLIC

Data_Features

Input features (X): audio clips of spoken words
Target labels (y):  text transcript of what was spoken
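The pairing of clips (X) with transcripts (y) can be sketched as follows. This is an illustration only: it assumes transcripts are stored one per line as an utterance ID followed by the text, and that each clip is a .wav file named after its utterance ID. Check the actual layout of the dataset before relying on this. The function names and the sample lines are made up for the example.

```python
from pathlib import Path


def parse_transcripts(lines):
    """Map utterance IDs to transcripts from '<utt_id> <text>' lines (assumed format)."""
    transcripts = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        utt_id = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        transcripts[utt_id] = text
    return transcripts


def pair_audio_with_text(audio_dir, transcripts):
    """Return (wav_path, transcript) pairs for clips that have a transcript.

    Assumption: each clip is named '<utt_id>.wav' somewhere under audio_dir.
    """
    examples = []
    for wav_path in sorted(Path(audio_dir).rglob("*.wav")):
        text = transcripts.get(wav_path.stem)
        if text is not None:
            examples.append((wav_path, text))
    return examples


# Made-up sample lines, not taken from the dataset
sample = ["utt_0001 habari ya asubuhi\n", "\n", "utt_0002 asante sana\n"]
labels = parse_transcripts(sample)
```

Clips without a matching transcript are dropped rather than given an empty label, so that every training example has both an input and a target.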

Directory_Structure

Artifacts - A directory containing meta files and other artifacts generated during the project
Notebook - A directory containing notebooks that describe the classes used for meta generation and preprocessing
Scripts - A directory containing scripts for meta generation, preprocessing and feature extraction
test_data - A directory containing the data used to run tests on every commit or merge to the main branch
tests - A directory containing the test code run on every commit or merge to the main branch
data.dvc - DVC file for versioning the data
requirements.txt - A file listing the project's dependencies

Testing

Python's built-in unittest library was used to test the functions and classes in the project. A .travis.yml file was added to automate testing of every commit or merge to the main branch. The data used for testing is in the test_data directory.
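For readers unfamiliar with unittest, a test in this style looks roughly like the sketch below. The helper normalize_transcript is hypothetical and stands in for a project function; the real tests live in the tests directory and may be organized differently.

```python
import unittest


def normalize_transcript(text):
    """Hypothetical helper: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


class TestNormalizeTranscript(unittest.TestCase):
    def test_lowercases_text(self):
        self.assertEqual(normalize_transcript("Habari Yako"), "habari yako")

    def test_collapses_whitespace(self):
        self.assertEqual(normalize_transcript("  asante   sana \n"), "asante sana")
```

Assuming the test files follow unittest's default test*.py naming, they can be run locally with python -m unittest discover -s tests before pushing.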

Modelling

To get an idea of how the models are set up and investigated, take a look at the notebooks for Models, WordError and Augmentation.
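Word error rate (WER) is the standard metric for speech-to-text models: the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows the usual definition and is not necessarily how the WordError notebook computes it. The example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis
wer = word_error_rate("nimenunua mkate na maziwa", "nimenunua mkate maziwa")
print(wer)  # 0.25
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate relative to the reference length rather than a percentage of words wrong.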

Deployment

The user interface was built with Flask. The model was dockerized and deployed on Heroku at https://swahili-stt.herokuapp.com/

Contributors

Michael Darko Ahwireng

Toyin Hawau Olamide

Nebiyu Samuel

Sibitenda Harriet

Same Michael

Mubarak Sani

Khairat Ayinde

Languages

Jupyter Notebook 91.3%, Python 6.9%, HTML 1.1%, JavaScript 0.6%, Dockerfile 0.1%