speaker-diarization ml machine-learning ai lstm neural-network pytorch docker vue reverb redis express

About the project

RE:VERB is a speaker diarization system. It lets the user send or record audio of a conversation and receive timestamps of who spoke when.

RE:VERB is our final project in Magshimim, and consists of a web client and a server.

The client can record audio and show the timestamp results graphically.
The server exposes a simple REST API, so it can be used with other clients as well.

Built With

Client

Vue.js - The front end framework used
Wavesurfer.js - A library for waveform visualization

Server

PyTorch - A deep learning library for Python with strong GPU support through CUDA
Express.js - Node.js web server framework

Getting Started

The project contains the server and the web client (a CLI client also exists for debugging purposes).

The server is located at ./server and the web client is located at ./client/website.

Server

The model, along with the scripts for downloading and training and the weights from our training, is located at ./server/speech_diarization/model

We used Docker to create a cross-platform environment to run the server on.

The server is made up of:

a container for the web server
a container for the diarization process
a container for a Redis database that lets the other two communicate

Docker Compose runs and manages all three containers at once.
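The README does not describe how the web server and the diarization process exchange messages through Redis. As a rough illustration only, one common pattern is a job queue: the web server pushes a job onto a list, and the worker pops it, processes the audio, and stores the result under a key. The sketch below uses a small in-memory stand-in instead of a real Redis connection, and the key names (jobs, result:<id>), the job fields and the fake_diarize function are all hypothetical, not taken from the project.

```python
import json
from collections import defaultdict, deque


class InMemoryRedis:
    """In-memory stand-in with a few Redis-style operations (not a real client)."""

    def __init__(self):
        self.lists = defaultdict(deque)
        self.values = {}

    def lpush(self, key, value):
        # Add to the head of the list.
        self.lists[key].appendleft(value)

    def rpop(self, key):
        # Remove from the tail; lpush + rpop together behave as a FIFO queue.
        return self.lists[key].pop() if self.lists[key] else None

    def set(self, key, value):
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)


def submit_job(store, job_id, audio_path):
    """Web server side (hypothetical): enqueue a diarization job."""
    store.lpush("jobs", json.dumps({"id": job_id, "audio": audio_path}))


def run_worker_once(store, diarize):
    """Worker side (hypothetical): process one job if the queue is not empty."""
    raw = store.rpop("jobs")
    if raw is None:
        return False
    job = json.loads(raw)
    result = diarize(job["audio"])
    store.set("result:" + job["id"], json.dumps(result))
    return True


def fake_diarize(audio_path):
    """Placeholder for the real model; returns segments in the response format."""
    return {"0": [[40, 120]], "1": [[1260, 1660]]}


store = InMemoryRedis()
submit_job(store, "job-1", "/tmp/example.wav")
run_worker_once(store, fake_diarize)
print(store.get("result:job-1"))
```

Keeping the queue in a separate container means the web server and the diarization process never need to call each other directly.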

Docker and docker-compose must be installed to build and run the server. Everything else is taken care of.

Installing

cd server
docker-compose up

This will install the dependencies and run all three containers.

If you make a change in the server, use

docker-compose up --build

to rebuild.

Usage:

Sending an HTTP POST request with an audio file to the server at http://localhost:1337/upload (the default port and URL) returns JSON with the timestamps in milliseconds, for example:
{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
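A client usually wants these results as a single timeline rather than grouped by speaker. The sketch below parses the example response above, assuming that each key is a speaker label and each value is a list of [start, end] pairs in milliseconds. The function names to_timeline and speaking_time_ms are illustrative and not part of the project.

```python
import json

# Example response body copied from the usage section above.
response_text = (
    '{"0": [[40, 120], [3060, 3460], [3480, 3560]], '
    '"1": [[1260, 1660], [1680, 1960]]}'
)


def to_timeline(text):
    """Flatten the per-speaker segments into one list sorted by start time."""
    segments_by_speaker = json.loads(text)
    timeline = [
        (start_ms, end_ms, speaker)
        for speaker, segments in segments_by_speaker.items()
        for start_ms, end_ms in segments
    ]
    return sorted(timeline)


def speaking_time_ms(text):
    """Sum the segment durations for each speaker."""
    segments_by_speaker = json.loads(text)
    return {
        speaker: sum(end_ms - start_ms for start_ms, end_ms in segments)
        for speaker, segments in segments_by_speaker.items()
    }


for start_ms, end_ms, speaker in to_timeline(response_text):
    print(f"{start_ms / 1000:7.2f}s - {end_ms / 1000:7.2f}s  speaker {speaker}")

print(speaking_time_ms(response_text))
```

The sorted timeline is the form a waveform view would typically consume, since segments can then be drawn left to right.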

Client

The client needs npm or yarn to be installed. More info about the client can be found here.

To install:

cd client/website
npm install

Afterwards, you can use

npm run serve

to run a development server.

Authors

Ofir Naccache - ofirnaccache
Matan Yesharim - Tralfazz

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

The diarization algorithm is an implementation of this research. We also used their implementation of spectral clustering.
We took inspiration and some code from Harry Volek's implementation of a different but similar problem, speaker verification.

Future Plans

We had problems with training on the AMI corpus, so we used the TIMIT corpus for the model provided.
We plan to train again on the VoxCeleb 1 and 2 datasets, which contain a lot more data and which we hope will improve feature extraction.
We want to add integration with a speech-to-text service and transcribe the created segments.

About

speaker diarization system using an LSTM
