team-re-verb / RE-VERB

speaker diarization system using an LSTM


About the project

RE:VERB is a speaker diarization system: the user sends or records audio of a conversation and receives timestamps showing who spoke when.

RE:VERB is our final project in Magshimim, and consists of a web client and a server.

  • The client can record audio and show the timestamp results graphically.

  • The server exposes a simple REST API, so it can be used with other clients as well.

Built With

client

server

  • PyTorch - Python deep learning library with strong support for GPUs through CUDA

  • Express.js - Node.js web server framework

Getting Started

The project contains the server and the web client (a CLI client also exists for debugging purposes).

The server is located at ./server and the web client is located at ./client/website.


Server

The model, the scripts for downloading and training, and the weights from our training are located at ./server/speech_diarization/model

We used Docker to create a cross-platform environment for running the server.

The server is made up of:

  • a container for the web server
  • a container for the diarization process
  • a container for a Redis database that lets the other two communicate

docker-compose runs and manages all 3 containers at once.

Docker and docker-compose must be installed to build and run the server; everything else is taken care of.

Installing

cd server
docker-compose up

This will run all 3 containers and install dependencies.

If you make a change in the server, use

docker-compose up --build

to rebuild.

Usage:

Sending an HTTP POST request with an audio file to the server at http://localhost:1337/upload (default port and URL) returns a JSON object with the timestamps in milliseconds.

{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}
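A client usually wants this reply as a single timeline rather than as per-speaker lists. The following is a minimal sketch of that conversion, assuming each key is a speaker label and each value is a list of [start, end] pairs in milliseconds, as the example above suggests. The function names to_timeline and speaking_time_ms are hypothetical helpers and not part of the RE:VERB code base.

```python
import json


def to_timeline(response_text):
    """Flatten the reply into (start_ms, end_ms, speaker) tuples sorted by start time.

    Assumes the reply maps a speaker label to a list of [start, end] pairs.
    """
    segments_by_speaker = json.loads(response_text)
    timeline = [
        (start, end, speaker)
        for speaker, segments in segments_by_speaker.items()
        for start, end in segments
    ]
    return sorted(timeline)


def speaking_time_ms(response_text):
    """Sum the length of every segment for each speaker, in milliseconds."""
    segments_by_speaker = json.loads(response_text)
    return {
        speaker: sum(end - start for start, end in segments)
        for speaker, segments in segments_by_speaker.items()
    }


# The example reply shown above, as raw text.
example = '{"0": [[40, 120], [3060, 3460], [3480, 3560]], "1": [[1260, 1660], [1680, 1960]]}'

for start, end, speaker in to_timeline(example):
    print(f"speaker {speaker}: {start} ms to {end} ms")
```

Sorting the flattened tuples puts the segments in chronological order, which is the form a graphical display or a later transcription step would most likely consume.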

Client

The client needs npm or yarn to be installed. More info about the client can be found here.

To install:

cd client/website
npm install

Afterwards you can use

npm run serve

to run a development server.


Authors

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

  • The diarization algorithm is an implementation of this research. We also used their implementation of the spectral clustering.

  • We took inspiration and some code from Harry Volek's implementation of a different but similar problem - Speaker Verification

Future Plans

  • We had problems with training on the AMI corpus so we used the TIMIT corpus for the model provided.

  • We plan to train again on the VoxCeleb 1 and 2 datasets, which contain much more data and will hopefully improve feature extraction.

  • We want to add integration with a speech-to-text service to transcribe the created segments.
