DeepSpeech REST API

This REST API is built on top of Mozilla's DeepSpeech and is based on the examples provided by Mozilla. It accepts HTTP requests (GET and POST) as well as WebSocket connections. Transcription over HTTP is appropriate for relatively short audio files, while the WebSocket endpoint can also be used for longer recordings.

Project setup

Clone the repository to your local machine and change directory to deepspeech-rest-api

git clone https://github.com/fabricekwizera/deepspeech-rest-api.git
cd deepspeech-rest-api

Create a virtual environment and activate it (assuming that virtualenv is installed on your machine), then install the project locally in editable mode.

virtualenv -p python3 venv
source venv/bin/activate
pip install --editable .

Download the model and the scorer. For the English model and scorer, use the links below:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm \
    -O deepspeech_model.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer \
    -O deepspeech_model.scorer

For other languages, you can place the two files in the current working directory under the names deepspeech_model.pbmm for the model and deepspeech_model.scorer for the scorer.
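
As a quick sanity check before starting the server, the expected file names can be verified from Python. This is only a minimal sketch: the helper name `missing_model_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names this README says the server expects in the working directory.
EXPECTED_FILES = ("deepspeech_model.pbmm", "deepspeech_model.scorer")


def missing_model_files(directory="."):
    """Return the expected model/scorer file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Model and scorer are in place.")
```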

Database migrations are handled with Alembic.

Running the server

python3 run.py

Usage of the API

To register a new user

curl -X POST \
http://0.0.0.0:8000/users \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"email": "fgump@yourdomain.com",
"password": "yourpassword"
}'

API response

{
  "message": "User forrestgump is successfully created."
}

To generate a JWT token to access the API

curl -X POST \
http://0.0.0.0:8000/token \
-H 'Content-Type: application/json' \
-d '{
"username": "forrestgump",
"password": "yourpassword"
}'

If both steps are done correctly, you should get the tokens in the format below

{
    "access_token": "JWT_token",
    "refresh_token": "Refresh_token"
}

With this JWT_token, you have access to different endpoints of the API, and the Refresh_token is used to refresh the access token when it expires.
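
The registration and token requests above can also be issued from Python. The following is a rough sketch using only the standard library; the helper names (`build_json_post`, `send`, `register_and_login`) are hypothetical, and the endpoints and field names are simply the ones shown in the cURL commands.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000"  # address used throughout this README


def build_json_post(path, payload, token=None):
    """Build (but do not send) a JSON POST request for the given API path."""
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def register_and_login(username, email, password):
    """Create a user, then return the token pair issued for that user."""
    send(build_json_post("/users", {"username": username, "email": email, "password": password}))
    return send(build_json_post("/token", {"username": username, "password": password}))
```

With the server running, `register_and_login("forrestgump", "fgump@yourdomain.com", "yourpassword")` should return a dictionary with the `access_token` and `refresh_token` keys shown above.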

To refresh a JWT token

curl -X POST \
http://0.0.0.0:8000/token/refresh \
-H "Content-Type: application/json" \
-H "Authorization: Bearer JWT_token" \
-d '{
    "refresh_token": "Refresh_token"
}'
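
A client may want to decide when to call the refresh endpoint. The sketch below assumes the access token is a standard JWT whose payload carries the registered `exp` claim (expiry as Unix seconds); this README does not confirm that, so treat it as an assumption. The function names are hypothetical.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT, or None if it has none.

    The payload is only decoded, NOT verified. Use this for client-side
    scheduling, never for trusting the token's contents.
    """
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def needs_refresh(token, leeway_seconds=30, now=None):
    """True when the token expires within `leeway_seconds` of `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim (assumption not met): nothing to schedule
    current = time.time() if now is None else now
    return current >= exp - leeway_seconds
```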

Performing STT (Speech-To-Text)

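WAV files for testing are provided in the audio directory. The cURL example below assumes you have changed directory to audio, while the Python example uses a path relative to the repository root.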
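
When testing with your own recordings, it can help to inspect the file first. The released DeepSpeech 0.9.3 English model expects 16 kHz, mono, 16-bit PCM audio; whether this API converts other input is not stated here, so the sketch below only reports the format. The function names are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, sample_width_bytes, duration_seconds)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    return channels, rate, width, duration


def matches_model_format(path, expected_rate=16000):
    """True for mono, 16-bit PCM audio at `expected_rate` Hz."""
    channels, rate, width, _ = describe_wav(path)
    return channels == 1 and width == 2 and rate == expected_rate
```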

STT the HTTP way

cURL

curl -X POST \
http://0.0.0.0:8000/api/stt/http \
-H 'Authorization: Bearer JWT_token' \
-F 'audio=@8455-210777-0068.wav' \
-F 'paris=-1000' \
-F 'power=1000' \
-F 'parents=-1000'

Python

import requests

jwt_token = 'JWT_token'
headers = {'Authorization': 'Bearer ' + jwt_token}
# Hot-words and their boosts are sent as ordinary form fields
hot_words = {'paris': -1000, 'power': 1000, 'parents': -1000}
audio_filename = 'audio/8455-210777-0068.wav'
url = 'http://0.0.0.0:8000/api/stt/http'

# Open the audio in a context manager so the file is closed after the upload
with open(audio_filename, 'rb') as audio_file:
    audio = [('audio', audio_file)]
    response = requests.post(url, data=hot_words, files=audio, headers=headers)
print(response.json())

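Note the hot-words and their boosts in the request. In DeepSpeech, a positive boost makes a word more likely to appear in the transcription and a negative boost makes it less likely.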

STT the WebSocket way (simple test)

curl does not readily support WebSocket connections. To take advantage of this feature, you will have to write a client (for example, a web app) that sends requests to ws://0.0.0.0:8000/api/stt/ws.

The command below can be used to check whether the WebSocket endpoint is running.

python3 test_websocket.py
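
For a client of your own, a minimal Python sketch might look like the following. It relies on several assumptions not confirmed by this README: that the endpoint accepts the whole audio file as a single binary message, that it replies with one JSON message, and that the third-party `websockets` package is installed. How the endpoint expects the JWT to be supplied is not documented here, so authentication is left out. The function name and the `connect` parameter are hypothetical.

```python
import asyncio
import json

WS_URL = "ws://0.0.0.0:8000/api/stt/ws"


async def transcribe_over_websocket(audio_bytes, url=WS_URL, connect=None):
    """Send audio over a WebSocket and return the decoded JSON reply.

    `connect` may be replaced by any callable returning an async context
    manager with `send` and `recv` coroutines (handy for testing).
    """
    if connect is None:
        import websockets  # third-party: pip install websockets
        connect = websockets.connect
    async with connect(url) as ws:
        await ws.send(audio_bytes)  # assumption: one binary message per file
        reply = await ws.recv()     # assumption: one JSON reply
    return json.loads(reply)


def transcribe_file(path, url=WS_URL):
    """Blocking convenience wrapper around the coroutine above."""
    with open(path, "rb") as audio_file:
        audio_bytes = audio_file.read()
    return asyncio.run(transcribe_over_websocket(audio_bytes, url=url))
```

With the server running, `transcribe_file("audio/8455-210777-0068.wav")` would be the entry point, provided the assumptions above hold.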

In both cases (HTTP and WebSocket), you should get a result in the format below.

{
  "message": "experience proves this",
  "time": 1.4718825020026998
}
