RetroZhou/Whisper-API-demo

Welcome to the Voice-to-Text (Whisper) API

Setup

First, install the Python dependencies:

pip install -r requirements.txt

The service also requires the ffmpeg command-line tool to be installed on your system. It is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
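
After installing, it can help to confirm that ffmpeg is actually reachable on your PATH before starting the server. The snippet below is a minimal sketch using only the Python standard library; the helper name `tool_available` is hypothetical and not part of this repository.

```python
import shutil


def tool_available(name: str = "ffmpeg") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; install it with one of the commands above")
```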

Usage

Start Server

python server.py

You can set the host and port of the service:

python server.py --host 0.0.0.0 --port 8888
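
If you start the server from a script, you may want to wait until it accepts connections before sending requests. The following is a rough sketch, not part of this repository: `wait_for_port` is a hypothetical helper that only checks that the TCP port is open, and it assumes nothing about the server's internals or how long it takes to start.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

For a server started with `--host 0.0.0.0 --port 8888`, calling `wait_for_port("127.0.0.1", 8888)` should work from the same machine, since 0.0.0.0 binds all local interfaces.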

Use Server

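Send an audio file to the /whisper endpoint as a multipart form upload. The examples below assume the server is listening on 127.0.0.1:7860; change the address if you started it with a different --host or --port:

curl -F "file=@examples/en.mp3" http://127.0.0.1:7860/whisper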

To use a different model, add the `model_type` form field:

# Use `base`
curl -X POST -F "file=@examples/en.mp3" -F "model_type=base" http://127.0.0.1:7860/whisper

# Use `base.en`
curl -X POST -F "file=@examples/en.mp3" -F "model_type=base.en" http://127.0.0.1:7860/whisper
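
The same requests can be made from Python. The sketch below mirrors the curl commands above using only the standard library: it uploads the audio under the `file` form field and optionally sends `model_type`. The function names `build_multipart` and `transcribe` are hypothetical, and because the response format is not documented here, the raw response text is returned unparsed.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict, file_field: str, filename: str, data: bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain text fields, e.g. model_type=base
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The file part: headers, blank line, then the raw bytes
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
    )
    parts.append(data)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model_type=None, url="http://127.0.0.1:7860/whisper"):
    """POST an audio file to the /whisper endpoint and return the raw response text."""
    audio = Path(path)
    fields = {"model_type": model_type} if model_type else {}
    body, content_type = build_multipart(fields, "file", audio.name, audio.read_bytes())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `transcribe("examples/en.mp3", model_type="base.en")` corresponds to the second curl command above.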

Comparison of different models:

`model_type`	Required VRAM	Parameters	Relative speed
`tiny.en` or `tiny`	~1 GB	39 M	~32x
`base.en` or `base`	~1 GB	74 M	~16x
`small.en` or `small`	~2 GB	244 M	~6x
`medium.en` or `medium`	~5 GB	769 M	~2x
`large`	~10 GB	1550 M	1x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
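
As an illustration of how the table might be used, the sketch below picks a `model_type` from the available GPU memory. It is a hypothetical helper, not part of this repository. It treats the approximate VRAM figures as hard limits and always prefers the largest model that fits, which is an assumption: if speed matters more than accuracy, a smaller model may be the better choice.

```python
# (name, approx. required VRAM in GB from the table, has an English-only ".en" variant)
MODELS = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("large", 10, False),
]


def pick_model_type(vram_gb: float, english_only: bool = False) -> str:
    """Return the largest model_type from the table that fits in vram_gb."""
    fitting = [m for m in MODELS if m[1] <= vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB of VRAM")
    # MODELS is ordered smallest to largest, so the last fitting entry is the largest.
    name, _, has_en = fitting[-1]
    return f"{name}.en" if english_only and has_en else name
```

For example, `pick_model_type(4)` selects `small`, and `pick_model_type(12, english_only=True)` selects `large`, since the table lists no `.en` variant for it.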
