cafew / speech-to-text

Real-time transcription using faster-whisper

Architecture

Audio is captured from a microphone using sounddevice. Silero VAD (Voice Activity Detection) detects the silent parts, and the speech between them is treated as a single piece of audio data. This audio data is converted to text using faster-whisper.

The HTML-based GUI lets you check the transcription results and configure faster-whisper in detail.
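
The segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `split_on_silence`, `speech_prob`, `threshold`, and `max_silent_frames` are hypothetical names, and the speech scorer is passed in as a plain callable so the logic can be read without a VAD model or microphone.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Frame = Sequence[float]  # one fixed-size chunk of mono samples


def split_on_silence(
    frames: Iterable[Frame],
    speech_prob: Callable[[Frame], float],  # hypothetical stand-in for a VAD
    threshold: float = 0.5,
    max_silent_frames: int = 3,
) -> Iterator[List[float]]:
    """Yield one utterance (a flat list of samples) per run of speech.

    An utterance is closed once `max_silent_frames` consecutive frames
    score below `threshold`.
    """
    buffer: List[float] = []
    silent_run = 0
    for frame in frames:
        if speech_prob(frame) >= threshold:
            buffer.extend(frame)
            silent_run = 0
        elif buffer:
            # Keep short pauses inside the current utterance.
            buffer.extend(frame)
            silent_run += 1
            if silent_run >= max_silent_frames:
                yield buffer
                buffer, silent_run = [], 0
        # Silence before any speech has started is dropped.
    if buffer:
        yield buffer
```

In a real pipeline the frames would presumably come from a sounddevice input stream, `speech_prob` would wrap Silero VAD, and each yielded utterance would be handed to faster-whisper for transcription.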

Transcription speed

If the sentences are well separated, transcription takes less than a second.

Large-v2 model
Executed with CUDA 11.7 on an NVIDIA GeForce RTX 3060 12GB.

Installation

  1. pip install .

Usage

  1. python -m speech_to_text
  2. Select "App Settings" and configure the settings.
  3. Select "Model Settings" and configure the settings.
  4. Select "Transcribe Settings" and configure the settings.
  5. Select "VAD Settings" and configure the settings.
  6. Start Transcription
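
As a rough guide to what the "Model Settings" and "Transcribe Settings" screens control, the sketch below shows how such settings could map onto faster-whisper's `WhisperModel` constructor and its `transcribe` method. The `ModelSettings` and `TranscribeSettings` classes and the `transcribe_file` helper are hypothetical names for illustration; the app may organize its settings differently, and "App Settings" and "VAD Settings" are not modeled here.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ModelSettings:
    # Passed as keyword arguments to faster_whisper.WhisperModel
    model_size_or_path: str = "large-v2"
    device: str = "cuda"
    compute_type: str = "float16"


@dataclass
class TranscribeSettings:
    # Passed as keyword arguments to WhisperModel.transcribe
    language: Optional[str] = None  # None lets the model detect the language
    task: str = "transcribe"
    beam_size: int = 5


def transcribe_file(
    path: str, model: ModelSettings, opts: TranscribeSettings
) -> List[str]:
    # Imported lazily so the settings classes work without the package.
    from faster_whisper import WhisperModel

    whisper = WhisperModel(**asdict(model))
    segments, _info = whisper.transcribe(path, **asdict(opts))
    # `segments` is a generator; transcription runs as it is consumed.
    return [f"[{s.start:.2f} -> {s.end:.2f}] {s.text}" for s in segments]
```

Calling `transcribe_file("sample.wav", ModelSettings(), TranscribeSettings(language="en"))` would load the model and return one formatted line per segment; `sample.wav` is a placeholder file name.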

Notes

  • If you select local_model in "Model size or path", the model with the same name in the local folder will be referenced.
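
One way to read this note: faster-whisper accepts either a model size name or a path to a model directory, so a selection can be checked against a local folder first. The helper below is a hypothetical sketch of that lookup under this assumption; `resolve_model_ref` and `base_dir` are illustrative names, not part of the project.

```python
from pathlib import Path


def resolve_model_ref(selection: str, base_dir: Path) -> str:
    """Return a local model directory if one matches, else the name itself."""
    candidate = base_dir / selection
    if candidate.is_dir():
        # A folder with the same name exists locally: use it as the model path.
        return str(candidate)
    # Otherwise treat the selection as a model size name such as "large-v2".
    return selection
```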

Demo

(Demo image)

News

2023-06-26

  1. Implemented a feature to generate audio files from the input sound.
  2. Implemented a feature to synchronize audio files with the transcription. Audio playback and text highlighting are linked.
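
Linking playback to text highlighting comes down to finding which transcribed segment covers the current playback time. The sketch below shows one possible lookup using binary search over segment start times. It is illustrative only: the project's GUI is HTML-based and may do this differently in the browser, and `Segment` and `active_segment` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def active_segment(segments: List[Segment], t: float) -> Optional[int]:
    """Return the index of the segment playing at time t, or None in a gap.

    Assumes `segments` is sorted by start time and does not overlap.
    """
    starts = [s[0] for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i][1]:
        return i
    return None
```

A player would call this on each time update and highlight the text of the returned segment, clearing the highlight when the result is `None`.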

2023-06-29

  1. Implemented a feature to transcribe from audio files.

Todo

  • Save and load previous settings.

  • Use Silero VAD

  • Allow local parameters to be set from the GUI.

About

License: MIT License


Languages

HTML 38.6%, JavaScript 25.2%, Python 24.6%, CSS 11.6%