dzkb / whisper-dc

Whisper-powered Voice Message Transcriber

This repository contains the code for a simple Discord bot that reacts to voice messages, transcribes them, and replies to the original voice message with the transcription.

The speech-to-text model is OpenAI's pre-trained Whisper model (large-v3 by default), run with the code from SYSTRAN/faster-whisper.
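As a rough illustration of the transcription step, the sketch below joins the segments returned by a faster-whisper model into a single reply string. It is not taken from this repository: the helper name transcribe_file is hypothetical, and the only library call it relies on is WhisperModel.transcribe, which returns an iterable of segments together with an info object.

```python
from typing import Optional


def transcribe_file(model, path: str, language: Optional[str] = None) -> str:
    """Transcribe one audio file and return the text as a single string.

    `model` is assumed to be a faster_whisper.WhisperModel (or any object
    with a compatible `transcribe` method). Passing language=None lets the
    model detect the language on its own.
    """
    # faster-whisper yields segments lazily; the work happens during iteration.
    segments, _info = model.transcribe(path, language=language)
    # Segment texts usually carry leading whitespace, so strip before joining.
    return " ".join(segment.text.strip() for segment in segments)
```

With faster-whisper installed, a model could be created with `WhisperModel("large-v3", device="cpu")` and passed as the first argument.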

Usage

  • Install Python (3.11 or newer)
  • Install Poetry
  • Install dependencies: poetry install
  • Activate the Poetry-created virtualenv: poetry shell
  • Set the DISCORD_TOKEN environment variable to your Discord bot's token.
  • Run the code: python main.py

The bot reacts to Discord messages that have an audio attachment with the .ogg extension.
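The attachment check described above could look roughly like the following. This is a minimal sketch, not the bot's actual code: the function names are hypothetical, it works on plain filenames rather than Discord attachment objects, and treating the extension as case-insensitive is an assumption.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_voice_attachment(filename: str) -> bool:
    """Return True if the filename ends in .ogg (case-insensitive, by assumption)."""
    return PurePosixPath(filename).suffix.lower() == ".ogg"


def pick_voice_attachments(filenames: Iterable[str]) -> List[str]:
    """Keep only the attachment names the bot would react to, in order."""
    return [name for name in filenames if is_voice_attachment(name)]
```

In a real bot, the filenames would come from the attachments of each incoming message.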

Configuration

The bot supports the following environment variables:

  • DISCORD_TOKEN (required) - the token used to authenticate the bot
  • MODEL_NAME - the name of the model that faster-whisper loads from Hugging Face Hub. Refer to the faster-whisper repository for the available pre-trained models. Default: large-v3
  • LANGUAGE - the language of the audio to transcribe. If not set, the language is detected separately for each transcription.
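One way to read these three variables is sketched below. The BotConfig class and load_config function are hypothetical names used for illustration; only the variable names, the required token, and the large-v3 default come from the list above. Treating an empty value as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    model_name: str = "large-v3"
    language: Optional[str] = None  # None means: detect per transcription


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build the configuration from environment variables."""
    token = env.get("DISCORD_TOKEN")
    if not token:
        # DISCORD_TOKEN is the only required variable.
        raise RuntimeError("DISCORD_TOKEN is required")
    return BotConfig(
        discord_token=token,
        # Empty strings are treated the same as unset (an assumption).
        model_name=env.get("MODEL_NAME") or "large-v3",
        language=env.get("LANGUAGE") or None,
    )
```

Taking the environment as a parameter keeps the function easy to test without touching the real process environment.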

Currently, the model is configured to run on the CPU. For CUDA-enabled deployments, refer to the original faster-whisper repository.
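For orientation, switching between CPU and CUDA in faster-whisper comes down to the device and compute_type arguments of WhisperModel. The sketch below is an assumption about how that choice might be wired up, not the repository's code: model_kwargs and load_model are hypothetical helpers, and the compute types (int8 on CPU, float16 on CUDA) follow the examples in the faster-whisper README rather than this bot's actual settings.

```python
def model_kwargs(use_cuda: bool = False) -> dict:
    """Return WhisperModel keyword arguments for the chosen device."""
    if use_cuda:
        # Assumed GPU settings; requires a working CUDA setup.
        return {"device": "cuda", "compute_type": "float16"}
    # Assumed CPU settings; int8 lowers memory use on CPU.
    return {"device": "cpu", "compute_type": "int8"}


def load_model(model_name: str = "large-v3", use_cuda: bool = False):
    """Create a faster-whisper model for the chosen device."""
    # Imported here so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_name, **model_kwargs(use_cuda))
```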

About

License: MIT License

