myisaak / serverless-whisper-base

🍌 Banana Whisper Base++

Note: This is a fork; all credit goes to @Joemgu7.

https://github.com/Joemgu7/serverless-whisper-large

This is a production-ready deployment of Whisper. It uses a custom build based on the latest Whisper release, with a few modifications that make it run 10-20% faster than stock Whisper without sacrificing transcription quality 🤯

It is also more flexible, accepting more parameters than the stock Whisper templates out there:

  • base64String - The base64-encoded audio file.
  • format - The format of the audio file. Defaults to mp3, but can be any format supported by ffmpeg.
  • kwargs - A JSON string of additional arguments to pass to whisper.transcribe(). See the Whisper documentation for more information on the available arguments.
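
As a rough illustration of how these three parameters fit together, the sketch below assembles a request payload from raw audio bytes. `build_model_inputs` is a hypothetical helper, not part of this repo. It passes `kwargs` as a plain dictionary, mirroring the client example in this README; if your deployment expects a JSON string instead, `json.dumps` on that dictionary would presumably be the equivalent.

```python
import base64
import json


def build_model_inputs(audio_bytes, audio_format="mp3", **transcribe_kwargs):
    """Assemble a request payload from raw audio bytes.

    Hypothetical helper for illustration only; the key names come from
    the parameter list in this README.
    """
    return {
        # Base64 output is plain ASCII, so it is safe to embed in JSON.
        "base64String": base64.b64encode(audio_bytes).decode("ascii"),
        # Falls back to "mp3", the documented default format.
        "format": audio_format,
        # Extra arguments intended for whisper.transcribe().
        "kwargs": transcribe_kwargs,
    }


# Placeholder bytes stand in for a real audio file.
payload = build_model_inputs(
    b"\x00\x01fake-audio", beam_size=4, temperature=[0.0, 0.2, 0.7]
)

# The payload has to survive JSON serialization to be sent to the API.
encoded = json.dumps(payload)
```

Keeping the Whisper options as keyword arguments means a call site reads much like a direct call to `whisper.transcribe()`, which may make it easier to move existing code over.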

The result includes not only the transcribed text but also segment and language information.

🚀 Getting Started

On the client, call the model like so:

import banana_dev as banana
import ffmpeg
import base64

# read audio from video/audio and convert to opus with 16k sampling rate, mono channel, 48k bitrate, loglevel error

input_path = "input.mp4"

try:
  out, _ = (
      ffmpeg
      .input(input_path)
      .output('-', format='opus', acodec='libopus', ac=1, ar='16k', b='48k', loglevel='error')
      .run(cmd=['ffmpeg', '-nostdin'], capture_stdout=True, capture_stderr=True)
  )
except ffmpeg.Error as e:
  # Only ffmpeg.Error carries the captured stderr; a bare Exception may not have it
  raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e


# HERE THE MAGIC HAPPENS

opus_bytes_base64 = base64.b64encode(out).decode("ascii")

model_inputs = {
  "base64String": opus_bytes_base64,
  "format": "opus",
  "kwargs": {
    "beam_size": 4,
    "temperature": [0.0, 0.2, 0.7],
  }
}

api_key = "YOUR_API_KEY"
model_key = "YOUR_MODEL_KEY"


out = banana.run(api_key, model_key, model_inputs)
result = out["modelOutputs"][0]

# Use the result just like the standard Whisper model output
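
As a sketch of what using the result could look like, the snippet below walks the segments and prints one timestamped line per segment. It assumes the service returns the usual `whisper.transcribe()` dictionary unchanged, with `text`, `language`, and a `segments` list whose entries carry `start`, `end`, and `text`. The helper names and the sample data are made up for illustration.

```python
def format_timestamp(seconds):
    """Render a duration in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def summarize_result(result):
    """Return the detected language plus one line per transcribed segment."""
    lines = [f"language: {result['language']}"]
    for segment in result["segments"]:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Whisper segment text usually starts with a space, so strip it.
        lines.append(f"[{start} --> {end}] {segment['text'].strip()}")
    return lines


# Made-up sample data in the shape of Whisper's transcribe() output.
sample_result = {
    "text": " Hello world. This is a test.",
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello world."},
        {"id": 1, "start": 1.5, "end": 3.25, "text": " This is a test."},
    ],
}

for line in summarize_result(sample_result):
    print(line)
```

With a real response, `summarize_result(result)` would take the place of the sample dictionary, provided the response has the shape assumed here.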

License: MIT License

