impoiler / whisper-asr-webapp

A web app for automatic speech recognition using OpenAI's Whisper model running locally.


```
# Quickstart with Docker:
docker run --rm -it -p 8000:8000 -v whisper_models:/root/.cache/whisper ghcr.io/fluxcapacitor2/whisper-asr-webapp:main
```

Features

  • Customize the model, language, and initial prompt
  • Enable per-word timestamps (visible in downloaded JSON output; see the sketch after this list)
  • Runs Whisper locally
  • Pre-packaged into a single Docker image
  • View timestamped transcripts in the app
  • Download transcripts in plain text, VTT, SRT, TSV, or JSON formats
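
The per-word timestamps come straight from Whisper. As a rough sketch (not this repo's actual backend code, and with a placeholder file name, model size, language, and prompt), this is how the openai-whisper package exposes them:

```python
# Illustrative only: requesting per-word timestamps from the openai-whisper
# package. The file name, model size, language, and prompt are placeholders,
# not values used by this app.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "speech.wav",
    language="en",                 # optional language hint
    initial_prompt="A podcast.",   # optional initial prompt
    word_timestamps=True,          # adds a "words" list to each segment
)

for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f'{word["start"]:7.2f} {word["end"]:7.2f} {word["word"]}')
```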

Architecture

The frontend is written in Svelte and compiles to static HTML, CSS, and JavaScript.

The backend is built with FastAPI. The main endpoint, /transcribe, pipes an uploaded file through ffmpeg and then into Whisper. Once transcription is complete, the result is returned as a JSON payload.
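
A minimal sketch of that flow, assuming a multipart field named "file" and a hard-coded model (the real endpoint also accepts the model, language, and prompt options listed above), could look like this:

```python
# Minimal sketch of an upload -> ffmpeg -> Whisper -> JSON flow; the form
# field name, model choice, and ffmpeg flags are assumptions, not the app's
# exact implementation.
import subprocess

import numpy as np
import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")

@app.post("/transcribe")
async def transcribe(file: UploadFile):
    # Decode whatever was uploaded to 16 kHz mono 16-bit PCM via ffmpeg.
    decoded = subprocess.run(
        ["ffmpeg", "-i", "pipe:0", "-f", "s16le", "-ac", "1", "-ar", "16000", "pipe:1"],
        input=await file.read(),
        capture_output=True,
        check=True,
    )
    audio = np.frombuffer(decoded.stdout, np.int16).astype(np.float32) / 32768.0
    # transcribe() returns a dict ("text", "segments", "language") that
    # FastAPI serializes to JSON.
    return model.transcribe(audio)
```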

In a containerized environment, the static assets from the frontend build are served by the same FastAPI (Uvicorn) server that handles transcription.
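
As a sketch of that arrangement (the mount path and the frontend build directory below are assumptions), the static frontend can be mounted onto the same app:

```python
# Sketch of serving the frontend's static build from the same FastAPI app;
# the "frontend/build" directory is an assumption about the build output path.
import uvicorn
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# API routes such as /transcribe would be registered here, before the
# catch-all static mount, so they take precedence.
app.mount("/", StaticFiles(directory="frontend/build", html=True), name="frontend")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```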

Running

  1. Pull and run the image with Docker.
    • Run in an interactive terminal: docker run --rm -it -p 8000:8000 -v whisper_models:/root/.cache/whisper ghcr.io/fluxcapacitor2/whisper-asr-webapp:main
    • Run in the background: docker run -d -p 8000:8000 -v whisper_models:/root/.cache/whisper ghcr.io/fluxcapacitor2/whisper-asr-webapp:main
  2. Visit http://localhost:8000 in a web browser
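
Once the app is running, the /transcribe endpoint can also be called directly. A hedged example using Python's requests library (the multipart field name and the "text" key in the response are assumptions based on the description above):

```python
# Hedged example of calling the running server directly; the "file" field
# name and the "text" key in the response are assumptions.
import requests

with open("speech.wav", "rb") as f:
    resp = requests.post("http://localhost:8000/transcribe", files={"file": f})

resp.raise_for_status()
print(resp.json().get("text", ""))
```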
