wmgillett / whisper-yt-transcribe

Python-based CLI tool that uses the Whisper ASR model from OpenAI to transcribe YouTube videos.

Whisper-YT-Transcriber

Whisper-YT-Transcriber is a Python-based CLI tool that uses the Whisper ASR model from OpenAI to transcribe YouTube videos. This tool can be used to transcribe an individual YouTube video or a complete YouTube channel. The tool integrates the video metadata with the generated transcriptions into .txt files, which are organized by the video metadata.

Whisper transcribes with high accuracy and is open source, so there are no licensing fees. The models run locally.

There are five model sizes, four of which also have English-only versions, offering tradeoffs between speed and accuracy. Below is a comparison chart from Whisper's documentation. The smaller tiers (tiny and base) use a fraction of the memory and run 8 to 32 times faster than the two largest tiers.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large.v1, large.v2  ~10 GB         1x
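
As a rough illustration of how the VRAM column can guide model choice, the sketch below picks the largest tier that fits a given memory budget. The `MODEL_TIERS` list and the `pick_model` helper are hypothetical names introduced here and are not part of the tool. The figures are copied from the chart and are approximate.

```python
# Figures copied from the comparison chart: (tier, approx. VRAM in GB, relative speed).
MODEL_TIERS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def pick_model(available_vram_gb, english_only=True):
    """Return the largest tier whose approximate VRAM need fits the budget.

    Hypothetical helper for illustration; it returns a tier name, not
    necessarily the exact value the tool's --model option expects.
    """
    fitting = [name for name, vram, _ in MODEL_TIERS if vram <= available_vram_gb]
    if not fitting:
        raise ValueError("no model tier fits in %s GB" % available_vram_gb)
    name = fitting[-1]  # tiers are ordered from smallest to largest
    # The chart lists no English-only variant for the large tier.
    if english_only and name != "large":
        return name + ".en"
    return name
```

For example, `pick_model(6)` selects the medium tier, since the large tier needs roughly 10 GB.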

Dependencies

  • Python 3.7+
  • Python packages:
    • yt_dlp
    • whisper
    • pandas
    • pytest (for testing)

System Requirements

Memory and CPU requirements depend on the Whisper model used and on the size of the video files being transcribed. We recommend testing with the smaller models (tiny.en, base.en, small.en) and shorter videos (under 15 minutes) to gauge system demand before trying larger models and longer videos.

Installation

Clone this repository to your local machine, navigate into the project directory, and install the required Python dependencies.

# clone project
git clone https://github.com/wmgillett/whisper-yt-transcribe.git
# cd to working directory
cd whisper-yt-transcribe
# install requirements
pip install -r requirements.txt

Quick Start

To get a list of videos in a YouTube channel:

# using full command
python main.py list CHANNEL_URL CHANNEL_NAME
# using alias
python main.py l CHANNEL_URL CHANNEL_NAME

To transcribe all videos in a YouTube channel:

# using full command
python main.py transcribe_channel CHANNEL_URL CHANNEL_NAME --model MODEL_NAME
# using alias
python main.py tc CHANNEL_URL CHANNEL_NAME -m MODEL_NAME

To transcribe a single YouTube video:

# using full command
python main.py transcribe_video VIDEO_URL --model MODEL_NAME
# using alias
python main.py tv VIDEO_URL -m MODEL_NAME
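
The command layout above (three subcommands, each with a short alias, and an optional model flag) maps naturally onto `argparse` subparsers. The following is a minimal sketch of how such a parser could be wired, not the project's actual `main.py`. The `build_parser` function, the argument names, and the `action` attribute are assumptions made for illustration. The `base.en` default comes from the Whisper Model Options section.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # list / l: list the videos in a channel
    list_cmd = sub.add_parser("list", aliases=["l"])
    list_cmd.add_argument("channel_url")
    list_cmd.add_argument("channel_name")
    list_cmd.set_defaults(action="list")

    # transcribe_channel / tc: transcribe every video in a channel
    tc_cmd = sub.add_parser("transcribe_channel", aliases=["tc"])
    tc_cmd.add_argument("channel_url")
    tc_cmd.add_argument("channel_name")
    tc_cmd.add_argument("-m", "--model", default="base.en")
    tc_cmd.set_defaults(action="transcribe_channel")

    # transcribe_video / tv: transcribe a single video
    tv_cmd = sub.add_parser("transcribe_video", aliases=["tv"])
    tv_cmd.add_argument("video_url")
    tv_cmd.add_argument("-m", "--model", default="base.en")
    tv_cmd.set_defaults(action="transcribe_video")

    return parser
```

Because `dest="command"` records whichever spelling the user typed (the full name or the alias), the sketch stores a normalized name in `action` through `set_defaults`, so downstream code only has to check one value.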

Whisper Model Options

The default model is base.en. Use the --model (or -m) option to select a different one.

Supported --model values

  • English: [tiny.en, base.en, small.en, medium.en]
  • Multi-lingual: [tiny, base, small, medium, large.v1, large.v2]

Note: the first time you run the tool with a given model, that model is downloaded, which is fairly quick for the small models. The downloaded model is stored locally and is reused by subsequent runs without downloading again.
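
The sketch below shows one way the model option could be validated before loading, under stated assumptions. `resolve_model` and `load_model` are hypothetical helpers, not functions from this project. `whisper.load_model` is the loading call from the openai-whisper package, which names its large checkpoints with a hyphen (`large-v1`, `large-v2`), so the translation from the dotted names listed above is an assumption about what the tool would need to do.

```python
ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
MULTILINGUAL_MODELS = ["tiny", "base", "small", "medium", "large.v1", "large.v2"]
DEFAULT_MODEL = "base.en"


def resolve_model(name=None):
    """Return a supported --model value, falling back to the documented default."""
    if name is None:
        return DEFAULT_MODEL
    if name not in ENGLISH_MODELS + MULTILINGUAL_MODELS:
        raise ValueError("unsupported model: %r" % name)
    return name


def load_model(name=None):
    """Load a Whisper model; the first call for a given model downloads it."""
    # Imported lazily so that validation works without the package installed.
    import whisper

    # Assumption: dotted names such as large.v1 are translated to the
    # hyphenated identifiers the library expects (large-v1).
    library_name = resolve_model(name).replace(".v", "-v")
    return whisper.load_model(library_name)
```

Validating the name first means a typo fails immediately instead of after a download attempt.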

File Structure

The project is organized as follows:

  • main.py: The entry point of the application.
  • transcribe-yt.py: This script contains the logic to transcribe YouTube videos.
  • data/input/: This directory stores the audio files downloaded from YouTube.
  • data/output/: This directory stores the transcriptions generated by the tool.
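
To make the output layout more concrete, here is a minimal sketch of writing one transcript, with its metadata as a header, into a per-channel folder under data/output/. The README does not specify the exact file naming or metadata fields, so the `slugify` and `write_transcript` helpers, the `title`, `channel`, `upload_date`, and `url` keys, and the resulting path scheme are all assumptions for illustration.

```python
import re
from pathlib import Path


def slugify(text):
    """Lowercase the text and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def write_transcript(output_root, metadata, transcript):
    """Write metadata plus transcript to <output_root>/<channel>/<date>_<title>.txt.

    Hypothetical layout; the real tool may organize its files differently.
    """
    channel_dir = Path(output_root) / slugify(metadata["channel"])
    channel_dir.mkdir(parents=True, exist_ok=True)
    filename = "{}_{}.txt".format(metadata["upload_date"], slugify(metadata["title"]))
    # Metadata goes first as "key: value" lines, then a blank line, then the text.
    header = "\n".join(
        "{}: {}".format(key, metadata[key])
        for key in ("title", "channel", "upload_date", "url")
    )
    path = channel_dir / filename
    path.write_text(header + "\n\n" + transcript + "\n", encoding="utf-8")
    return path
```

Putting the upload date first in the file name keeps a channel's transcripts sorted chronologically in a directory listing.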

Testing

The pytest framework is used for running tests on the project. To execute the tests, run the following command:

pytest

Contributing

Contributions are welcome! Please read our Contributing Guide and our Code of Conduct for more information.

License

This project is licensed under the MIT License.
