Whisper-YT-Transcriber

Whisper-YT-Transcriber is a Python command-line tool that uses OpenAI's Whisper automatic speech recognition (ASR) model to transcribe YouTube videos. It can transcribe a single video or every video in a YouTube channel. Each transcription is saved to a .txt file together with the video's metadata, and the files are organized by that metadata.
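As an illustration of combining metadata and transcript in one .txt file, here is a minimal sketch. It assumes the metadata arrives as a yt_dlp-style info dictionary (keys such as title, uploader, upload_date, webpage_url and duration); the function name format_transcript and the header layout are hypothetical, not the tool's actual output format.

```python
def format_transcript(info: dict, text: str) -> str:
    """Hypothetical formatter: a metadata header followed by the transcript.

    `info` is assumed to be a yt_dlp-style info dict; keys that are
    missing are written as "unknown" instead of raising KeyError.
    """
    fields = ["title", "uploader", "upload_date", "webpage_url", "duration"]
    header = [f"{name}: {info.get(name, 'unknown')}" for name in fields]
    # Blank line between header and body, trailing newline at the end.
    return "\n".join(header) + "\n\n" + text.strip() + "\n"
```

For example, format_transcript({"title": "Demo"}, " Hello. ") produces a five-line header (four of the fields marked unknown), a blank line, and then "Hello.".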

Whisper's transcription quality is remarkable, and because it is open source, there are no licensing fees. The models run locally on your own machine.

Whisper comes in five model sizes, four of which also have English-only versions, offering trade-offs between speed and accuracy. The comparison chart below is from Whisper's documentation. The smaller models (tiny and base) use a fraction of the memory and run roughly 8 to 32 times faster than the two largest models (medium and large).

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large.v1`, `large.v2`	~10 GB	1x
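The table can be read as data. The sketch below, assuming the approximate VRAM and relative-speed figures above, picks the largest size tier that fits a given VRAM budget and computes speed ratios between tiers. "Largest that fits" is just one simple heuristic, not a recommendation from Whisper's documentation, and MODEL_TIERS, pick_size and speedup are hypothetical names.

```python
# Approximate figures from the comparison table: (required VRAM in GB, relative speed).
MODEL_TIERS = {
    "tiny": (1, 32),
    "base": (1, 16),
    "small": (2, 6),
    "medium": (5, 2),
    "large": (10, 1),
}


def pick_size(vram_gb: float) -> str:
    """Return the largest size tier whose approximate VRAM need fits in vram_gb."""
    fitting = [size for size, (need, _) in MODEL_TIERS.items() if need <= vram_gb]
    if not fitting:
        raise ValueError(f"no Whisper model fits in {vram_gb} GB of VRAM")
    # Dicts preserve insertion order (smallest to largest), so the last fit is the largest.
    return fitting[-1]


def speedup(faster: str, slower: str) -> float:
    """How many times faster `faster` runs than `slower`, per the relative-speed column."""
    return MODEL_TIERS[faster][1] / MODEL_TIERS[slower][1]
```

With these numbers, pick_size(6) returns "medium", speedup("base", "medium") is 8.0 and speedup("tiny", "large") is 32.0, matching the 8x to 32x range quoted above. Append ".en" to the returned tier to get the English-only variant (there is none for large).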

Dependencies

Python 3.7+
Python packages:
- yt_dlp
- whisper (published on PyPI as openai-whisper; it also requires the ffmpeg command-line tool)
- pandas
- pytest (for testing)

System Requirements

Server memory and CPU requirements depend on the Whisper model used and the length of the videos being transcribed. We recommend starting with the smaller models (tiny.en, base.en, small.en) and shorter videos (under 15 minutes) to gauge system demand before trying larger models and longer videos.
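To follow that advice on a channel, a first trial batch can be limited to short videos. This is a minimal sketch, assuming each video is described by a yt_dlp-style dict whose "duration" key holds seconds; videos with no known duration are skipped. The function name trial_batch and its defaults are hypothetical.

```python
def trial_batch(entries, max_minutes=15, limit=5):
    """Hypothetical helper: pick up to `limit` videos shorter than `max_minutes`."""
    short = [
        e for e in entries
        if e.get("duration") is not None and e["duration"] < max_minutes * 60
    ]
    # Shortest first, so the first transcriptions finish quickly.
    short.sort(key=lambda e: e["duration"])
    return short[:limit]
```

Timing a run over this batch with a small model gives a rough baseline before moving to larger models or longer videos.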

Installation

Clone this repository to your local machine, navigate into the project directory, and install the required Python dependencies.

# clone project
git clone https://github.com/yourusername/youtube-transcriber.git
# cd to working directory
cd youtube-transcriber
# install requirements
pip install -r requirements.txt
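After installing, you can confirm that the Python packages import and that ffmpeg (which Whisper calls to decode audio) is on the PATH. The sketch below assumes the import names of the packages listed under Dependencies; missing_dependencies is a hypothetical helper, not part of the project.

```python
import importlib.util
import shutil

# Import names of the runtime packages listed under Dependencies.
REQUIRED_MODULES = ("yt_dlp", "whisper", "pandas")


def missing_dependencies(modules=REQUIRED_MODULES, executables=("ffmpeg",)):
    """Return the modules that cannot be found and the executables not on PATH."""
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else f"Missing: {problems}")
```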

Quick Start

To get a list of videos in a YouTube channel:

# using full command
python main.py list CHANNEL_URL CHANNEL_NAME
# using alias
python main.py l CHANNEL_URL CHANNEL_NAME
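One way to list a channel's videos without downloading anything is yt_dlp's flat extraction. This is a minimal sketch, not the project's implementation: it assumes CHANNEL_URL points at the channel's Videos tab (for example ending in /videos), since in recent yt_dlp versions a bare channel URL may return the channel's tabs rather than individual videos. The names entries_to_rows and list_channel_videos are hypothetical.

```python
from typing import Dict, List


def entries_to_rows(info: Dict) -> List[Dict]:
    """Flatten a yt_dlp playlist/channel info dict into simple rows."""
    rows = []
    for entry in info.get("entries") or []:
        if not entry:  # unavailable videos can show up as None
            continue
        rows.append({
            "id": entry.get("id"),
            "title": entry.get("title"),
            "url": f"https://www.youtube.com/watch?v={entry.get('id')}",
        })
    return rows


def list_channel_videos(channel_url: str) -> List[Dict]:
    """List videos without downloading them (needs yt_dlp and network access)."""
    import yt_dlp  # imported here so entries_to_rows works without yt_dlp

    opts = {"extract_flat": "in_playlist", "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(channel_url, download=False)
    return entries_to_rows(info)
```

Keeping the network call separate from entries_to_rows makes the parsing step testable with a hand-written dict.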

To transcribe all videos in a YouTube channel:

# using full command
python main.py transcribe_channel CHANNEL_URL CHANNEL_NAME --model MODEL_NAME
# using alias
python main.py tc CHANNEL_URL CHANNEL_NAME -m MODEL_NAME
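When a whole channel is transcribed, each video needs its own output file. The sketch below shows one way to organize files by metadata; the layout data/output/<channel_name>/<upload_date>_<title>.txt is an assumption for illustration, and safe_name and transcript_path are hypothetical helpers.

```python
import re
from pathlib import Path


def safe_name(text: str, max_len: int = 80) -> str:
    """Replace characters that are awkward in file names and trim the length."""
    cleaned = re.sub(r"[^\w\-]+", "_", text).strip("_")
    return cleaned[:max_len] or "untitled"


def transcript_path(base: Path, channel_name: str, info: dict) -> Path:
    """Hypothetical layout: <base>/<channel>/<upload_date>_<title>.txt."""
    date = info.get("upload_date") or "unknown-date"
    title = safe_name(info.get("title") or info.get("id") or "")
    return base / safe_name(channel_name) / f"{date}_{title}.txt"
```

For a channel named "My Channel" and a video titled "Hello, World!" uploaded on 20230115, this yields data/output/My_Channel/20230115_Hello_World.txt. Checking whether that path already exists before transcribing would let an interrupted channel run resume; that is a suggestion, not documented tool behavior.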

To transcribe a single YouTube video:

# using full command
python main.py transcribe_video VIDEO_URL --model MODEL_NAME
# using alias
python main.py tv VIDEO_URL -m MODEL_NAME
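Transcribing a single video comes down to two steps: download the audio with yt_dlp, then run a Whisper model on it. The sketch below uses documented yt_dlp options (a bestaudio format and the FFmpegExtractAudio postprocessor) and Whisper's load_model/transcribe calls; the mp3 codec, the data/input directory default and the function names are assumptions, not necessarily what the tool does.

```python
import os


def build_ydl_opts(input_dir: str) -> dict:
    """yt_dlp options: best audio stream, converted to mp3 by ffmpeg."""
    return {
        "format": "bestaudio/best",
        "outtmpl": os.path.join(input_dir, "%(id)s.%(ext)s"),
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
        "quiet": True,
    }


def transcribe_video(video_url: str, model_name: str = "base.en",
                     input_dir: str = "data/input") -> tuple:
    """Hypothetical helper: download one video's audio and transcribe it.

    Returns (info, text). Requires yt_dlp, whisper and ffmpeg on PATH.
    """
    import whisper
    import yt_dlp

    with yt_dlp.YoutubeDL(build_ydl_opts(input_dir)) as ydl:
        info = ydl.extract_info(video_url, download=True)
    # FFmpegExtractAudio replaces the downloaded file's extension with .mp3.
    audio_path = os.path.join(input_dir, f"{info['id']}.mp3")
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return info, result["text"]
```

The returned info dict carries the video's metadata, so it can be written next to the transcript text in the same .txt file.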

Whisper Model Options

The default model is base.en. You can specify a different model with the --model (or -m) option.

Supported --model values

English: [tiny.en, base.en, small.en, medium.en]
Multi-lingual: [tiny, base, small, medium, large.v1, large.v2]

Note: the first time you use a model, it is downloaded, which is fairly quick for the smaller models. The downloaded model is stored locally and reused by subsequent runs without being downloaded again.
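The command-line interface described in Quick Start, with its aliases, default model and supported --model values, could be expressed with argparse as follows. This is a hypothetical reconstruction for illustration, not the project's main.py; the choices are copied from the supported values listed above, and the canonical attribute is an invented name used to map aliases back to full command names.

```python
import argparse

ENGLISH_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
MULTILINGUAL_MODELS = ["tiny", "base", "small", "medium", "large.v1", "large.v2"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    def add_model_option(p):
        # Unknown model names are rejected by argparse before any download starts.
        p.add_argument("-m", "--model", default="base.en",
                       choices=ENGLISH_MODELS + MULTILINGUAL_MODELS,
                       help="Whisper model to use (default: base.en)")

    lst = sub.add_parser("list", aliases=["l"])
    lst.add_argument("channel_url")
    lst.add_argument("channel_name")
    lst.set_defaults(canonical="list")

    tc = sub.add_parser("transcribe_channel", aliases=["tc"])
    tc.add_argument("channel_url")
    tc.add_argument("channel_name")
    add_model_option(tc)
    tc.set_defaults(canonical="transcribe_channel")

    tv = sub.add_parser("transcribe_video", aliases=["tv"])
    tv.add_argument("video_url")
    add_model_option(tv)
    tv.set_defaults(canonical="transcribe_video")
    return parser
```

Because args.command holds whichever name was typed (for example "tv"), a separate default such as canonical lets the rest of the program dispatch on one name per command.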

File Structure

The project is organized as follows:

main.py: The entry point of the application.
transcribe-yt.py: This script contains the logic to transcribe YouTube videos.
data/input/: This directory stores the audio files downloaded from YouTube.
data/output/: This directory stores the transcriptions generated by the tool.

Testing

The project uses the pytest framework for its tests. To run them, execute the following command from the project directory:

pytest
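pytest collects files named test_*.py and runs every function whose name starts with test_, using plain assert statements. A minimal sketch of such a test file follows; seconds_to_hms is a hypothetical helper defined inline so the file is self-contained, whereas in the real project the code under test would be imported.

```python
# test_durations.py (hypothetical file name; pytest discovers test_*.py files)


def seconds_to_hms(seconds: int) -> str:
    """Hypothetical helper under test: format a duration as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def test_zero_and_short():
    assert seconds_to_hms(0) == "00:00:00"
    assert seconds_to_hms(59) == "00:00:59"


def test_over_an_hour():
    assert seconds_to_hms(3725) == "01:02:05"
```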

Contributing

Contributions are welcome! Please read our Contributing Guide and our Code of Conduct for more information.

License

This project is licensed under the MIT License.

wmgillett / whisper-yt-transcribe