pgs2srt

Uses pgsreader and pytesseract to convert image-based PGS subtitle files (.sup) to text-based SubRip (.srt) files.
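
To illustrate the output side of the conversion, here is a minimal sketch of how one OCR'd subtitle could be written as a SubRip entry. The function names and the millisecond-based inputs are assumptions made for this example, not necessarily how pgs2srt does it internally.

```python
def format_srt_timestamp(ms: int) -> str:
    """Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_srt_entry(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one SRT entry: counter line, time range line, then the text."""
    start = format_srt_timestamp(start_ms)
    end = format_srt_timestamp(end_ms)
    return f"{index}\n{start} --> {end}\n{text.strip()}\n"
```

Entries are numbered from 1 and separated by a blank line in the final .srt file.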

Requirements

Python 3, pip3, and Tesseract

Installation

Run git clone https://github.com/PimvanderLoos/pgs2srt.git
Inside the repo folder, run pip3 install -r requirements.txt
In your .bashrc or .zshrc add alias pgs2srt='<absolute path to repo>/pgs2srt.py'

How to run

pgs2srt <pgs filename>.sup
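
To convert several files at once, a small wrapper along these lines might help. It is only a sketch: the helper names are hypothetical, and it assumes the script can be invoked as shown above with a single .sup file as its argument.

```python
import subprocess
from pathlib import Path


def build_commands(directory, script="pgs2srt.py"):
    """Return one command (as an argument list) per .sup file in directory."""
    sup_files = sorted(Path(directory).glob("*.sup"))
    return [[script, str(path)] for path in sup_files]


def convert_all(directory, script="pgs2srt.py"):
    """Run the converter on every .sup file, stopping at the first failure."""
    for command in build_commands(directory, script):
        subprocess.run(command, check=True)
```

Pass the absolute path to pgs2srt.py as script, since shell aliases are not visible to subprocess.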

Improving accuracy

On Debian and Ubuntu, the default trained model files for Tesseract come from the fast set. These are somewhat faster than the other options, but that speed comes at the cost of accuracy. If you want higher accuracy, I'd recommend using either the legacy or the best trained models. Note that the fast and best sets only support the LSTM OCR Engine Mode (OEM 1).
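
As a rough sketch of how a different model set could be selected through pytesseract, the snippet below builds a Tesseract config string that points at a custom tessdata directory and picks the engine mode. The helper names and the directory path are assumptions for illustration, and whether pgs2srt exposes such an option is not covered here.

```python
def build_tesseract_config(tessdata_dir, oem=1):
    """Build a pytesseract config string for a custom model directory.

    oem follows Tesseract's engine modes: 0 = legacy, 1 = LSTM,
    2 = legacy + LSTM, 3 = default (whatever is available).
    """
    if oem not in (0, 1, 2, 3):
        raise ValueError("oem must be 0, 1, 2, or 3")
    return f'--oem {oem} --tessdata-dir "{tessdata_dir}"'


def ocr_image(image, tessdata_dir, lang="eng", oem=1):
    """OCR a PIL image with the models found in tessdata_dir."""
    import pytesseract  # imported lazily; requires Tesseract to be installed

    config = build_tesseract_config(tessdata_dir, oem)
    return pytesseract.image_to_string(image, lang=lang, config=config)
```

For example, ocr_image(img, "/opt/tessdata_best") would use the best models with the LSTM engine, assuming they have been downloaded to that (hypothetical) directory.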

Caveats

This is by no means a perfect converter: Tesseract will misread some characters. The project is in an extremely early (alpha) state; issues, pull requests, and suggestions are welcome!

Credits

This project uses the common + OCR fixes developed by Sub-Zero.bundle.
