PimvanderLoos / pgs2srt

Read Presentation Graphic Stream (.SUP) files and provide python objects for parsing through the data

pgs2srt

Uses pgsreader and pytesseract to convert image-based PGS subtitle files (.sup) to text-based SubRip (.srt) files.
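For reference, the SubRip output format is simple: each cue is a counter, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the subtitle text, and a blank line. The sketch below shows that layout only. The helper names format_srt_timestamp and format_srt_cue are hypothetical and are not taken from this repository's code.

```python
def format_srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one SubRip cue: counter, time range, text, then a blank line."""
    return (
        f"{index}\n"
        f"{format_srt_timestamp(start_ms)} --> {format_srt_timestamp(end_ms)}\n"
        f"{text.strip()}\n\n"
    )


# Hypothetical cue: text that OCR might have produced for one subtitle image.
print(format_srt_cue(1, 1_500, 4_250, "Hello, world!"), end="")
```

In a real conversion, the start and end times would come from the PGS display sets and the text from the OCR step.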

Requirements

Python3, pip3, and Tesseract

Installation

  • Run git clone https://github.com/PimvanderLoos/pgs2srt.git
  • Inside the repo folder, run pip3 install -r requirements.txt
  • In your .bashrc or .zshrc add alias pgs2srt='<absolute path to repo>/pgs2srt.py'

How to run

pgs2srt <pgs filename>.sup

Improving accuracy

On Debian and Ubuntu, the default trained model files for Tesseract come from the fast set. These are somewhat faster than the alternatives, but less accurate. For higher accuracy, I'd recommend using either the legacy or the best trained models. Note that the fast and best models only support the LSTM OCR Engine Mode (oem 1).
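As a rough illustration of how a different model set and engine mode could be selected through pytesseract, the sketch below builds a Tesseract config string from the --oem and --tessdata-dir command-line options and passes it to pytesseract.image_to_string. The helper names and the example directory are assumptions made for illustration. This is not necessarily how pgs2srt.py configures Tesseract.

```python
import shlex
from typing import Optional


def build_tesseract_config(oem: int = 1, tessdata_dir: Optional[str] = None) -> str:
    """Build a config string for pytesseract (hypothetical helper).

    oem=1 selects the LSTM engine, the only mode the fast and best models
    support. oem=0 selects the legacy engine and needs the legacy models.
    """
    parts = ["--oem", str(oem)]
    if tessdata_dir is not None:
        # Directory holding the downloaded *.traineddata files (assumed path).
        parts += ["--tessdata-dir", tessdata_dir]
    # Quote each part so that paths containing spaces survive argument splitting.
    return " ".join(shlex.quote(part) for part in parts)


def ocr_image(image, lang: str = "eng", oem: int = 1,
              tessdata_dir: Optional[str] = None) -> str:
    """Run OCR on a PIL image with the chosen engine mode and model directory."""
    import pytesseract  # imported lazily so the config helper works without it

    config = build_tesseract_config(oem, tessdata_dir)
    return pytesseract.image_to_string(image, lang=lang, config=config)


print(build_tesseract_config(1, "/opt/tessdata_best"))
```

Setting the TESSDATA_PREFIX environment variable to the model directory is an alternative to passing --tessdata-dir on every call.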

Caveats

This is in no way a perfect converter, and Tesseract will misread some characters. The project is in an extremely early (alpha) state. Issues, pull requests, and suggestions are welcome!

Credits

This project uses the common + OCR fixes developed by Sub-Zero.bundle.
