theriley106 / BabbleOCR

A module that generates time-based OCR results for documents that have an accurate transcript but no timestamps. By incorporating Levenshtein distance into the PyTesseract OCR workflow, it produces significantly more accurate results.

Babble OCR

Babble OCR is a wrapper around the PyTesseract OCR module that finds specific predefined words within an image. It incorporates Levenshtein distance into the PyTesseract OCR workflow, which significantly increases OCR accuracy.
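
As a rough illustration of how Levenshtein distance can help find predefined words in noisy OCR output, here is a minimal sketch. It is not the project's actual code: the names `levenshtein`, `find_expected_words`, and `max_ratio` are hypothetical, and the OCR text is a hard-coded string standing in for what a call such as `pytesseract.image_to_string(image)` might return.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def find_expected_words(ocr_text, expected_words, max_ratio=0.34):
    """Map each expected word to its closest OCR token, if close enough.

    max_ratio is an assumed tolerance: the allowed edit distance as a
    fraction of the expected word's length.
    """
    tokens = ocr_text.lower().split()
    found = {}
    if not tokens:
        return found
    for word in expected_words:
        target = word.lower()
        best = min(tokens, key=lambda token: levenshtein(token, target))
        if levenshtein(best, target) <= max_ratio * len(target):
            found[word] = best
    return found


# Hypothetical noisy OCR output for a frame that shows "hello world".
noisy_text = "he1lo w0rld fo0"
matches = find_expected_words(noisy_text, ["hello", "world", "python"])
print(matches)
```

In this sketch, "hello" and "world" are each one substitution away from an OCR token and are reported as found, while "python" has no sufficiently close token and is left out. The tolerance would need tuning for real images.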

My specific use case was finding timestamp information for lyric videos on YouTube. I created this program to improve the accuracy of the OCR method I was using.

Languages

Language:Python 100.0%