Helmut Wollmersdorfer's repositories
Text-Levenshtein-BV
Levenshtein using bit vectors
ocr-measures
scripts reporting scores and statistics
Text-Guess-Script
Guess script from text using ISO 15924 codes
Text-Levenshtein-BVXS
Text::Levenshtein::BVXS - fast implementation using bit vectors
ocr-deu-bio-testfiles
German language (nature, biology) ground truth
AustrianNewspapers
NewsEye / READ OCR training dataset from Austrian Newspapers
Denoising-Diffusion-Probabilistic-Models-with-MNIST
This notebook is based on the paper Denoising Diffusion Probabilistic Models by Jonathan Ho, Ajay Jain and Pieter Abbeel. The purpose of this notebook is to understand the basic idea of the paper.
fancy-memset
small, fast memset based on Microsoft's design
guacamole
Guacamole is a parser toolkit for Standard Perl. It provides fully static BNF-based parsing capability to a reasonable subset of Perl.
hocrmod
Try to find regions missed by Tesseract.
Levenshtein-Simple
Levenshtein algorithm in the simple or "naive" implementation as a reference
limboole
Fork of the Limboole SAT solver frontend from http://fmv.jku.at/limboole/ modified to be executable using WebAssembly on the web.
LiTeX
Live Text Command Line Tool
nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
ocr-bbox-gt
Ground Truth for Bounding Boxes
ocr-gt-tools-mojo
OCR GT tools implemented with Mojolicious
OpenCV-Document-Scanner
An interactive document scanner built in Python using OpenCV featuring automatic corner detection, image sharpening, and color thresholding.
page_dewarp
Text page dewarping using a "cubic sheet" model
Python
All Algorithms implemented in Python
rapidfuzz-cpp
Rapid fuzzy string matching in C++ using the Levenshtein Distance
Text-Levenshtein-Uni
Text-Levenshtein-Uni - calculate Levenshtein distance for Unicode (UTF-8 or U32) strings
utf8-bench
utf8-bench - UTF-8 Benchmarks