cisocrgroup / Profiler

The CIS language-aware OCR document error profiler

Profiler

Source code for the language-aware OCR document error profiler. See the Profiler Manual for a description.

References

The profiler was originally written by Uli Reffle as part of his PhD thesis in computational linguistics at CIS during the IMPACT project (2008-2011).

It was further developed by Florian Fink at CIS as a CLARIN-D curation project (Kurationsprojekt).

Its underlying technology is described in the following publications:

Mihov, Stoyan, and Klaus U. Schulz. 2004. “Fast Approximate Search in Large Dictionaries.” Computational Linguistics 30 (4). MIT Press: 451–77.

Reffle, Ulrich. 2011. Algorithmen und Methoden zur dokumentenspezifischen Analyse historischer und OCR-erfasster Texte. Verlag Dr. Hut.

Reffle, Ulrich, and Christoph Ringlstetter. 2013. “Unsupervised Profiling of OCRed Historical Documents.” Pattern Recognition 46 (5): 1346–57. doi:http://dx.doi.org/10.1016/j.patcog.2012.10.002.

Schulz, Klaus U., and Stoyan Mihov. 2002. “Fast String Correction with Levenshtein Automata.” International Journal on Document Analysis and Recognition 5 (1). Springer: 67–85.

Languages

C++ 94.7%, CMake 2.6%, C 2.5%, Perl 0.1%, Dockerfile 0.1%, Makefile 0.0%, Lex 0.0%