jaddoughman / Multiprocessing-Tesseract-4.0

In an effort to decrease the execution time of the OCR process, a multi-processing script was created using Python's multi-processing module.

Multi-Processing Tesseract 4.0

To reduce the execution time of the OCR process, this project provides a multi-processing script built on Python's multiprocessing module. The script spawns several worker processes, each of which continuously pulls Tesseract 4.0 OCR jobs from the job queue that the JobQueueManager fills.
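
The worker-and-queue pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual JobQueueManager: the names `worker`, `run_jobs`, `SENTINEL`, and `job_fn` are hypothetical, and `job_fn` stands in for whatever function performs one OCR job.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical marker telling one worker to stop


def worker(job_queue, result_queue, job_fn):
    """Pull jobs until the sentinel arrives; emit (job, result, error) tuples."""
    while True:
        job = job_queue.get()
        if job is SENTINEL:
            break
        try:
            result_queue.put((job, job_fn(job), None))
        except Exception as exc:
            # Report the failure instead of letting one bad job kill the worker.
            result_queue.put((job, None, repr(exc)))


def run_jobs(jobs, job_fn, num_workers=None):
    """Run job_fn over jobs in parallel and return the collected tuples.

    job_fn must be a module-level function so it can be sent to the
    worker processes on platforms that start them with "spawn".
    """
    jobs = list(jobs)
    num_workers = num_workers or mp.cpu_count()
    job_queue = mp.Queue()
    result_queue = mp.Queue()

    workers = [
        mp.Process(target=worker, args=(job_queue, result_queue, job_fn))
        for _ in range(num_workers)
    ]
    for proc in workers:
        proc.start()

    for job in jobs:
        job_queue.put(job)
    for _ in workers:
        job_queue.put(SENTINEL)  # one stop marker per worker

    # Collect results before joining so no worker blocks on a full pipe.
    results = [result_queue.get() for _ in jobs]
    for proc in workers:
        proc.join()
    return results
```

Results arrive in completion order, not submission order, so each tuple carries its job. On platforms that start processes with "spawn", `run_jobs` should be called from under an `if __name__ == "__main__":` guard.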

Brief history

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co., Greeley, Colorado, between 1985 and 1994, with further changes in 1996 to port it to Windows and some conversion to C++ in 1998. HP open-sourced Tesseract in 2005. Since 2006 it has been developed by Google.

The latest (LSTM-based) stable version is 4.1.0, released on July 7, 2019. The latest source code is available from the master branch on GitHub. Open issues can be found in the issue tracker and the Planning wiki.

The latest 3.05 version is 3.05.02, released on June 19, 2018. The latest source code for 3.05 is available from the 3.05 branch on GitHub. This version is no longer under development, but it can be used for special cases (e.g. see Regression of features from 3.0x).

See Release Notes and Change Log for more details of the releases.

Installing Tesseract

You can either install Tesseract via a pre-built binary package or build it from source.

Supported compilers are:

  • GCC 4.8 and above
  • Clang 3.4 and above
  • MSVC 2015, 2017, 2019

Other compilers might work, but are not officially supported.

Usage

  • Install Tesseract 4.0
  • Add the tessdata for your desired language to the tessdata directory
  • Place your input images in the input directory
  • Run main.py (python3 main.py)
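
For orientation, a single OCR job over one image from the input directory might look roughly like the sketch below. It is an assumption about how such a job could call the Tesseract command-line tool, not a description of what main.py actually does: the function names, the `output` directory, and the default `eng` language are all hypothetical.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_dir, lang="eng", tessdata_dir="tessdata"):
    """Build the argument list for one Tesseract CLI invocation."""
    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base on its own.
    output_base = Path(output_dir) / image_path.stem
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "--tessdata-dir", str(tessdata_dir),  # where the language data lives
        "-l", lang,                           # language to recognize
    ]


def ocr_image(image_path, output_dir="output", lang="eng", tessdata_dir="tessdata"):
    """Run Tesseract on one image and return the path of the text file."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_tesseract_command(image_path, output_dir, lang, tessdata_dir)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(cmd, check=True, capture_output=True)
    return Path(cmd[2] + ".txt")
```

Keeping command construction separate from execution makes the invocation easy to inspect or log before any process is started.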
