saik2121 / text-segmentation

A pipeline from document scanning to word segmentation

Home Page: https://medium.com/@arthurflor23/text-segmentation-b32503ef2613

Text Segmentation

A simple preliminary project in Python, with the handwritten text segmentation module written in C++.

Requirements

  • GCC/G++ 8+
  • Python 3.7
  • OpenCV 3+

Run

python main.py -c -p

or

python3 main.py -c -p

Specify an image

python main.py -c -p --image xxx.png

or

python3 main.py -c -p --image xxx.png

Techniques

  • Document Scanner
  • Binarization with illumination compensation
  • Line Segmentation with deslanting
  • Word Segmentation

Document Scanner

Detects the predominant contour in the image and segments the page using a four-point perspective transformation. [ref]
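
As a rough illustration of the four-point step, the sketch below orders four detected corners and computes the size of the rectified page. The names `order_corners` and `target_size` are hypothetical and not taken from this repository, and the ordering heuristic assumes the page is roughly upright in the photo. The warp itself would typically be done with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`, which are left out here so the example runs on the standard library alone.

```python
import math


def order_corners(points):
    """Return corners as (top-left, top-right, bottom-right, bottom-left).

    Assumption: the page is roughly upright, so sums and differences of
    the coordinates are enough to tell the corners apart.
    """
    # Top-left has the smallest x + y, bottom-right the largest.
    by_sum = sorted(points, key=lambda p: p[0] + p[1])
    top_left, bottom_right = by_sum[0], by_sum[-1]
    # Bottom-left has the smallest x - y, top-right the largest.
    by_diff = sorted(points, key=lambda p: p[0] - p[1])
    bottom_left, top_right = by_diff[0], by_diff[-1]
    return top_left, top_right, bottom_right, bottom_left


def target_size(corners):
    """Width and height of the rectified page, from the longest opposite edges."""
    tl, tr, br, bl = corners

    def dist(a, b):
        return math.hypot(b[0] - a[0], b[1] - a[1])

    width = max(dist(tl, tr), dist(bl, br))
    height = max(dist(tl, bl), dist(tr, br))
    return round(width), round(height)


if __name__ == "__main__":
    # Hypothetical corner coordinates of a detected page contour.
    corners = order_corners([(90, 10), (12, 8), (95, 120), (10, 115)])
    print(corners, target_size(corners))
```

The ordered corners and the target size are what a perspective warp needs as its source and destination quadrilaterals.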

Binarization

Illumination compensation followed by Sauvola binarization was applied, but other techniques were also studied.

  • Implementation of the paper "Efficient Illumination Compensation Techniques for text images", Guillaume Lazzara and Thierry Géraud, 2014. [ref]
  • Niblack, Sauvola and Wolf binarizations. [ref]
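
For orientation, Sauvola's method thresholds each pixel at T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the grey values in a window around the pixel. The sketch below is a minimal pure-Python version of that formula, not the C++ implementation in this repository. The function names, the window radius, and the values k = 0.5 and R = 128 are assumptions chosen for illustration.

```python
import math


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola threshold for a flat list of grey values from one window."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    std = math.sqrt(variance)
    return mean * (1.0 + k * (std / r - 1.0))


def binarize(image, radius=1, k=0.5, r=128.0):
    """Binarize a grey image given as a list of rows (ink -> 0, paper -> 255).

    The window is clipped at the image border (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = sauvola_threshold(window, k, r)
            row.append(255 if image[y][x] > threshold else 0)
        result.append(row)
    return result


if __name__ == "__main__":
    # A single dark pixel on a bright background.
    page = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
    for row in binarize(page):
        print(row)
```

Because the threshold follows the local mean, a dark stroke is still separated from the paper when the lighting varies across the page, which is why this method pairs naturally with illumination compensation.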

Line Segmentation

  • Implementation of the paper "A Statistical approach to line segmentation in handwritten documents", Manivannan Arivazhagan, Harish Srinivasan and Sargur Srihari, 2007. [ref]

  • Deslanting image. [ref]
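
The simplest starting point for line segmentation is a horizontal projection profile: count the ink pixels in each row and cut at the empty valleys. The sketch below shows only that basic idea. It is not the statistical method of the cited paper and does not handle touching or overlapping lines; `split_lines` and `min_ink` are hypothetical names.

```python
def split_lines(binary, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for each text line.

    `binary` is a list of rows in which ink pixels are 1 and background is 0.
    A row belongs to a line when it holds at least `min_ink` ink pixels.
    """
    profile = [sum(row) for row in binary]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))  # valley reached, close the line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the bottom edge
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 0, 1, 1],
        [0, 0, 0, 0],
    ]
    print(split_lines(page))
```

A global profile like this breaks down on skewed or touching handwriting, which is the case the paper's approach is designed for.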

Word Segmentation

  • Implementation of the paper "Scale Space Technique for Word Segmentation in Handwritten Documents", R. Manmatha and N. Srimal, 1999. [ref]
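
The intuition behind a scale-space approach is that blurring a text line at a suitable scale merges the small gaps between letters while leaving the larger gaps between words open. The sketch below is a one-dimensional simplification of that idea: it smooths the column projection profile of a line with a Gaussian and cuts where the result falls below a threshold. It is not the two-dimensional filtering of the cited paper, and `gaussian_kernel`, `smooth`, `split_words`, `sigma`, and `threshold` are hypothetical names and parameters.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering about three sigmas each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma):
    """Convolve the profile with the Gaussian, treating outside values as 0."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for x in range(len(profile)):
        acc = 0.0
        for offset, weight in enumerate(kernel):
            i = x + offset - radius
            if 0 <= i < len(profile):
                acc += weight * profile[i]
        out.append(acc)
    return out


def split_words(line, sigma=1.0, threshold=0.2):
    """Return (start_col, end_col) pairs, end exclusive, for each word.

    `line` is a list of rows in which ink pixels are 1 and background is 0.
    """
    profile = [sum(column) for column in zip(*line)]
    blurred = smooth(profile, sigma)
    words, start = [], None
    for x, value in enumerate(blurred):
        if value >= threshold and start is None:
            start = x
        elif value < threshold and start is not None:
            words.append((start, x))
            start = None
    if start is not None:
        words.append((start, len(blurred)))
    return words


if __name__ == "__main__":
    # Two "letters" one column apart, a ten-column gap, then a third letter.
    line = [[1, 1, 0, 1, 1] + [0] * 10 + [1, 1, 1]]
    print(split_words(line, sigma=1.0))  # the letter gap is bridged
    print(split_words(line, sigma=0.3))  # too little blur, letters split apart
```

The choice of `sigma` plays the role of the scale: too small and letters are reported as separate words, too large and neighbouring words merge.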

Binary image

Image lines

First line/words segment

License: MIT License


Languages

C++ 96.2%, Python 3.8%