khizar-anjum / signature_extraction

framework for extraction and matching of signatures from scanned documents

Hello, this repo is for signature extraction using neural networks.
The main objective of this project is to provide a function that takes in PDF files and extracts all the signatures, as well as the text in them, using neural networks and OCR techniques.

Getting Started

To get started, install all the dependencies with pip:
pip install -r requirements.txt
This installs every package the extractor class needs except PyTorch. To install PyTorch, I suggest you go to the PyTorch homepage and follow the instructions that match your system's specifications.

You also have to install the Tesseract OCR engine. Follow the instructions in its repo for your system and you should be fine. Pip only installs pytesseract, a Python wrapper around the tesseract binary, and the wrapper does not work unless the binary is installed on your system.
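A quick way to confirm the binary is reachable before running the extractor is to look it up on the PATH. This is a minimal sketch and not part of this repo: the helper name tesseract_available is made up for illustration, and it assumes the executable is called tesseract, which is its usual name.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the named executable can be found on the PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract binary found")
    else:
        print("tesseract binary not found; install it before using pytesseract")
```

If the check fails even though Tesseract is installed, the binary is probably in a directory that is not on the PATH.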

If your system has a CUDA-capable GPU, do not forget to use it by editing the extractor.__load_model() method in extractor.py, i.e. by setting device = torch.device('cuda').
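Instead of hard-coding the device, one option is to pick it at runtime. The sketch below is an assumption about how that could look, not the code in extractor.py: pick_device_name is a hypothetical helper that returns the string to pass to torch.device, and it falls back to the CPU when PyTorch is missing or no GPU is visible.

```python
def pick_device_name() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # Inside extractor.__load_model() the assignment could then become:
    #     device = torch.device(pick_device_name())
    print(pick_device_name())
```

With this approach the same file runs unchanged on machines with and without a GPU.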

Usage

First, download the pre-trained Siamese Convolutional Neural Network by running python DownloadModel.py.
The pre-trained weights and code for the Siamese CNN have been taken from the OfflineSignatureVerification repo.
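For readers new to Siamese networks: such a model typically maps each signature image to an embedding vector, and two signatures are compared by the distance between their embeddings. The sketch below illustrates only that comparison step, using plain Python lists in place of real embeddings. The function names and the threshold value are assumptions for illustration and are not taken from this repo or from OfflineSignatureVerification.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_match(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Treat two signatures as matching when their embeddings are close.

    The default threshold is a placeholder; a real value has to be tuned
    on labelled genuine/forged pairs.
    """
    return euclidean_distance(a, b) <= threshold


if __name__ == "__main__":
    reference = [0.10, 0.80, 0.30]  # made-up embedding of a known signature
    candidate = [0.12, 0.79, 0.33]  # made-up embedding of an extracted one
    print(is_match(reference, candidate))
```

A smaller threshold makes the check stricter, which rejects more forgeries at the cost of rejecting more genuine signatures.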

Afterwards, use the extractor class to perform all the necessary functions. Every function is documented, and a simple use case is provided in main.py.

References

[1] Thanks to Aftaab99 for the amazing work on Siamese Neural Networks for Offline Signature Verification.
[2] Thanks to rbaguilla for their repo on detecting lines, words and paragraphs in a scanned document.

License:MIT License

