Helkyd / erpnext_ocr

:snake: :alembic: Optical Character Recognition using tesseract within Frappe.

ERPNext OCR

⚗️ Experimental Frappe OCR application with tesseract.

This project is a fork of ERPNext-OCR by John Vincent Fiel. It aims to fix and clean up the original source code and to add some new features.

Check out more on ERPNext Discuss.

📈 Changes

See CHANGELOG

🔖 Roadmap

See Taiga.io

🚧 Install

Pre-requisites: tesseract-python and imagemagick

Install tesseract-ocr, plus imagemagick and ghostscript (to work with pdf files) using this command on Debian:

sudo apt-get install tesseract-ocr imagemagick libmagickwand-dev ghostscript
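
Before installing the app, it can help to confirm that the system tools are actually on the PATH. The following is a minimal sketch, not part of this project: the `find_binaries` helper is hypothetical, and the executable names (`tesseract`, `convert` for ImageMagick 6, `gs` for Ghostscript) are assumptions about what the Debian packages above provide.

```python
import shutil

# Executable names are assumptions based on the Debian packages listed above.
REQUIRED_BINARIES = {
    "tesseract-ocr": "tesseract",
    "imagemagick": "convert",
    "ghostscript": "gs",
}


def find_binaries(binaries=REQUIRED_BINARIES, which=shutil.which):
    """Map each package name to the resolved executable path, or None if absent."""
    return {package: which(exe) for package, exe in binaries.items()}


if __name__ == "__main__":
    for package, path in find_binaries().items():
        print(f"{package}: {path or 'NOT FOUND'}")
```

Any package reported as `NOT FOUND` probably needs to be installed (or its executable may simply have a different name on your distribution).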

Install Frappe application

bench get-app --branch develop erpnext_ocr https://github.com/Monogramm/erpnext_ocr
bench install-app erpnext_ocr

Installing the Frappe app also installs the following Python requirements:

  • tesserocr, a Python binding for tesseract

  • pillow, an image processing library

  • requests, an HTTP library

  • wand, a Python binding for ImageMagick
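
To illustrate how these libraries can fit together, here is a minimal sketch that reads text from an image or from the first page of a PDF. It is not the code this app uses internally: `needs_rasterizing` and `read_text` are hypothetical helpers, and the 300 dpi resolution and `eng` language are assumed defaults.

```python
import io
from pathlib import Path


def needs_rasterizing(path):
    """PDFs have to be rendered to an image before tesseract can read them."""
    return Path(path).suffix.lower() == ".pdf"


def read_text(path, lang="eng", resolution=300):
    """Return the text recognized in an image, or in the first page of a PDF."""
    # Imported lazily so needs_rasterizing() works without the OCR stack installed.
    import tesserocr
    from PIL import Image as PILImage
    from wand.color import Color
    from wand.image import Image as WandImage

    if needs_rasterizing(path):
        # "[0]" asks ImageMagick (through Ghostscript) for the first page only.
        with WandImage(filename=f"{path}[0]", resolution=resolution) as page:
            # Flatten any transparency onto white so the text stays readable.
            page.background_color = Color("white")
            page.alpha_channel = "remove"
            png = page.make_blob("png")
        image = PILImage.open(io.BytesIO(png))
    else:
        image = PILImage.open(path)
    return tesserocr.image_to_text(image, lang=lang)
```

Calling `read_text("invoice.pdf", lang="eng")` (with a file name of your own) would return the recognized text as a string, provided the trained data for the requested language is installed.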

🚀 Usage

File Being Read: [image]

Sample Screenshot: [image]

Tesseract trained data

In order to use OCR with different languages, you need to install the appropriate trained data files. Check tesseract Wiki for details: https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
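
To see which trained data files tesseract can currently find, one option is to parse the output of `tesseract --list-langs`. The sketch below assumes the usual output shape (one header line followed by one language code per line); `parse_languages` and `installed_languages` are hypothetical helper names, not part of this app.

```python
import subprocess


def parse_languages(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines()]
    # The first line is a header such as "List of available languages (2):".
    return sorted(line for line in lines[1:] if line)


def installed_languages():
    """Ask the local tesseract binary which trained data files it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: some older releases print the list on stderr instead of stdout.
    return parse_languages(result.stdout or result.stderr)
```

If the language you need is missing from the result, install its trained data file as described on the wiki page above.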

Development

If you wish to develop this application or just test it locally, run docker-compose up -d at the root of this repository. You can then access your ERPNext OCR dev environment at http://localhost:8080.
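
The containers can take a while to start, so a script that depends on the dev environment may want to wait until the site responds. This is a minimal sketch using only the standard library; `is_up` and `wait_until_up` are hypothetical helpers, and the attempt count and delay are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=5):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # nothing is listening yet


def wait_until_up(url, attempts=30, delay=2.0, probe=is_up, sleep=time.sleep):
    """Poll the URL until it responds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    print(wait_until_up("http://localhost:8080"))
```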

Known issues

✅ Run tests

bench run-tests --app erpnext_ocr

👤 Authors

Monogramm

John Vincent Fiel

🀝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page and the contributing guide.

👍 Show your support

Give a ⭐ if this project helped you!

📄 License

Copyright © 2019 Monogramm.
This project is MIT licensed.


This README was generated with ❤️ by readme-md-generator