air-yan / InvoiceOCR

This project aims to automate the receipt/invoice parsing process.

Invoice-Receipt-OCR

Installation and Prerequisites

Python Modules

# string similarity, used to rate the text extraction results
pip install python-Levenshtein

# images and preprocessing
pip install Wand
pip install opencv-python

# ocr engine
pip install pytesseract

# PDF text extraction tool -> not required for now
pip install pdfminer.six
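
As an illustration of what "rating" the text extraction could look like, here is a minimal sketch that scores OCR output against a known reference string. The function name `rate_extraction` is hypothetical and not taken from the project. It uses `Levenshtein.ratio` from python-Levenshtein when the module is installed and falls back to the standard library's `difflib.SequenceMatcher` otherwise; the fallback is an assumption of this sketch, and the two may give slightly different scores.

```python
from difflib import SequenceMatcher

try:
    # Available after `pip install python-Levenshtein`.
    from Levenshtein import ratio as _ratio
except ImportError:
    # Stdlib fallback (an assumption of this sketch, not part of the project).
    def _ratio(a, b):
        return SequenceMatcher(None, a, b).ratio()


def rate_extraction(extracted, expected):
    """Return a similarity score in [0, 1] between OCR output and a reference."""
    # Collapse whitespace and ignore case so layout noise does not dominate.
    a = " ".join(extracted.split()).lower()
    b = " ".join(expected.split()).lower()
    if not a and not b:
        return 1.0
    return _ratio(a, b)


if __name__ == "__main__":
    # A typical OCR confusion: the letter "o" read as the digit "0".
    print(rate_extraction("T0tal due", "Total due"))
```

A score close to 1 means the extracted text nearly matches the reference; what threshold counts as "good enough" is left open here.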

Environments

If you are using Windows, add the ImageMagick and Tesseract installation directories to your PATH.
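
One way to confirm the PATH setup is to check which executables the system can find. The sketch below is only an assumption about how to do that: the helper name `find_missing_tools` is hypothetical, and the executable names `tesseract` and `magick` (the ImageMagick 7 command; older installs may use a different name) may need adjusting for your installation.

```python
import shutil


def find_missing_tools(tools=("tesseract", "magick")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is on PATH.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```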

TODO

  • Add test code
  • Core Functions:
    • amount
    • invoice #
    • bill date vs due date
    • address
    • vendor name
  • Optimize the rating process
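
To make some of the core functions listed above concrete, here is a rough sketch of pulling the amount, invoice number, bill date, and due date out of OCR text with regular expressions. It is not the project's implementation: the function name `parse_invoice_text`, the patterns, and the sample layout are all assumptions, and real invoices will need broader patterns. Address and vendor name are left out because they usually depend on layout rather than on keywords.

```python
import re
from decimal import Decimal

# Illustrative patterns only; real invoices vary widely.
DATE = r"(\d{1,2}/\d{1,2}/\d{2,4}|\d{4}-\d{2}-\d{2})"
AMOUNT_RE = re.compile(
    r"(?i)\b(?:total|amount due|balance due)\b[^\d\n]*(\d[\d,]*\.\d{2})"
)
INVOICE_RE = re.compile(r"(?i)\binvoice\s*(?:#|no\.?|number)\s*:?\s*([A-Z0-9-]+)")
BILL_DATE_RE = re.compile(r"(?i)\b(?:invoice|bill)\s+date\s*:?\s*" + DATE)
DUE_DATE_RE = re.compile(r"(?i)\bdue\s+date\s*:?\s*" + DATE)


def _first(pattern, text):
    """Return the first captured group of `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


def parse_invoice_text(text):
    """Extract a few fields from OCR text; fields that are not found are None."""
    amount = _first(AMOUNT_RE, text)
    return {
        # Strip thousands separators before converting to Decimal.
        "amount": Decimal(amount.replace(",", "")) if amount else None,
        "invoice_number": _first(INVOICE_RE, text),
        # Dates are kept as strings because the day/month order is ambiguous.
        "bill_date": _first(BILL_DATE_RE, text),
        "due_date": _first(DUE_DATE_RE, text),
    }


if __name__ == "__main__":
    sample = (
        "ACME Supplies\n"
        "Invoice # INV-1042\n"
        "Invoice Date: 2023-01-15\n"
        "Due Date: 2023-02-14\n"
        "Subtotal: $1,200.00\n"
        "Total: $1,234.50\n"
    )
    print(parse_invoice_text(sample))
```

Keeping the bill date and the due date in separate patterns keyed on their labels is one simple way to tell them apart; it will not help when the labels are missing or misread by the OCR engine.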

License: MIT License


Languages

Jupyter Notebook 80.7%, Python 19.3%