starbuck93 / receipt-parser

A supermarket receipt parser written in Python using tesseract OCR

Home Page: https://tech.trivago.com/2015/10/06/python_receipt_parser/

A fuzzy receipt parser written in Python

This is a fuzzy receipt parser written in Python. It extracts information such as the shop, the date, and the total from receipts. It can work as a standalone script or as part of the iOS and Android applications.
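To illustrate what "fuzzy" means here, the following is a minimal sketch, not the project's actual implementation. It assumes OCR output arrives as plain text with small recognition errors and uses the standard library's difflib to tolerate them. The market names, sum keywords, date format, and the function names fuzzy_find and parse_receipt are all hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical market names and sum keywords, chosen for illustration only.
KNOWN_MARKETS = ["penny", "rewe", "aldi", "lidl"]
SUM_KEYWORDS = ["summe", "total", "gesamt"]


def fuzzy_find(keyword, lines, cutoff=0.6):
    """Return the first line with a word that approximately matches keyword."""
    for line in lines:
        words = line.lower().split()
        if get_close_matches(keyword, words, n=1, cutoff=cutoff):
            return line
    return None


def parse_receipt(text):
    """Extract market, date, and total from OCR text (assumed formats)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # First known market whose name approximately appears on the receipt.
    market = next((m for m in KNOWN_MARKETS if fuzzy_find(m, lines)), None)

    # Assumed date format: DD.MM.YY or DD.MM.YYYY.
    date_match = re.search(r"\d{2}\.\d{2}\.\d{2,4}", text)

    total = None
    for keyword in SUM_KEYWORDS:
        line = fuzzy_find(keyword, lines)
        if line:
            amount = re.search(r"\d+[.,]\d{2}", line)
            if amount:
                total = float(amount.group().replace(",", "."))
                break

    return {
        "market": market,
        "date": date_match.group() if date_match else None,
        "total": total,
    }
```

With this sketch, a line such as "SUMNE EUR 2,28" (a misread "SUMME") would still be recognized as the total line, because the misspelled word is close enough to the keyword.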

History

This project started as a hackathon idea. Read more about it on the trivago tech blog, and see the comments on Hacker News. There is also a talk about the project. The library is now available on PyPI.

Dependencies

The receipt-parser-core library depends on ImageMagick. Please install ImageMagick with your favorite package manager.

Usage

To convert all images in the data/img/ folder to text using Tesseract and parse the resulting text files, run:

make run
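For readers curious about what the OCR step might look like, here is a rough sketch that runs the tesseract command-line tool on every file in a folder. It is an assumption about the general shape of such a step, not a copy of what the Makefile does. The output folder data/txt and the function names tesseract_command and ocr_folder are hypothetical.

```python
from pathlib import Path
import shutil
import subprocess


def tesseract_command(image_path, out_dir):
    """Build the command that writes <out_dir>/<image stem>.txt."""
    out_base = Path(out_dir) / Path(image_path).stem
    # The tesseract CLI appends ".txt" to the output base name on its own.
    return ["tesseract", str(image_path), str(out_base)]


def ocr_folder(img_dir="data/img", out_dir="data/txt"):
    """Run tesseract on every file in img_dir and return the commands used."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for image in sorted(Path(img_dir).iterdir()):
        if image.is_file():
            cmd = tesseract_command(image, out_dir)
            subprocess.run(cmd, check=True)
            commands.append(cmd)
    return commands
```

The resulting text files could then be handed to a parser one by one.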

Docker

A Dockerfile is available with all dependencies needed to run the program.
To build the image, run:

make docker-build

To run it on the sample files, run:

make docker-run

By default, running the image executes the make run command. To use it with your own images, run the following:

docker run -v <path_to_input_images>:/usr/src/app/data/img mre0/receipt_parser

About

License: Apache License 2.0
