This is a fuzzy receipt parser written in Python. It extracts information such as the shop, the date, and the total from receipts. It can work as a standalone script or as part of the iOS and Android application.
This project started as a hackathon idea. Read more about it on the trivago techblog, and see the comments on Hacker News. There is also a talk about the project. The library is now available on PyPI.
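To illustrate what "fuzzy" extraction of the shop, date, and total can look like, here is a minimal sketch using only the Python standard library. It is not the library's actual implementation or API: the keyword lists, the date format, and the function names `fuzzy_find` and `parse_receipt` are assumptions made for this example.

```python
import re
from difflib import get_close_matches

# Hypothetical keyword lists for illustration; the real project's
# configuration may look different.
KNOWN_SHOPS = ["rewe", "aldi", "lidl"]
TOTAL_KEYWORDS = ["total", "sum", "amount"]


def fuzzy_find(lines, keywords, cutoff=0.7):
    """Return (line, keyword) for the first line that contains a word
    approximately matching one of the keywords, or (None, None)."""
    for line in lines:
        for word in line.lower().split():
            matches = get_close_matches(word, keywords, n=1, cutoff=cutoff)
            if matches:
                return line, matches[0]
    return None, None


def parse_receipt(text):
    """Extract shop, date, and total from OCR text, tolerating misread characters."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Shop: fuzzy match against the list of known shop names.
    _, shop = fuzzy_find(lines, KNOWN_SHOPS)

    # Date: assumes the DD.MM.YYYY format.
    date_match = re.search(r"\b\d{2}\.\d{2}\.\d{4}\b", text)

    # Total: find a line with a "total"-like word, then take the amount on it.
    total_line, _ = fuzzy_find(lines, TOTAL_KEYWORDS)
    amount_match = re.search(r"\d+[.,]\d{2}", total_line) if total_line else None

    return {
        "shop": shop,
        "date": date_match.group(0) if date_match else None,
        "total": amount_match.group(0).replace(",", ".") if amount_match else None,
    }


# Sample OCR output with typical misreads ("Rcwe" for "Rewe", "T0TAL" for "TOTAL").
SAMPLE = """Rcwe Markt GmbH
Datum: 14.03.2019
Milch 0,99
T0TAL EUR 12,49
"""
```

Calling `parse_receipt(SAMPLE)` returns a dictionary with the keys `shop`, `date`, and `total`. The similarity cutoff trades tolerance for OCR errors against false positives, so it would need tuning on real receipts.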
The receipt-parser-core library depends on imagemagick. Please install imagemagick with your favorite package manager.
To convert all images from the data/img/ folder to text using tesseract and parse the resulting text files, run
make run
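For readers who want to see the OCR step spelled out, the following is a rough sketch of converting a folder of images to text files with the tesseract command-line tool. The Makefile's actual contents are not shown here, so this is an assumption about the general shape of the step, not a copy of it. The output folder, the list of image suffixes, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed set of image types; adjust to whatever the input folder contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_ocr_commands(img_dir, txt_dir):
    """Build one `tesseract <image> <output base>` command per image.

    tesseract appends ".txt" to the output base on its own.
    """
    commands = []
    for image in sorted(Path(img_dir).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES:
            out_base = Path(txt_dir) / image.stem
            commands.append(["tesseract", str(image), str(out_base)])
    return commands


def run_ocr(img_dir, txt_dir):
    """Run tesseract on every image in img_dir, writing text files to txt_dir."""
    Path(txt_dir).mkdir(parents=True, exist_ok=True)
    for command in build_ocr_commands(img_dir, txt_dir):
        # check=True raises CalledProcessError if tesseract fails on a file.
        subprocess.run(command, check=True)
```

With tesseract installed, `run_ocr("data/img", "data/txt")` would write one text file per image, where `data/txt` is a placeholder output folder. The resulting files can then be handed to the parser.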
A Dockerfile is available with all dependencies needed to run the program.
To build the image, run
make docker-build
To run it on the sample files, try
make docker-run
By default, running the image will execute the make run command. To use it with your own images, run the following:
docker run -v <path_to_input_images>:/usr/src/app/data/img mre0/receipt_parser