This is a Python application designed to process images of receipts or invoices using the Donut model for Optical Character Recognition (OCR) and Natural Language Processing (NLP). The application extracts text data from receipt images and provides structured information such as menu items, subtotal, and total.
- Python 3.x
- FastAPI
- PyTorch
- Transformers
- PIL (Python Imaging Library)
Clone this repository to your local machine:
git clone https://github.com/muslimalfatih/scan-bill
Install dependencies
pip install -r requirements.txt
Start the FastAPI server:
uvicorn main:app --reload
- Open your web browser and navigate to http://localhost:8000 to access the application.
- Upload an image of a receipt or invoice to the app. You can try sample receipts in
static
folder - Click the Process Receipt button to extract information from the uploaded image.
- View the extracted data on the results page.
The Donut model is a state-of-the-art deep learning model developed by Naver AI Lab for document understanding tasks. It combines computer vision and natural language processing techniques to extract structured information from unstructured documents such as receipts, invoices, and forms. The model is fine-tuned for receipt processing tasks and achieves high accuracy in recognizing text and extracting relevant information.
- FastAPI Documentation: https://fastapi.tiangolo.com/
- Transformers Documentation: https://huggingface.co/transformers/
- PyTorch Documentation: https://pytorch.org/docs/stable/index.html
- PIL Documentation: https://pillow.readthedocs.io/en/stable/
Note: Ensure that you have a stable internet connection when running the application for the first time to download the Donut model and processor. Subsequent runs will use the downloaded model files stored locally.