katanaml / sparrow-donut

Data extraction with Donut ML model

Home Page: https://katanaml.io/

Sparrow Donut

Data extraction with ML

The Principle

Sparrow is an open-source solution for extracting and processing data from documents and images, including forms, invoices, receipts, and other unstructured sources. Its architecture is modular: OCR, Donut fine-tuning/inference, and a data labeling UI each run as an independent service.

Services

  • sparrow-data - This service focuses on data preparation specifically for the Donut ML model, including fine-tuning and OCR integration.
  • sparrow-ml - Dedicated to the Donut ML model, this service handles both fine-tuning and inference, streamlining the machine learning workflow.
  • sparrow-ui - A user interface for data labeling for the Donut ML model, together with a dashboard.

Installation

Donut

Follow the installation steps for each service:

  1. Donut Data install steps

  2. Donut ML install steps

  3. Donut UI install steps

Usage

Donut

Follow the steps outlined here:

  1. Donut Data usage steps

  2. Donut ML usage steps

  3. Donut UI usage steps

Examples

Inference with Donut ML model

Sparrow UI:

Inference Results
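Donut models emit their predictions as a flat string of XML-like field tokens (for example `<s_total>...</s_total>`), which then has to be turned into structured data. The following is a minimal sketch of that conversion step, not Sparrow's own code: the function name `tags_to_dict` and the field names in the sample string are hypothetical, and the parser is deliberately simplified (it handles nested fields but not repeated list items). In practice the Hugging Face `DonutProcessor.token2json` method performs this conversion more completely.

```python
import re
from typing import Any, Dict

# Matches one field of the form <s_key>content</s_key>.
# The backreference \1 makes sure the closing tag has the same key.
FIELD_PATTERN = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def tags_to_dict(sequence: str) -> Dict[str, Any]:
    """Convert a Donut-style tagged string into a nested dict.

    Simplified illustration: supports nested fields, ignores list separators.
    """
    result: Dict[str, Any] = {}
    for match in FIELD_PATTERN.finditer(sequence):
        key, inner = match.group(1), match.group(2)
        if "<s_" in inner:
            # The field contains child fields, so parse them recursively.
            result[key] = tags_to_dict(inner)
        else:
            # Leaf field: keep the text value without surrounding whitespace.
            result[key] = inner.strip()
    return result


# Hypothetical model output for a receipt image.
sample = (
    "<s_store_name>Corner Cafe</s_store_name>"
    "<s_total><s_total_price>12.50</s_total_price></s_total>"
)
parsed = tags_to_dict(sample)
print(parsed)
```

Values are kept as strings here; converting amounts or dates to typed values is left to whichever service consumes the result.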

Author

Katana ML, Andrej Baranovskij

License

Licensed under the Apache License, Version 2.0. Copyright 2020-2024 Katana ML, Andrej Baranovskij. Copy of the license.

Languages

  • Jupyter Notebook 86.9%
  • Python 11.9%
  • HTML 0.9%
  • Dockerfile 0.1%
  • Shell 0.1%
  • CSS 0.1%