ddanco / blueprint-oss

Declarative document extraction

Blueprint

Blueprint is a declarative extraction language for semi-structured documents.

Setup

Start by cloning this repo to your machine.

CLI

To run on a sample paystub:

  • Add path/to/blueprint-oss/blueprint/py to your PYTHONPATH
  • Run pip3 install -r path/to/blueprint-oss/blueprint/requirements.txt
  • From the blueprint/reference_extractions/paystubs folder, run python3 paystubs.py run_model -v -g ocr/sample_paystub.jpg.json
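
The three steps above can also be scripted. The following is a minimal sketch, not part of Blueprint itself: the helper names (paystub_env, paystub_workdir, paystub_command, run_sample) are hypothetical, and it assumes the dependencies from requirements.txt are already installed and that repo_root points at your local clone of blueprint-oss.

```python
import os
import subprocess
from pathlib import Path


def paystub_env(repo_root, base_env=None):
    """Copy the environment and prepend <repo_root>/blueprint/py to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    py_dir = str(Path(repo_root) / "blueprint" / "py")
    existing = env.get("PYTHONPATH")
    # Keep any PYTHONPATH entries that were already set.
    env["PYTHONPATH"] = py_dir + os.pathsep + existing if existing else py_dir
    return env


def paystub_workdir(repo_root):
    """Folder the README says to run the sample from."""
    return Path(repo_root) / "blueprint" / "reference_extractions" / "paystubs"


def paystub_command(ocr_file="ocr/sample_paystub.jpg.json"):
    """The run_model invocation from the README, as an argument list."""
    return ["python3", "paystubs.py", "run_model", "-v", "-g", ocr_file]


def run_sample(repo_root):
    """Run the sample paystub extraction (hypothetical convenience wrapper)."""
    return subprocess.run(
        paystub_command(),
        cwd=paystub_workdir(repo_root),
        env=paystub_env(repo_root),
        check=True,  # raise if the extraction exits with a non-zero status
    )
```

Calling run_sample("path/to/blueprint-oss") would then be equivalent to running the command by hand from the paystubs folder with PYTHONPATH set for that one process only, which avoids changing your shell configuration.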

To generate your own OCR documents:

Server

TODO

Studio

TODO

License: MIT License


Languages: Python 53.7%, TypeScript 43.0%, CSS 2.9%, Shell 0.2%, JavaScript 0.2%, HTML 0.0%