Document AI with Hugging Face Transformers

Document AI is a term that has become popular over the last 3 years. It refers to machine learning models, tasks, and techniques that classify, parse, and extract information from documents in digital and print form, such as invoices, receipts, licenses, contracts, and business reports.

This repository contains examples and tutorials on how to get started with Document AI and Transformers. Below you can also find a compendium of available models, tasks, datasets, and other resources.

Training

Inference

Data-processing

Convert FUNSD to the Donut format for document VQA
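A minimal sketch of such a conversion follows. It assumes the original FUNSD annotation JSON layout: a top-level `form` list whose entities carry `id`, `text`, `label`, and `linking` fields. The function names are hypothetical. The `<s_docvqa>` task token is borrowed from the public `naver-clova-ix/donut-base-finetuned-docvqa` checkpoint, and a custom fine-tune may define its own.

```python
from typing import Dict, List


def funsd_to_qa_pairs(annotation: Dict) -> List[Dict[str, str]]:
    """Turn FUNSD question->answer links into question/answer pairs (hypothetical helper)."""
    entities = {entity["id"]: entity for entity in annotation["form"]}
    seen = set()
    pairs = []
    for entity in annotation["form"]:
        for source_id, target_id in entity["linking"]:
            # FUNSD stores each link on both linked entities, so de-duplicate.
            if (source_id, target_id) in seen:
                continue
            seen.add((source_id, target_id))
            source, target = entities.get(source_id), entities.get(target_id)
            if source is None or target is None:
                continue
            # Keep only question -> answer links; header links are skipped.
            if source["label"] == "question" and target["label"] == "answer":
                question, answer = source["text"].strip(), target["text"].strip()
                if question and answer:
                    pairs.append({"question": question, "answer": answer})
    return pairs


def to_donut_docvqa_target(question: str, answer: str) -> str:
    """Render one pair in the tag format of Donut's DocVQA prompt (assumed task token)."""
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>{answer}</s_answer>"


# A tiny hand-written annotation in the assumed FUNSD layout.
sample = {
    "form": [
        {"id": 0, "text": "Date:", "label": "question", "linking": [[0, 1]]},
        {"id": 1, "text": "12/10/98", "label": "answer", "linking": [[0, 1]]},
        {"id": 2, "text": "MEMO", "label": "header", "linking": []},
    ]
}

for pair in funsd_to_qa_pairs(sample):
    # prints <s_docvqa><s_question>Date:</s_question><s_answer>12/10/98</s_answer>
    print(to_donut_docvqa_target(pair["question"], pair["answer"]))
```

A question linked to several answers yields one pair per link here. Joining them into a single answer string would be an equally reasonable choice.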

Demos/Spaces

Community:

Popular models are LayoutLM ... and Donut, which we will use today to get a first impression of how you can build your own Document AI system using Hugging Face Transformers.
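As a first impression, here is a hedged sketch of DocVQA inference with Donut. It follows the generation recipe from the Transformers Donut documentation and assumes the `naver-clova-ix/donut-base-finetuned-docvqa` checkpoint, PyTorch, and a PIL image. `answer_question`, `build_docvqa_prompt`, and `clean_sequence` are illustrative helper names. The heavy imports live inside `answer_question`, so the prompt helpers work without downloading the model.

```python
import re

DOCVQA_CHECKPOINT = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_docvqa_prompt(question: str) -> str:
    # Donut is conditioned on a task prompt; the decoder continues after <s_answer>.
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def clean_sequence(sequence: str, eos_token: str = "</s>", pad_token: str = "<pad>") -> str:
    # Drop EOS/padding tokens, then the leading task-start token (e.g. <s_docvqa>).
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def answer_question(image, question: str, checkpoint: str = DOCVQA_CHECKPOINT) -> dict:
    """Run Donut DocVQA on a PIL image (downloads the checkpoint on first use)."""
    import torch
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # The prompt is fed as the decoder prefix; the image goes through the encoder.
    decoder_input_ids = processor.tokenizer(
        build_docvqa_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids
    pixel_values = processor(image, return_tensors="pt").pixel_values

    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token)
    # token2json turns "<s_question>...</s_question><s_answer>...</s_answer>" into a dict.
    return processor.token2json(sequence)
```

For example, `answer_question(Image.open("invoice.png").convert("RGB"), "What is the total amount?")` should return a dict with `question` and `answer` keys. This assumes `from PIL import Image` and a hypothetical `invoice.png`.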

Machine Learning Models (Transformers)

Below you can find a table of the currently available Transformers models that achieve state-of-the-art performance on Document AI tasks.

Model	Paper	License	Checkpoints
Donut	arxiv	MIT	huggingface
LiLT	arxiv	MIT	huggingface
LayoutLM	arxiv	MIT	huggingface
LayoutXLM	arxiv	CC BY-NC-SA 4.0	huggingface
LayoutLMv2	arxiv	CC BY-NC-SA 4.0	huggingface
LayoutLMv3	arxiv	CC BY-NC-SA 4.0	huggingface
DiT	arxiv	CC BY-NC-SA 4.0	huggingface
TrOCR	arxiv	MIT	huggingface
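Most of the layout-aware models above (LayoutLM, LayoutLMv2, LayoutLMv3, LayoutXLM, LiLT) take word bounding boxes alongside the text, normalized to a 0-1000 grid that is independent of page size. Donut, DiT, and TrOCR take only pixels. If you pass your own OCR results instead of letting a processor run Tesseract, a small helper like the one below can do that normalization. The clamping step is an added assumption that guards against OCR boxes that slightly overflow the page.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: int, height: int) -> List[int]:
    """Scale an (x0, y0, x1, y1) pixel box to the 0-1000 grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = bbox
    scaled = [1000 * x0 / width, 1000 * y0 / height, 1000 * x1 / width, 1000 * y1 / height]
    # Truncate to int and clamp to [0, 1000].
    return [min(max(int(value), 0), 1000) for value in scaled]


print(normalize_bbox([50, 100, 250, 150], width=1000, height=2000))  # [50, 50, 250, 75]
print(normalize_bbox([306, 396, 612, 800], width=612, height=792))   # [500, 500, 1000, 1000]
```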

Tasks

Document AI includes the following use cases and tasks:

document classification (image-classification)
document parsing (form understanding & information extraction)
visual question answering
table detection/layout analysis
optical character recognition (OCR)
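One way to try several of these tasks quickly is the Transformers `pipeline` API. The sketch below maps each task to a pipeline id and an example public checkpoint. The mapping and the `make_pipeline` helper are assumptions for illustration, as is the Table Transformer checkpoint, which is not listed in the model table above. Document parsing is left unmapped because it usually means fine-tuning a model such as Donut or LayoutLMv3 rather than calling a ready-made pipeline.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping: task -> (pipeline id, example checkpoint); None = no single pipeline.
TASK_TO_PIPELINE: Dict[str, Optional[Tuple[str, str]]] = {
    "document classification": ("image-classification", "microsoft/dit-base-finetuned-rvlcdip"),
    "document parsing": None,
    "visual question answering": ("document-question-answering", "naver-clova-ix/donut-base-finetuned-docvqa"),
    "table detection/layout analysis": ("object-detection", "microsoft/table-transformer-detection"),
    # TrOCR reads single text-line crops, not full pages.
    "optical character recognition (OCR)": ("image-to-text", "microsoft/trocr-base-printed"),
}


def make_pipeline(task: str):
    """Build a pipeline for one of the tasks above (downloads the checkpoint on first use)."""
    entry = TASK_TO_PIPELINE[task]
    if entry is None:
        raise ValueError(f"No single pipeline covers {task!r}; fine-tune a model such as Donut or LayoutLMv3.")
    from transformers import pipeline  # imported lazily so the mapping is usable offline

    pipeline_id, checkpoint = entry
    return pipeline(pipeline_id, model=checkpoint)
```

For example, `make_pipeline("visual question answering")(image=image, question="What is the invoice number?")` runs Donut on a PIL image. The Table Transformer checkpoint additionally needs the `timm` package.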

Datasets

Dataset	Task	Hugging Face Datasets
SROIE	document parsing	darentang/sroie
RVL-CDIP	document classification	rvl_cdip
XFUND	document parsing	ranpox/xfund
FUNSD	document parsing	nielsr/funsd
CORD	information extraction/parsing	naver-clova-ix/cord-v2
DocVQA	visual question answering	load manually
WildReceipt	document parsing	Theivaprakasham/wildreceipt
TableBank	table detection/layout analysis	load manually
DocBank	table detection/layout analysis	load manually
ReadingBank	table detection/layout analysis	load manually
EATEN	document parsing	load manually
PubLayNet	table detection/layout analysis	jordanparker6/publaynet
ICDAR2019_cTDaR	table detection/layout analysis	load manually
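For datasets hosted on the Hub, Donut-style training needs each annotation serialized into a token sequence. Below is a simplified sketch for CORD. It assumes that `naver-clova-ix/cord-v2` exposes a `ground_truth` JSON string with a `gt_parse` field, and it uses `<s_cord-v2>` as the task-start token, as in the public Donut CORD checkpoint. `json2token` here re-implements the idea behind Donut's preprocessing and is not imported from a library. `cord_target_sequence` and `load_cord_targets` are hypothetical helpers.

```python
import json
from typing import Any, List


def json2token(obj: Any, sort_keys: bool = True) -> str:
    """Serialize nested JSON into Donut-style <s_key>value</s_key> tokens."""
    if isinstance(obj, dict):
        keys = sorted(obj, reverse=True) if sort_keys else list(obj)
        return "".join(f"<s_{k}>{json2token(obj[k], sort_keys)}</s_{k}>" for k in keys)
    if isinstance(obj, list):
        # Repeated items (e.g. several menu lines) are separated by <sep/>.
        return "<sep/>".join(json2token(item, sort_keys) for item in obj)
    return str(obj)


def cord_target_sequence(ground_truth: str, task_token: str = "<s_cord-v2>", eos_token: str = "</s>") -> str:
    """Build a training target from one CORD-v2 'ground_truth' JSON string (assumed layout)."""
    gt_parse = json.loads(ground_truth)["gt_parse"]
    return task_token + json2token(gt_parse) + eos_token


def load_cord_targets(split: str = "train", limit: int = 3) -> List[str]:
    """Load a few CORD-v2 examples from the Hub and serialize their targets."""
    from datasets import load_dataset  # lazy import; downloads the dataset on first use

    dataset = load_dataset("naver-clova-ix/cord-v2", split=split)
    return [cord_target_sequence(example["ground_truth"]) for example in dataset.select(range(limit))]
```

Before fine-tuning, the new `<s_…>` tags would also need to be added to the tokenizer vocabulary, for example with `processor.tokenizer.add_tokens`.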

APIs and existing Solutions

Other Tools

SynthDoG 🐶: Synthetic Document Generator

Resources

OCR-Free Document Understanding with Donut

About

MIT License
