There are 35 repositories under the document-ai topic.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
A Repo For Document AI
A curated list of resources for Document Understanding (DU) topic
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
ReadingBank: A Benchmark Dataset for Reading Order Detection
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, achieving the same (or even better) results than Table Transformer (TATR) with smaller models.
This repository includes all computer vision, audio, document AI, and multimodal projects.
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
A curated list of resources on Table Structure Recognition
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
This repository contains the implementation of DANIEL (a fast Document Attention Network for Information Extraction and Labeling of handwritten documents).
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
A curated list of resources on Document Layout Analysis
FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.
(WIP) ✨ A comprehensive resource for understanding the world of software used in the Document Understanding field. 🧙✨
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.
Exploring LayoutLM for Smart OCR Capabilities
SnapDoc AI processes everything on-device, ensuring your sensitive information never leaves your control. It brings on-device voice and text processing to organizations.
spaCy for key-value pairs
Smartloop is an open-source SLM platform to train and run models on an edge device
Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.
This project uses OCR and a BART-based NLP pipeline to extract and summarize landlord, tenant, property, and contract details from scanned lease agreements. It combines Tesseract OCR, pdf2image, and HuggingFace Transformers to deliver structured legal summaries in JSON format.
A curated training dataset for fine-tuning large language models on U.S. affordable housing policy, finance, public housing, LIHTC, regulations, and voucher program administration. Designed for compliance automation, technical assistance, and intelligent document generation in pursuit of affordable housing development and preservation.