There are 22 repositories under the document-understanding topic.
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
A Repo For Document AI
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A curated list of resources on the Document Understanding (DU) topic
Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
Sample applications and demos for Document AI, the end-to-end document processing platform on Google Cloud
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updating.
Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
ReadingBank: A Benchmark Dataset for Reading Order Detection
Object Detection Model for Scanned Documents
Checkbox Detection Model for Scanned Documents
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
Implementation of the paper: Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer.
Run optical character recognition with PyTesseract from the FiftyOne App!
(WIP) ✨ A comprehensive resource for understanding the world of software used in the Document Understanding field. 🧙✨
This small module connects Label Studio with Fonduer by creating a Fonduer labeling function for gold labels from a Label Studio export. Documentation: https://irgroup.github.io/labelstudio-to-fonduer/
This project tackles a real-world challenge of automating client document processing, with a focus on enhancing document classification, error detection, data extraction, and validation.
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
QuickCapture Mobile Scanning SDK, specially designed for native iOS
QuickCapture Mobile Scanning SDK from Extrieve, specially designed for native Android
Analysing expense reports/invoices with AWS Textract and boto3.
This repository includes the ReceiptVQA dataset and the PyTorch implementation of the LiGT method and other evaluated baselines.
This project automates invoice processing using the Dispatcher-Performer model in UiPath and Document Understanding. The process asks the user for a date, navigates to a website to upload the invoices whose Due Date matches a given condition, and adds each invoice as a queue item in the Orchestrator.