document-ai

There are 35 repositories under the document-ai topic.

microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
nlp pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3 foundation-models xlm-e deepnet llm multimodal mllm kosmos kosmos-1 textdiffuser bitnet
Language:Python 21818
clovaai / donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
document-ai eccv-2022 multimodal-pre-trained-model ocr nlp computer-vision
Language:Python 6644
deepdoctection / deepdoctection
A Repo For Document AI
document-parser document-image-analysis table-recognition ocr document-ai document-understanding python document-layout-analysis table-detection pytorch tensorflow publaynet pubtabnet layoutlm nlp
Language:Python 2994
tstanislawek / awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
awesome-list machine-learning information-extraction key-information-extraction document-understanding robotic-process-automation document-analysis document-layout-analysis ocr natural-language-processing deep-learning nlp awesome pdf rpa pdf-documents document-intelligence unstructured-data intelligent-processing document-ai
1473
jpWang / LiLT
Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)
nlp document-ai document-analysis document-understanding information-extraction multimodal-pre-trained-model multilingual-models
Language:Python 357
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
document-ai document-understanding key-information-extraction visual-information-extraction table-structure-recognition
202
doc-analysis / ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
ocr nlp natural-language-processing document-understanding document-ai document-intelligence
113
clovaai / webvicob
Official Implementation of Web-based Visual Corpus Builder (Webvicob), ICDAR 2023
document-ai icdar2023 nlp ocr
Language:Python 109
nttmdlab-nlp / SlideVQA
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images (AAAI2023)
aaai2023 computer-vision document-ai nlp ocr
Language:Python 100
ZeningLin / ViBERTgrid-PyTorch
An unofficial PyTorch implementation of "Lin et al. ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents. ICDAR, 2021"
key-information-extraction document-ai information-extraction document-analysis visual-information-extraction
Language:Python 53
whn09 / table_structure_recognition
Table detection (TD) and table structure recognition (TSR) using YOLOv5/YOLOv8, giving results comparable to (or even better than) Table Transformer (TATR) with smaller models.
document-ai table-detection ocr table table-structure-recognition yolov5 yolov8
Language:Jupyter Notebook 51
DunnBC22 / Vision_Audio_and_Multimodal_Projects
This repository includes all computer vision, audio, document AI, and multimodal projects.
audio-classification computer-vision document-ai multimodal-deep-learning object-detection optical-character-recognition transfer-learning transformers
Language:Jupyter Notebook 47
googleapis / python-documentai-toolbox
Document AI Toolbox is an SDK for Python that provides utility functions for managing, manipulating, and extracting information from the document response. It creates a "wrapped" document object from JSON files in Cloud Storage, local JSON files, or output directly from the Document AI API.
ai document-ai gcp generative-ai google-cloud google-cloud-platform vertex-ai
Language:Python 46
nttmdlab-nlp / VDocRAG
[CVPR2025] VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
computer-vision cvpr2025 document-ai nlp ocr
Language:Python 46
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
document-ai document-understanding key-information-extraction ocr visual-information-extraction
Language:Python 37
Unstructured-IO / community
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
community data-pipeline deep-learning document-ai document-parsing machine-learning nlp-parsing ocr-python open-source preprocessing-data
29
qyhou / curated-table-structure-recognition
A curated list of resources on Table Structure Recognition
document-ai document-intelligence table-recognition table-structure-recognition
28
SCUT-DLVCLab / RFUND
[MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
document-understanding visual-information-extraction document-ai key-information-extraction ocr
20
Shulk97 / daniel
This repository contains the implementation of DANIEL (a fast Document Attention Network for Information Extraction and Labeling of handwritten documents).
computer-vision document-ai multimodal-pre-trained-model nlp ocr
Language:Python 18
chenxn2020 / GOSE
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
document-ai relation-extraction
Language:Python 17
NirmalNagaraj / DocGPT
A chatbot for document analysis.
ai chatbot document-ai
Language:Python 11
qyhou / curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
document-ai document-intelligence document-layout-analysis layout-analysis document-structure-analysis page-object-detection document-hierarchy-extraction document-structure-extraction
9
conditionedstimulus / DocumentClassifier
FastAPI application for document classification using a multimodal LayoutLM model, designed to classify PDF documents into RVL-CDIP categories.
document-ai layoutlmv3 machine-learning nlp fastapi python
Language:Jupyter Notebook 8
dhorvay / document-understanding-ebook
(WIP) ✨ A comprehensive resource for understanding the world of software used in the Document Understanding field. 🧙✨
document-ai document-understanding awesome-document-understanding ebook ocr
Language:Markdown 6
bwnyasse / dart-documentai-samples
A hands-on CLI tool sample showcasing the integration of Dart with Google Cloud's DocumentAI.
dart dartlang document-ai document-understanding google-cloud machine-learning samples
Language:Dart 5
wintermi / ocr-runner
OCR Runner - Command Line Application for processing image files using Google Cloud Vision API and Google Cloud Document AI.
cloud-vision cloud-vision-api document-ai google-cloud google-cloud-platform
Language:Go 4
bhadreshpsavani / SmartOCR-with-LayoutLM
Exploring LayoutLM for Smart OCR Capabilities
document-ai document-inteligence layoutlm
3
devraftel / snapdoc-edge-ai
SnapDoc AI processes everything on-device, ensuring your sensitive information never leaves your control. Use voice and text on-device processing in organizations.
edgeai onnx-models onnx-torch onnxruntime privacy-as-code qualcomm document-ai enterprise-ai private-ai snapdragon
Language:Python 3
Purushothaman-natarajan / Custom-NER-Model-using-Spacy-Fine-Tuning
Spacy for Key:Value pairs
code document-ai machine-learning natural-language-processing ner neural-network nlp-keywords-extraction spacy
Language:Jupyter Notebook 3
ajaycode / unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-ai document-image-analysis document-image-processing document-parsing docx donut information-retrieval langchain machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Language:HTML 2
rag-fish / RAGfish
RAGfish: the open-source standard for private, offline, multi-pack LLM RAG, with a unified RAGpack format, world-class pipeline, and reference macOS/iOS client. Your knowledge, your device, your rules.
ai apple-silicon document-ai embeddings ios knowledge-management llm machine-learning macos offline open-source p2p personal-ai pipeline privacy rag retrieval-augmented-generation vector-search ragpack
2
smartloop-ai / smartloop
Smartloop is an open-source SLM platform to train and run models on an edge device
ai llm llm-inference llama3 document-ai fine-tune-llms fine-tuning gemma llama3-2 llamacpp
Language:Python 2
anshi312 / financial-analyst-rag
Financial QA assistant using RAG, FAISS, LangChain & HuggingFace to query 10-Ks & reports. PDF search, NLP, and Streamlit UI for insights.
document-ai faiss financial-analysis huggingface langchain nlp pdf-processing rag semantic-search sentence-transformers streamlit
Language:Python 1
masfaatanveer / Lease-Summarization-Model-NLP
This project uses OCR and a BART-based NLP pipeline to extract and summarize landlord, tenant, property, and contract details from scanned lease agreements. It combines Tesseract OCR, pdf2image, and HuggingFace Transformers to deliver structured legal summaries in JSON format.
automation bart document-ai huggingface legaltech nlp ocr pdf-processing php python text-summarization transformers lease-summarization
Language:Python 1
rag-fish / LLM_Document_Agent
document-ai epub llm pdf rag streamlit chatbot knowledge-management privategpt semantic-search vector-search
Language:Python 1
zachurban / HousingMind
A curated training dataset for fine-tuning large language models on U.S. affordable housing policy, finance, public housing, LIHTC, regulations, and voucher program administration. Designed for compliance automation, technical assistance, and intelligent document generation in pursuit of affordable housing development and preservation.
affordable-housing document-ai govtech housing housing-affordability housing-data housing-policy llm-dataset public-housing semantic-search ai-in-government hud-compliance rad-conversion voucher-programs
Language:HTML 1
