document-intelligence

There are 13 repositories under the document-intelligence topic.

PaddlePaddle / PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
nlp embedding bert ernie paddlenlp pretrained-models transformers information-extraction question-answering search-engine semantic-analysis sentiment-analysis neural-search uie document-intelligence compression llm distributed-training llama
Language:Python 12839
Goldziher / kreuzberg
Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
ocr text-extraction async document-intelligence mcp metadata-extraction pandoc pdf-extraction pdfium python rag table-extraction tesseract
Language:HTML 2498
AlibabaResearch / AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence documentai multimodal multimodal-deep-learning ocr computer-vision vision-language-transformer end-to-end-ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language document document-analysis document-recognition document-understanding document-intelligence vision-language-model
Language:C++ 1796
shcherbak-ai / contextgem
ContextGem: Effortless LLM extraction from documents
ai contract-analysis data-extraction document-intelligence generative-ai legaltech llm llm-extraction llm-framework llm-pipeline llms nlp prompt-engineering text-analysis unstructured-data docx docx2md docx2txt
Language:Python 1706
tstanislawek / awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
awesome-list machine-learning information-extraction key-information-extraction document-understanding robotic-process-automation document-analysis document-layout-analysis ocr natural-language-processing deep-learning nlp awesome pdf rpa pdf-documents document-intelligence unstructured-data intelligent-processing document-ai
1474
enoch3712 / ExtractThinker
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
ai llm nlp ocr openai python document-image-analysis document-intelligence document-parsing document-processing langchain machine-learning pdf pdf-to-text
Language:Python 1455
Azure / AI-in-a-Box
AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
ai azd azd-templates azure chat-bot chatbot chatgpt document-intelligence edge-ai edge-computing langchain machine-learning semantic-kernel custom-vision openai
Language:Jupyter Notebook 588
Azure-Samples / azure-ai-document-processing-samples
A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.
ai azure classification document-intelligence extraction redaction translation embeddings gpt openai
Language:Bicep 113
doc-analysis / ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
ocr nlp natural-language-processing document-understanding document-ai document-intelligence
113
Azure-Samples / doc-intelligence-in-a-box
The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.
accelerator ai azd azd-templates azure cognitive-services document-intelligence form-analysis text-extraction
Language:Bicep 40
jamesmcroft / azure-document-intelligence-markdown-to-openai-data-extraction-sample
This sample demonstrates how to use Document Intelligence's Layout model to convert a PDF document, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.
azure document-intelligence gpt openai
Language:Jupyter Notebook 31
qyhou / curated-table-structure-recognition
A curated list of resources on Table Structure Recognition
document-ai document-intelligence table-recognition table-structure-recognition
28
ihdia / BoundaryNet
BoundaryNet - A Semi-Automatic Layout Annotation Tool
deep-learning document-intelligence graph-neural-networks interactive document-layout-analysis icdar2021 pytorch annotation
Language:Python 24
AI-Engineering-Study-Group / docugent
AI-powered document intelligence platform for automated analysis, processing, and insights extraction from various document formats.
ai automation data-extraction document-intelligence document-processing machine-learning nlp python
Language:Python 17
AmirhosseinHonardoust / Graph-RAG-Engine
An explainable AI system that combines Graph Intelligence, Vector Search, and Retrieval-Augmented Generation (RAG) to deliver grounded answers and transparent reasoning paths. Includes a FastAPI backend, Streamlit UI, FAISS vector index, and an in-memory knowledge graph for hybrid retrieval and recommendations.
document-intelligence explainable-ai faiss fastapi graph-ai graph-embeddings knowledge-graph machine-learning nlp python rag retrieval-augmented-generation semantic-search streamlit vector-search
Language:Python 13
jamesmcroft / document-intelligence-user-feedback-processor
An experiment to provide the template-training capabilities of Azure AI Document Intelligence Studio for a feedback loop
ai azure document-intelligence mlops
Language:Python 10
qyhou / curated-document-layout-analysis
A curated list of resources on Document Layout Analysis
document-ai document-intelligence document-layout-analysis layout-analysis document-structure-analysis page-object-detection document-hierarchy-extraction document-structure-extraction
9
joinalahmed / invoiceparsingwithAOAI
Using Azure Document Intelligence and Azure OpenAI services to automatically extract data from invoices.
aoai azure document-intelligence invoice-parser
Language:HTML 3
aihearticu / azure-ai-learning-hub
Comprehensive learning hub for Azure AI services - 130+ labs and tutorials covering AI-102 certification
ai ai-102 azure azure-openai certification cognitive-services computer-vision document-intelligence learning machine-learning nlp speech tutorials
Language:Python 1
BryanTheLai / StackRAG-Backend
StackRAG is a multi-tenant Retrieval-Augmented Generation (RAG) platform for financial document intelligence. It extracts structured data from financial PDFs using LLMs, offers secure multi-tenancy, real-time APIs, and is built on Python, FastAPI, Docker, and PostgreSQL.
business-analytics business-intelligence conversational-ai document-intelligence documents etl financial financial-analysis rag fastapi pdf-parsing postgresql python security supabase vector vector-database jinja2 prompt-management
Language:Jupyter Notebook 1
codedbyasim / Generative-AI-Document-Intelligence-System
Extract and summarise data from PDFs and images using OCR + LLMs. Built with Python, OpenCV, HuggingFace, and Flask.
ai artificial-intelligence document-intelligence generative-ai huggingface-transformers natural-language-processing pdf-processing python
Language:Python 1
fenilsonani / rag-document-qa
Enterprise-grade RAG system featuring dual online/offline operation, multi-modal document processing, and advanced AI capabilities including knowledge graph construction and hybrid search for intelligent document analysis.
chromadb document-intelligence enterprise-ai hybrid-search knowledge-graph langchain multi-modal-ai offline-ml retrieval-augmented-generation streamlit
Language:Python 1
FrancescoRomeo02 / multimodalragApp
Advanced multimodal RAG system for querying PDF documents with text, images, and tables using vector embeddings, semantic chunking, and LLMs via Groq API
ai chatbot computer-vision document-intelligence groq langchain machine-learning multimodal nlp pdf-analysis qdrant rag semantic-search streamlit
Language:Python 1
fri3erg / DataDig-AIExtractor
App that extracts structured data from document photos or PDFs via custom templating and commercial LLMs (GPT and Azure Document Intelligence). Developed as a Computer Science thesis at the University of Bologna.
ai app azure document-intelligence extractor gpt kotlin structured-data
Language:Python 1
igorcervac / AzureAiSamples
Azure AI Samples
ai azure azureai speech-recognition speech-synthesis speech-to-text text-summarization text-to-speech ai-vision ai-translator document-intelligence form-recognizer ai-102 object-detection face-detection face-api
Language:C# 1
kanugurajesh / DocuMind
DocuMind is a document intelligence app where users can upload files, extract knowledge, and query them in natural language, combining semantic search (Qdrant), graph insights (Neo4j), and LLM reasoning.
aws-s3 cytoscapejs document-intelligence graph-rag mongodb neo4j nextjs15 openai qdrant tailwindcss text-embedding-3-small typescript
Language:TypeScript 1
ks6088ts-labs / azure-ai-services-solutions
A collection of solutions that leverage Azure AI services.
azure-functions fastapi openai poetry python streamlit typer document-intelligence azure azure-ai-services azure-event-grid azure-storage cosmosdb langchain langgraph
Language:Python 1
Md-Emon-Hasan / AutoDocThinker
Agentic AI system that allows users to upload documents (PDFs, DOCX, etc.) and ask natural-language questions. It uses LLM-based RAG to extract relevant information. The architecture includes multi-agent components such as document retrievers, summarizers, web searchers, and tool routers, enabling dynamic reasoning and accurate responses.
agentic-ai document-intelligence langgraph llm-apps llm-reasoning rag semantic-search vector-search ai-document-search auto-document-analysis duckduckgo-tool smart-document-search tool-usage-llm ai-agents ai-assistant conversation-memory document-qa document-retrieval planner-executor-agent qna-system
Language:Jupyter Notebook 1
mmTheBest / AI-Agents-for-Business
A live, evolving collection of open-source AI agents and real examples showing how businesses can use AI to automate work, save time, and explore new ideas.
ai-agents ai-automation business-automation business-intelligence chatbot customer-support document-intelligence enterprise-search generative-ai hr-tech invoice-processing llm marketing-automation open-source rag sales-automation small-business text-to-sql voice-agent web-automation
1
MSUSAzureAccelerators / SouthReusableAssets
IP and use case assets for CSU
azure-ai-search azure-ai-services document-intelligence azure-ml openai-assistant-api openai-chat-api azure-openai python-code python-docs yaml-files json-files csharp-code csharp-docs semantic-kernel knowledge-graph mlflow prompt-flow
Language:Ruby 1
sshashi7 / AIRoadShow
Hands-on labs and mini hackathon to build a Sales Buddy Agent using Copilot Studio and Azure AI
azure-ai-search azureopenai copilot-studio document-intelligence powerautomate speech-to-text teams-integration
1
Sumanth1410-git / internal-docs-agent
Enterprise AI assistant for intelligent document Q&A via Slack - Advanced RAG system with multi-language support.
ai document-intelligence enterprise hackathon machine-learning nlp python rag slack-bot
Language:Python 1
willermo / markdown-for-llms
A comprehensive, production-ready Python pipeline for converting various document formats into clean, validated, and optimally chunked Markdown files ready for Large Language Model (LLM) consumption and NotebookLM notebooks.
document-intelligence documentation-tool llms markdown
Language:Python 1
JPatronC92 / contextaline
🔍 AI-powered document search with semantic understanding. Find files by content using Sentence-BERT. Modern PyQt6 GUI with keyboard shortcuts, search history, and context menu. Supports PDF, DOCX, TXT. 92% precision with AI.
desktop-application document-intelligence document-searching semantic-search sentence-transformers
Language:Python
khushiiagrawal / Advanced_Graph_RAG
This project answers natural-language questions over your Excel inventory and business PDFs using a hybrid RAG pipeline. It combines semantic embeddings (FAISS) with BM25 for exact IDs, extracts structured fields (e.g., totals, GST) from PDFs and builds an explainable relationship graph; results can be exported to Neo4j for graph exploration.
graph-rag neo4j nlp vector-database bm25 knowledge-graph llm sentence-transformers anthropic document-intelligence
Language:Python
MrSpecks / AI-Document-Intelligence-Platform-v1
The AI Document Intelligence Platform is an enterprise-oriented MVP that automates extraction, analysis, and summarization of business documents.
api-first-design compliance-analytics document-intelligence entity-extraction flexible-deployment multi-tenant rag-pipeline real-time-insights risk-analysis vector-search
