bujo-eayn / agenticAI_pipeline

A modular multi-agent AI system that performs deep scientific research using a supervisor-worker architecture. It combines foundational and specialized language models to reason, plan, and execute tasks for document and chart analysis in scientific domains.

Repository from GitHub: https://github.com/bujo-eayn/agenticAI_pipeline

Agentic Document Intelligence with GPT-4 & SmolDocling

This project is an agentic AI pipeline that uses GPT-4 as the primary agent and integrates SmolDocling as a tool to deeply analyze uploaded documents. It allows users to upload various document formats and extract structured content automatically, with an intelligent evaluation and feedback loop.


🚀 Features

  • Upload documents in multiple formats (PDF, Word, etc.)
  • Automatic conversion to PDF if needed
  • Dual extraction using GPT-4 and SmolDocling
  • Evaluation of extracted content using BLEU, overlap, and Jaccard similarity
  • Iterative feedback to SmolDocling to improve accuracy
  • Final structured output in Word and PDF formats
  • User prompt execution on final extracted document
  • Streamlit UI for ease of use
  • Dockerized for simple deployment

🧠 Pipeline Overview

  1. Upload Document: User uploads a file and provides a prompt.

  2. Preprocessing: The document is converted to PDF (if not already).

  3. GPT-4 Extraction: Extracts text, tables, images, and structural elements.

  4. SmolDocling Extraction: Sends a static prompt to the SmolDocling backend for extraction.

  5. Evaluation: Compares GPT-4 vs SmolDocling outputs using:

    • Textual overlap ratio
    • BLEU score
    • Jaccard similarity
  6. Consistency Check:

    • ✅ If consistent: build final doc and apply user prompt.
    • ❌ If inconsistent: identify differences and retry SmolDocling with feedback.
  7. Final Output: Assemble and export the cleaned, structured document.
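
The evaluation and consistency steps above can be sketched in plain Python. This is a minimal illustration and not the code in utils/evaluator.py: the function names, the whitespace tokenizer, the simplified BLEU (up to bigrams), and the 0.8 threshold are all assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher
import math


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (an assumption; real documents may need more)."""
    return text.lower().split()


def overlap_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib, one possible 'overlap' definition."""
    return SequenceMatcher(None, a, b).ratio()


def jaccard(a: str, b: str) -> float:
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def simple_bleu(reference: str, candidate: str, max_n: int = 2) -> float:
    """Simplified BLEU: clipped n-gram precision up to max_n with a brevity penalty.

    Any order with no matching n-grams (or a candidate too short to form one) yields 0.0.
    """
    ref, cand = tokenize(reference), tokenize(candidate)
    if not ref or not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        cand_counts = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        total = sum(cand_counts.values())
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


def is_consistent(gpt_text: str, smol_text: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: all three metrics must reach the threshold."""
    scores = (
        overlap_ratio(gpt_text, smol_text),
        simple_bleu(gpt_text, smol_text),
        jaccard(gpt_text, smol_text),
    )
    return all(score >= threshold for score in scores)
```

Requiring every metric to pass is the strictest way to combine them. A weighted average would be a reasonable alternative if one metric proves noisier than the others on real documents.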


πŸ—‚οΈ Folder Structure

.
├── app.py                  # Streamlit UI
├── Dockerfile              # Docker configuration
├── requirements.txt
│
├── graph/                  # LangGraph logic
│   ├── graph_builder.py
│   └── nodes/              # Nodes in the pipeline
│       ├── user_input.py
│       ├── preprocess_doc.py
│       ├── gpt_extract.py
│       ├── smoldocling_call.py
│       ├── evaluate.py
│       ├── retry_node.py
│       ├── final_output.py
│       └── apply_prompt.py
│
├── tools/                  # Interfaces to external tools
│   ├── gpt_tool.py
│   ├── smoldocling_tool.py
│   └── storage.py
│
├── utils/                  # Utility modules
│   ├── evaluator.py
│   ├── doc_utils.py
│   └── logger.py
│
└── uploads/                # Uploaded files storage

🧪 Setup Instructions

🐳 Run with Docker

git clone https://github.com/bujo-eayn/agenticAI_pipeline.git
cd agenticAI_pipeline
docker build -t agentic-doc-intel .
docker run -p 8501:8501 agentic-doc-intel

🧰 Run Locally (Python ≥ 3.10)

git clone https://github.com/bujo-eayn/agenticAI_pipeline.git
cd agenticAI_pipeline
python -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt
streamlit run app.py

πŸ” Environment Variables

Create a .env file or set these variables in your environment:

OPENAI_API_KEY=your-key-here
SMOLDOCLING_URL=http://localhost:5001/api/extract
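
A minimal sketch of how the application might read these two variables at startup and fail early if one is missing. The `Settings` class and `load_settings` function are hypothetical names for illustration, not part of the repository. The sketch reads only the process environment; if the values live in a .env file, something must load that file into the environment first.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two variables listed above."""
    openai_api_key: str
    smoldocling_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read both variables and raise a clear error if either is unset or empty."""
    required = ("OPENAI_API_KEY", "SMOLDOCLING_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(env["OPENAI_API_KEY"], env["SMOLDOCLING_URL"])
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary instead of the real environment.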

📦 Module Responsibilities

  • graph/nodes/: Each node represents a modular stage in the LangGraph pipeline.
  • tools/gpt_tool.py: Manages interactions with GPT-4.
  • tools/smoldocling_tool.py: Manages API calls to SmolDocling backend.
  • utils/evaluator.py: Compares GPT and SmolDocling outputs.
  • utils/doc_utils.py: Converts and builds documents.
  • utils/logger.py: Sets up logging.
  • tools/storage.py: Saves uploaded files.
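
To illustrate how the evaluate and retry stages could interact, here is a sketch of the feedback loop in plain Python. It deliberately does not use the LangGraph API, and graph/nodes/retry_node.py may work differently. The function name, the callable parameters, the feedback wording, and the retry limit are assumptions for the example.

```python
from typing import Callable


def run_with_retries(
    reference: str,
    extract: Callable[[str], str],
    is_consistent: Callable[[str, str], bool],
    max_retries: int = 3,
) -> tuple[str, int]:
    """Call `extract` until its output agrees with `reference` or attempts run out.

    `extract` receives feedback text (empty on the first call) and returns a new
    extraction. Returns the last extraction and the number of calls made.
    """
    feedback = ""
    result = ""
    for attempt in range(1, max_retries + 1):
        result = extract(feedback)
        if is_consistent(reference, result):
            return result, attempt
        # Hypothetical feedback: name the reference tokens the extraction missed.
        missing = sorted(set(reference.split()) - set(result.split()))
        feedback = "Missing from previous extraction: " + ", ".join(missing)
    return result, max_retries


# Usage with a stub standing in for the SmolDocling call.
def fake_extract(feedback: str) -> str:
    return "alpha beta gamma" if feedback else "alpha beta"


final_text, calls_made = run_with_retries(
    "alpha beta gamma", fake_extract, lambda ref, out: ref == out
)
```

Capping the number of attempts keeps the loop from running indefinitely when the two extractors never converge.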

✨ Example Use Cases

  • Contract analysis
  • Invoice parsing
  • Research paper summarization
  • Regulatory document extraction

📬 Contribute

PRs welcome. Open an issue to discuss changes or ideas.


📜 License

MIT
