This project is an agentic AI pipeline that uses GPT-4 as the primary agent and integrates SmolDocling as a tool to analyze uploaded documents. Users upload documents in various formats, and the pipeline extracts structured content automatically, scoring the GPT-4 and SmolDocling extractions against each other and feeding any differences back to SmolDocling.
- Upload documents in multiple formats (PDF, Word, etc.)
- Automatic conversion to PDF if needed
- Dual extraction using GPT-4 and SmolDocling
- Evaluation of extracted content using BLEU, overlap, and Jaccard similarity
- Iterative feedback to SmolDocling to improve accuracy
- Final structured output in Word and PDF formats
- User prompt execution on final extracted document
- Streamlit UI for ease of use
- Dockerized for simple deployment
- Upload Document: User uploads a file and provides a prompt.
- Preprocessing: The document is converted to PDF (if not already).
- GPT-4 Extraction: Extracts text, tables, images, and structural elements.
- SmolDocling Extraction: Sends static prompt to SmolDocling backend for extraction.
- Evaluation: Compares GPT-4 vs SmolDocling outputs using:
  - Textual overlap ratio
  - BLEU score
  - Jaccard similarity
- Consistency Check:
  - If consistent: build final doc and apply user prompt.
  - If inconsistent: identify differences and retry SmolDocling with feedback.
- Final Output: Assemble and export the cleaned, structured document.
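
The consistency check and retry steps above can be sketched as a bounded loop. This is a minimal sketch in plain Python, not the project's LangGraph implementation: the function name `run_with_retries`, the `extract` and `score` callables, the 0.8 threshold, the retry budget, and the feedback format are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    reference: str,
    extract: Callable[[Optional[str]], str],
    score: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cut-off for "consistent"
    max_retries: int = 3,    # assumed retry budget
) -> Tuple[str, float, int]:
    """Re-run `extract` with feedback until its output agrees with `reference`.

    Returns (best_output, best_score, attempts_used).
    """
    feedback: Optional[str] = None
    best_output, best_score = "", -1.0
    attempts = 0
    for attempts in range(1, max_retries + 1):
        candidate = extract(feedback)
        current = score(reference, candidate)
        # Keep the best candidate seen so far in case no attempt passes.
        if current > best_score:
            best_output, best_score = candidate, current
        if current >= threshold:
            break
        # Hypothetical feedback: list the reference words the candidate missed.
        missing = sorted(set(reference.split()) - set(candidate.split()))
        feedback = "Missing terms: " + ", ".join(missing)
    return best_output, best_score, attempts
```

Here `reference` stands in for the GPT-4 extraction and `extract` for a call to SmolDocling that accepts optional feedback. Capping the number of attempts keeps the loop from running forever when the two extractors never agree.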
.
├── app.py                  # Streamlit UI
├── Dockerfile              # Docker configuration
├── requirements.txt
│
├── graph/                  # LangGraph logic
│   ├── graph_builder.py
│   └── nodes/              # Nodes in the pipeline
│       ├── user_input.py
│       ├── preprocess_doc.py
│       ├── gpt_extract.py
│       ├── smoldocling_call.py
│       ├── evaluate.py
│       ├── retry_node.py
│       ├── final_output.py
│       └── apply_prompt.py
│
├── tools/                  # Interfaces to external tools
│   ├── gpt_tool.py
│   ├── smoldocling_tool.py
│   └── storage.py
│
├── utils/                  # Utility modules
│   ├── evaluator.py
│   ├── doc_utils.py
│   └── logger.py
│
└── uploads/                # Uploaded files storage
git clone https://github.com/yourname/agentic-doc-intel.git
cd agentic-doc-intel
docker build -t agentic-doc-intel .
docker run -p 8501:8501 agentic-doc-intel
git clone https://github.com/yourname/agentic-doc-intel.git
cd agentic-doc-intel
python -m venv venv
source venv/bin/activate # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt
streamlit run app.py
Create a .env file or configure your environment:
OPENAI_API_KEY=your-key-here
SMOLDOCLING_URL=http://localhost:5001/api/extract
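
As an illustration of how these two variables might be consumed, here is a minimal sketch that reads them from the process environment and fails early when the API key is absent. The `Settings` class and `load_settings` function are hypothetical names, and falling back to the local URL shown above is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    smoldocling_url: str


def load_settings(env=None) -> Settings:
    """Read OPENAI_API_KEY and SMOLDOCLING_URL; raise if the key is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumption: default to the local URL from the example above.
    url = env.get("SMOLDOCLING_URL", "http://localhost:5001/api/extract").strip()
    return Settings(openai_api_key=key, smoldocling_url=url)
```

If the values live in a .env file, something must load that file into the environment before `load_settings()` runs. A package such as python-dotenv can do this, though whether the project uses it is not stated here.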
- graph/nodes/: Each node represents a modular stage in the LangGraph pipeline.
- tools/gpt_tool.py: Manages interactions with GPT-4.
- tools/smoldocling_tool.py: Manages API calls to SmolDocling backend.
- utils/evaluator.py: Compares GPT and SmolDocling outputs.
- utils/doc_utils.py: Converts and builds documents.
- utils/logger.py: Sets up logging.
- tools/storage.py: Saves uploaded files.
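
To make the comparison done in utils/evaluator.py more concrete, the sketch below computes two of the three metrics with the standard library only. The function names are hypothetical, and the project's exact definition of "overlap ratio" is not given, so `difflib.SequenceMatcher` is used here as an assumed stand-in. BLEU is left out because it normally comes from a library (for example, NLTK provides `sentence_bleu` in `nltk.translate.bleu_score`).

```python
from difflib import SequenceMatcher
from typing import List


def _tokens(text: str) -> List[str]:
    # Naive tokenizer (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the combined word set."""
    set_a, set_b = set(_tokens(a)), set(_tokens(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def overlap_ratio(a: str, b: str) -> float:
    """Order-aware similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, _tokens(a), _tokens(b)).ratio()


gpt_text = "the total is 42"
smol_text = "the total is 40"
print(jaccard_similarity(gpt_text, smol_text))  # 0.6  (3 shared words of 5 distinct)
print(overlap_ratio(gpt_text, smol_text))       # 0.75 (2 * 3 matches / 8 tokens)
```

Jaccard ignores word order and repetition, while the sequence-based ratio is sensitive to both, which is one reason to report more than one score.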
- Contract analysis
- Invoice parsing
- Research paper summarization
- Regulatory document extraction
PRs welcome. Open an issue to discuss changes or ideas.
MIT