| Name | Student ID | Contribution % |
|---|---|---|
| JOSHI NIRANJAN SURYAKANT | 2023AC05011 | 100% |
| PRATEEK RALHAN | 2023AC05673 | 100% |
| KESHARKAR SURAJ SANJAY | 2023AD05004 | 100% |
| SAURABH SUNIT JOTSHI | 2023AC05565 | 100% |
| KILLI SATYA PRAKASH | 2023AC05066 | 100% |
A comparative analysis system that implements and evaluates two approaches to answering questions about company financial statements:
- Retrieval-Augmented Generation (RAG) Chatbot: Combines document retrieval with generative response synthesis
- Fine-Tuned Language Model (FT) Chatbot: Directly fine-tunes a small open-source language model on financial Q&A data
Live WebApp

Architecture Summary Document
Develop and compare two systems for answering questions based on company financial statements (covering the last two years), using the same financial data for both methods, and perform a detailed comparison of accuracy, speed, and robustness.
- Hybrid Retrieval: Combines dense (vector) and sparse (BM25) retrieval methods
- Memory-Augmented Retrieval: Persistent memory bank for frequently asked questions
- Advanced Guardrails: Input and output validation systems
- Multi-source Retrieval: FAISS vector database + ChromaDB integration
- Document Chunking: Intelligent text segmentation with configurable chunk sizes
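
The configurable chunking above can be sketched as follows. This is a minimal illustration with whitespace tokenization; the function name `chunk_text` and the defaults are assumptions, not the project's actual implementation, which may chunk by model tokens instead.

```python
# Minimal sketch of configurable-size chunking with overlap between
# consecutive chunks (assumed defaults; the real system is configurable
# between roughly 100 and 400 tokens).
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split text into overlapping chunks of at most `chunk_size` tokens."""
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        piece = tokens[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(tokens):
            break
    return chunks

# 500 tokens with a 180-token step yields three overlapping chunks.
chunks = chunk_text("revenue " * 500, chunk_size=200, overlap=20)
```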
- Continual Learning: Incremental fine-tuning without catastrophic forgetting
- Domain Adaptation: Specialized for financial Q&A domain
- Efficient Training: Optimized hyperparameters for small models
- Confidence Scoring: Built-in confidence estimation
- Model Persistence: Save and load fine-tuned models
- Comprehensive Metrics: Accuracy, response time, confidence, factuality
- Visualization: Interactive charts and performance comparisons
- Test Suite: Diverse question types (relevant high/low confidence, irrelevant)
- ROUGE Scoring: Text similarity metrics for quality assessment
- Streamlit Web App: Modern, responsive interface
- Real-time Comparison: Side-by-side RAG vs Fine-tuned results
- Interactive QA: Ask questions and get instant responses
- Performance Dashboard: Live metrics and visualizations
```
Financial QA System
├── Data Processing
│   ├── PDF Extraction (pdfplumber, PyPDF2)
│   ├── Text Cleaning & Segmentation
│   ├── Q&A Pair Generation
│   └── Chunking for RAG
├── RAG System
│   ├── Hybrid Retrieval (FAISS + BM25)
│   ├── Memory-Augmented Retrieval
│   ├── Response Generation (DistilGPT2)
│   └── Guardrails (Input/Output)
├── Fine-Tuned System
│   ├── Continual Learning
│   ├── Domain Adaptation
│   ├── Model Training & Persistence
│   └── Confidence Estimation
├── Evaluation System
│   ├── Performance Metrics
│   ├── Comparative Analysis
│   ├── Visualization Generation
│   └── Results Export
└── User Interface
    ├── Streamlit Web App
    ├── Interactive QA
    ├── System Comparison
    └── Performance Dashboard
```
- Python 3.8+
- CUDA-compatible GPU (optional, for faster training)
```bash
git clone <repository-url>
cd financial-qa-system
python -m venv env
source env/bin/activate   # On Windows: env\Scripts\activate
pip install -r requirements.txt
```
The system will automatically download the required models on first run:
- `all-MiniLM-L6-v2` (sentence embeddings)
- `distilgpt2` (response generation)
- `distilbert-base-uncased` (classification)
```bash
python main.py data        # Process PDFs and generate Q&A pairs
python main.py rag         # Build and run the RAG system
python main.py fine-tune   # Fine-tune the language model
python main.py evaluate    # Run the evaluation suite
python main.py interface   # Launch the Streamlit web app
python main.py all         # Run the complete pipeline
```
1. Start the interface: `python main.py interface`
2. Open your browser and navigate to the displayed URL
3. Select your preferred system:
   - RAG System
   - Fine-tuned System
   - Both (Comparison)
4. Ask questions and view results in real time
- Adaptability: Easy to update with new documents
- Factual Grounding: Direct access to source documents
- Transparency: Clear source attribution
- Flexibility: Handles diverse question types
- Speed: Faster inference after training
- Fluency: More natural, coherent responses
- Efficiency: Lower computational overhead
- Specialization: Domain-specific knowledge
- RAG: Higher accuracy, slower response, more resource-intensive
- Fine-tuned: Lower accuracy, faster response, more efficient
```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    model_name: str = "distilgpt2"
    learning_rate: float = 5e-5
    batch_size: int = 4
    num_epochs: int = 3
    max_length: int = 512
    warmup_steps: int = 100
    weight_decay: float = 0.01
```
- Chunk Size: Configurable text segmentation (100-400 tokens)
- Top-K Retrieval: Number of chunks to retrieve (default: 5)
- Dense Weight: Weight for vector similarity vs BM25 (default: 0.7)
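
One way the dense weight might combine the two retrieval scores is a normalized weighted sum. This is a sketch under assumptions: the function name `fuse_scores` and the min-max normalization step are illustrative, not the system's documented fusion rule, though the 0.7 default matches the configuration above.

```python
def fuse_scores(dense: dict, sparse: dict, dense_weight: float = 0.7) -> dict:
    """Min-max normalize each score set, then blend per chunk id."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid divide-by-zero when all scores tie
        return {k: (v - lo) / span for k, v in scores.items()}

    d, s = normalize(dense), normalize(sparse)
    ids = set(d) | set(s)
    return {i: dense_weight * d.get(i, 0.0) + (1 - dense_weight) * s.get(i, 0.0)
            for i in ids}

# Cosine-similarity scores from the vector index vs. raw BM25 scores:
fused = fuse_scores({"c1": 0.9, "c2": 0.4}, {"c1": 2.1, "c2": 5.3})
```

With the 0.7 default, a chunk that wins the dense ranking outscores one that only wins the BM25 ranking, which is the intended bias toward semantic similarity.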
- Accuracy: Correct answer rate
- Response Time: Average inference speed
- Confidence: Model confidence scores
- Factuality: Response reliability assessment
- ROUGE Scores: Text similarity metrics
- Source Attribution: Document source tracking
- Validation Status: Input/output guardrail results
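
The evaluation pipeline presumably uses a ROUGE library; the gist of the ROUGE-1 metric it reports can be shown in a few lines. `rouge1_f1` here is an illustrative re-implementation, not the project's code.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 (multiset intersection of tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 3 shared unigrams out of 5 candidate / 4 reference tokens -> F1 = 2/3
score = rouge1_f1("total revenue was 391 billion", "revenue was 391.0 billion")
```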
```
financial-qa-system/
├── src/
│   ├── __init__.py
│   ├── data_processor.py      # Document processing & Q&A generation
│   ├── rag_system.py          # RAG implementation
│   ├── fine_tune_system.py    # Fine-tuning implementation
│   ├── evaluation_system.py   # Evaluation & comparison
│   └── interface.py           # Streamlit web interface
├── financial_statements/      # Input PDF documents
├── processed_data/            # Processed texts & Q&A pairs
├── evaluation_results/        # Evaluation outputs & visualizations
├── main.py                    # Main execution script
├── requirements.txt           # Python dependencies
└── README.md                  # This file
```
- Relevant, High-Confidence: Clear facts in financial data
- Relevant, Low-Confidence: Ambiguous or sparse information
- Irrelevant: Questions outside financial scope
- "What was the company's revenue in 2024?"
- "What are the total assets?"
- "What type of company is this?"
- "What is the capital of France?" (irrelevant)
- Relevance Check: Validates financial/company-related queries
- Harmful Content: Filters potentially dangerous inputs
- Query Validation: Ensures proper question format
- Factuality Check: Detects hallucinated responses
- Confidence Threshold: Flags low-confidence outputs
- Contradiction Detection: Identifies conflicting statements
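
The shape of an input guardrail like the relevance check above can be sketched as a keyword gate. Everything here (the `validate_query` name, the term list, the return format) is an illustrative assumption; the real guardrails are more sophisticated.

```python
# Hypothetical vocabulary for the relevance check; the actual system's
# validation logic is richer than a keyword match.
FINANCIAL_TERMS = {"revenue", "assets", "profit", "income", "liabilities",
                   "earnings", "cash", "equity", "sales", "company"}

def validate_query(query: str):
    """Input guardrail sketch: reject empty or off-topic queries."""
    words = set(query.lower().replace("?", "").split())
    if not query.strip():
        return False, "empty query"
    if not words & FINANCIAL_TERMS:
        return False, "query appears unrelated to financial statements"
    return True, "ok"

ok, _ = validate_query("What was the company's revenue in 2024?")
rejected, reason = validate_query("What is the capital of France?")
```

This matches the behavior in the results table below, where the irrelevant "Capital of France?" question is suppressed by the RAG guardrails.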
- Persistent memory bank for frequent Q&A patterns
- Automatic categorization and retrieval
- Confidence-based response selection
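The memory bank described above can be approximated by a cache keyed on normalized questions, storing only high-confidence answers. A minimal sketch, assuming a `MemoryBank` class and a 0.8 threshold that are not the project's actual names or defaults:

```python
class MemoryBank:
    """Sketch of a memory bank for frequently asked questions."""

    def __init__(self, confidence_threshold: float = 0.8):
        self.threshold = confidence_threshold
        self.entries = {}  # normalized question -> (answer, confidence)

    def _key(self, question: str) -> str:
        return " ".join(question.lower().split())

    def store(self, question: str, answer: str, confidence: float) -> None:
        # Only memorize answers the system was confident about.
        if confidence >= self.threshold:
            self.entries[self._key(question)] = (answer, confidence)

    def lookup(self, question: str):
        return self.entries.get(self._key(question))

bank = MemoryBank()
bank.store("What was revenue in 2024?", "$391.0B", 0.93)
bank.store("What are total assets?", "unclear", 0.40)  # below threshold, dropped
hit = bank.lookup("what was revenue in 2024?")          # case-insensitive hit
```

Serving repeat questions from memory skips retrieval and generation entirely, which is what makes it worthwhile for frequent queries.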
- Incremental fine-tuning on new data
- Catastrophic forgetting prevention
- Domain adaptation capabilities
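One common way to prevent catastrophic forgetting during incremental fine-tuning is rehearsal: mixing a fraction of previously seen Q&A pairs into each new training set. The sketch below illustrates that idea only; the function name, the 30% replay ratio, and the string examples are assumptions, not the project's mechanism.

```python
import random

def build_rehearsal_batch(new_examples: list, old_examples: list,
                          replay_ratio: float = 0.3, seed: int = 0) -> list:
    """Mix a fraction of old examples into new training data (replay sketch)."""
    rng = random.Random(seed)
    n_old = int(len(new_examples) * replay_ratio)
    replay = rng.sample(old_examples, min(n_old, len(old_examples)))
    batch = new_examples + replay
    rng.shuffle(batch)
    return batch

# 10 new examples plus 30% replayed old ones -> 13 training items.
batch = build_rehearsal_batch([f"new{i}" for i in range(10)],
                              [f"old{i}" for i in range(50)])
```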
- Dense retrieval (sentence embeddings)
- Sparse retrieval (BM25)
- Weighted score fusion
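
The sparse side of the hybrid retriever uses BM25, whose core scoring can be written compactly. This is a self-contained re-implementation for illustration (whitespace tokens, standard `k1`/`b` defaults); the system itself uses an existing BM25 library alongside FAISS.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with the BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(tokenized)
    q_terms = query.lower().split()
    df = {t: sum(t in d for d in tokenized) for t in q_terms}  # doc frequency
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            f = tf[t]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

scores = bm25_scores("revenue 2024",
                     ["total revenue in 2024 was high", "assets grew"])
```

BM25 rewards exact term matches that dense embeddings can miss (ticker symbols, fiscal years), which is why the two signals are fused rather than used alone.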
| Question | Method | Answer | Confidence | Time (s) | Correct (Y/N) |
|---|---|---|---|---|---|
| Revenue in 2024? | RAG | $391.0B | 0.93 | 9.11 | Y |
| Revenue in 2024? | Fine-tuned | $391.0B | 0.91 | 21.23 | Y |
| Total sales (iPhones)? | RAG | $182.2B | 0.89 | 4.22 | N |
| Total sales (iPhones)? | Fine-tuned | $201.2B | 0.92 | 44.12 | Y |
| Capital of France? | RAG | (blank response) | 0.35 | 11.20 | Y |
| Capital of France? | Fine-tuned | Paris | 0.22 | 3.47 | N |
- Hugging Face: Transformers library and model hub
- Sentence Transformers: Embedding models
- FAISS: Vector similarity search
- Streamlit: Web interface framework
- Apple Inc.: Financial statement data for testing
- Multi-modal Support: Image and table extraction from PDFs
- Real-time Updates: Live document ingestion and processing
- Advanced Guardrails: More sophisticated validation systems
- Model Compression: Quantization and distillation for efficiency
- API Integration: RESTful API for external applications
- Multi-language Support: Internationalization capabilities