Chatbot leveraging RAG for Financial Document (10Q, 10K) Summarization

Check other similar LLM projects at https://github.com/kHarshit/llm-projects.

Chatbot leveraging Retrieval Augmented Generation (RAG) for Financial Document Summarization

Technical Problem Formulation

Problem statement: Given a PDF document and a query, retrieve the relevant details and information from the document as per the query, and synthesize this information to generate accurate answers.
Data Ingestion and Processing: Reading PDFs of financial reports and splitting long documents into smaller text chunks for efficient retrieval.
Retrieval-Augmented Generation (RAG): Combination of document retrieval with the generative capabilities of the chosen language models.
Large Language Models: Evaluation of various models, including GPT-3.5-turbo, Llama 2, Gemma 1.1, etc.
Conversation Chain and Prompt Design: Crafting of a prompt template designed for concise two-sentence financial summaries.
User interface: Designing a chatbot-like user interface.
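
The chunking, retrieval, and prompt steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the project's implementation: the function names, chunk sizes, prompt wording, and sample text are hypothetical, and a simple bag-of-words cosine similarity stands in for whatever embedding model and vector store the project actually uses.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window always adds new characters.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query.

    Assumption: word-count similarity replaces a real embedding model.
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c)
              for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Hypothetical prompt wording; only the two-sentence limit comes from the text.
PROMPT_TEMPLATE = (
    "You are a financial analyst. Using only the context below, "
    "answer the question in at most two sentences.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question, chunks, top_k=2):
    """Fill the prompt template with the retrieved context."""
    context = "\n---\n".join(retrieve(question, chunks, top_k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    # Made-up sample text, not taken from any real filing.
    report = (
        "Total revenue for the quarter increased compared to the prior year. "
        "Operating expenses rose due to higher research spending. "
        "The company repurchased shares during the period."
    )
    chunks = split_into_chunks(report, chunk_size=80, overlap=20)
    print(build_prompt("How did total revenue change?", chunks))
```

The resulting prompt would then be sent to the chosen language model. Grounding the answer in retrieved chunks, rather than in the model's own recall, is the core idea of RAG.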

Different answers generated by GPT-4 (without RAG).

GPT-4 gives different results when asked the same question multiple times, which indicates hallucination. We did not observe this behavior in our RAG system.

Modified from blog.goopenai

Metrics:

Description:

Faithfulness: This measures the factual consistency of the generated answer against the given context. The generated answer is regarded as faithful if all the claims that are made in the answer can be inferred from the given context.
Answer relevancy: Scores the relevancy of the answer to the given question. Answers with incomplete, redundant, or unnecessary information are penalized.
Context recall: Measures the extent to which the retrieved context aligns with the annotated answer, which is treated as the ground truth.
Context precision: Evaluates whether the ground-truth relevant items present in the contexts are ranked higher than the irrelevant ones.
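
To make these definitions concrete, here is a minimal sketch of how three of the scores could be computed once each claim or context chunk has already been judged. It assumes the judgments arrive as simple true/false labels (in practice they typically come from an LLM or a human annotator), and it uses one common formulation of context precision, average precision over the ranked chunks. Whether this project computes the metrics in exactly this way is an assumption, and the function names are hypothetical.

```python
def faithfulness(claims_supported):
    """Fraction of answer claims that can be inferred from the context.

    claims_supported: list of booleans, one per claim in the answer.
    """
    if not claims_supported:
        return 0.0
    return sum(claims_supported) / len(claims_supported)


def context_recall(statements_attributed):
    """Fraction of ground-truth statements found in the retrieved context.

    statements_attributed: list of booleans, one per ground-truth statement.
    """
    if not statements_attributed:
        return 0.0
    return sum(statements_attributed) / len(statements_attributed)


def context_precision(relevance):
    """Average precision over ranked chunks (assumed formulation).

    relevance: list of booleans in retrieval order, True if the chunk
    at that rank is relevant. Relevant chunks ranked earlier score higher.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # Made-up labels for illustration only.
    print(faithfulness([True, True, False]))
    print(context_recall([True, False]))
    print(context_precision([True, False, True]))
    print(context_precision([False, True, True]))
```

With the same two relevant chunks, the ranking that places them earlier receives the higher context precision, which is the behavior the description above asks for.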

pip install -r requirements.txt

python index.py

Notebooks referenced from Kaggle

Contributor 1	Contributor 2	Contributor 3
Harshit Kumar	Sarthak Khandelwal	Alexander Leon
