BookQuery

This project implements a Retrieval-Augmented Generation (RAG) system that uses RAPTOR to support long contexts such as novels.
RAPTOR clusters the text chunks and summarizes each cluster, then repeats the process on the summaries until a single chunk of text remains.
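
The repeated cluster-and-summarize loop described above can be sketched as follows. This is a toy illustration, not the code in RAPTOR.py: clustering is replaced by grouping neighbouring chunks, and the summarizer is a placeholder where the project would call a language model. The names `build_summary_tree`, `toy_summarize`, and `cluster_size` are hypothetical.

```python
from typing import Callable, List


def build_summary_tree(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    cluster_size: int = 3,
) -> List[List[str]]:
    """Return every level of the tree, from the leaf chunks up to the root."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so each level shrinks")

    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Assumption: stand-in for real clustering, groups neighbouring texts.
        clusters = [
            current[i:i + cluster_size]
            for i in range(0, len(current), cluster_size)
        ]
        # One summary per cluster becomes the next level of the tree.
        levels.append([summarize(cluster) for cluster in clusters])
    return levels


def toy_summarize(cluster: List[str]) -> str:
    # Placeholder for an LLM call: keep the first three words of each text.
    return " / ".join(" ".join(text.split()[:3]) for text in cluster)


chunks = [f"chunk {i} of the book text" for i in range(7)]
levels = build_summary_tree(chunks, toy_summarize)
print([len(level) for level in levels])  # [7, 3, 1]
```

Every level is kept, not only the root, because the summaries at each level are stored alongside the original chunks and can all be retrieved later.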

Project Files

RAPTOR.py: Contains the code for RAPTOR indexing. It runs the clustering and summarization steps, which may take some time. The base code is taken from the repository published by the authors of the paper: link and from the LangChain implementation of RAPTOR: link
index_generator.py: Uses RAPTOR.py to create the clusters, then saves the generated summaries in a pandas DataFrame together with the text chunks extracted from the book.
vectorstore_creator.py: Creates the vector store from the summaries and text chunks produced by index_generator.py.
application.py: Runs a Streamlit application that works like a chatbot for interacting with the book.
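
To illustrate why summaries and original chunks are stored together, here is a minimal sketch of a single searchable collection holding both. It is an assumption-laden toy: the embedding is a plain word count instead of an embedding model, `ToyVectorStore` is a hypothetical class and not the vector-store library used by vectorstore_creator.py, and the sample sentences are made up.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (a real system would use a model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store holding chunks and summaries side by side."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str, Counter]] = []

    def add(self, text: str, kind: str) -> None:
        self.records.append((kind, text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[Tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(kind, text) for kind, text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("The captain orders the crew to chase the white whale.", "chunk")
store.add("Ishmael arrives at the inn and meets Queequeg.", "chunk")
store.add("Summary: the voyage is driven by the captain's obsession with the whale.", "summary")

results = store.search("Why does the captain hunt the whale?", k=2)
for kind, text in results:
    print(kind, "->", text)
```

Because both kinds of text live in one index, a query can match a detailed passage, a higher-level summary, or both.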

Steps to Execute:-

Install the required libraries using pip.

pip install -r requirements.txt

Create a .env file to store your API keys (here I used the Hugging Face API key).
Execute the index_generator.py file, which will ask for the PDF file's path and the book's name.

python index_generator.py
Enter the file path of the book: <Enter the path of your pdf>
Enter the name of the book: <Enter the name of your book>

Now execute the application.py file, which will again ask for the PDF file's path and the book's name.

python application.py
Enter the file path of the book: <Enter the path of your pdf>
Enter the name of the book: <Enter the name of your book>

It may take some time to load initially, but once everything is loaded it will launch a Streamlit application in your browser, where you can ask your queries.
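
For the .env step above, the sketch below shows the usual KEY=value layout and one way to read it using only the standard library. The README does not state which variable name the code expects or how the file is loaded, so `HUGGINGFACEHUB_API_TOKEN`, the placeholder value, and the helper `read_env_file` are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstration with a throwaway file; the variable name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        '# API keys\nHUGGINGFACEHUB_API_TOKEN="hf_example"\n',
        encoding="utf-8",
    )
    settings = read_env_file(str(env_path))

print(sorted(settings))  # ['HUGGINGFACEHUB_API_TOKEN']
```

Keeping the key in a .env file, outside version control, avoids hard-coding it in the scripts.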

Cons:

It currently supports only one book at a time, but can be extended to handle more.
While index_generator.py is running, a long book may exceed the token limit of the API key. In that case, more than one API key can be used.
At the moment the vector database is created locally, but it could be hosted in the cloud.
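
The multiple-key workaround mentioned above could look roughly like this. It is a sketch under stated assumptions, not code from the project: `QuotaExceeded` is a hypothetical exception standing in for whatever error the API client raises at the token limit, and `fake_request` simulates a summarization call.

```python
from typing import Callable, Optional, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error raised when a key's token limit is reached."""


def call_with_key_fallback(keys: Sequence[str], request: Callable[[str], str]) -> str:
    """Try each key in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for key in keys:
        try:
            return request(key)
        except QuotaExceeded as error:
            # This key is exhausted, so move on to the next one.
            last_error = error
    raise RuntimeError("all API keys are exhausted") from last_error


def fake_request(key: str) -> str:
    # Simulated API call: the first key has run out of tokens.
    if key == "key-1":
        raise QuotaExceeded("token limit reached")
    return f"summary produced with {key}"


result = call_with_key_fallback(["key-1", "key-2"], fake_request)
print(result)  # summary produced with key-2
```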

Working Sample- link

Anirban2205 / BookQuery