RAGxplorer is an interactive Streamlit tool that supports building Retrieval Augmented Generation (RAG) applications by visualizing document chunks and queries in the embedding space.
Note
This app was built for a Streamlit competition and was put together very scrappily in one evening.
I am refactoring the code into a package. You can view the progress in the experiment
branch. Installation and usage details can be found here.
Until then, I appreciate your patience, and suggestions are most welcome here.
- Document Upload: Users can upload PDF documents.
- Chunk Configuration: Options to configure the chunk size and overlap.
- Choice of embedding model: all-MiniLM-L6-v2, text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- Vector Database Creation: Builds a vector database using Chroma.
- Query Expansion: Generates sub-questions and hypothetical answers to enhance the retrieval process.
- Interactive Visualization: Utilizes Plotly to visualise the chunks.
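To make the chunk size and overlap options concrete, here is a minimal sketch of overlapping chunking. It is an illustration only: the README does not say which splitter RAGxplorer uses, so the function name `chunk_text`, its parameters, and the character-based windowing are all assumptions.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap.

    Hypothetical helper for illustration; not RAGxplorer's actual splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A larger overlap repeats more context across neighbouring chunks, at the cost of producing more chunks to embed and plot.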
To run RAGxplorer, ensure you have Python installed, and then install the necessary dependencies:
pip install -r requirements-local-deployment.txt
Tip
The repository also contains a requirements.txt file. It exists so that the free Streamlit deployment can run, and it includes an additional pysqlite3-binary dependency.
- Set up OPENAI_API_KEY (required) and ANYSCALE_API_KEY (if you need Anyscale). Copy the .streamlit/secrets.example.toml file to .streamlit/secrets.toml and fill in the values.
- To start the application, run:
streamlit run app.py
Note
This repo is currently linked to the Streamlit demo, and these lines were added because of the runtime in the free Streamlit deployment environment. See here.
To run the project using Docker, run the following command:
docker-compose up -d
Once the image is built and the container is running, you can access the application at http://localhost:8501.
To try the package version from the experiment branch, enter the following in your terminal:
git clone -b experiment https://github.com/gabrielchua/RAGxplorer.git
cd RAGxplorer
virtualenv venv # create a new virtual env
source venv/bin/activate # activate the virtual env
pip install -r requirements.txt
from ragxplorer.ragxplorer import Explorer
client = Explorer(embedding_model="text-embedding-ada-002") # Please ensure "OPENAI_API_KEY" is set as an env variable
client.load_document("presentation.pdf")
client.visualise_query("What are the top revenue drivers for Microsoft?")
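Because the snippet above relies on OPENAI_API_KEY being set as an environment variable, it can help to fail early with a clear message when it is missing. The following is a small sketch using only the standard library; `require_env` is a hypothetical helper and not part of the ragxplorer package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of ragxplorer.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the Explorer")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing the `Explorer` surfaces a missing key immediately, rather than when the first embedding request is made.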
Contributions to RAGxplorer are welcome. Please read our contributing guidelines (WIP) for details.
This project is licensed under the MIT license - see the LICENSE file for details.
- DeepLearning.AI and Chroma for the inspiration and code labs in their Advanced Retrieval course.
- The Streamlit community for the support and resources.