geekcheng / RAGxplorer

Open-source tool to visualise your RAG documents 🔮

Home Page: https://discuss.streamlit.io/t/ragxplorer-explore-your-rag-documents-gpt-4-chromadb-sentence-transformers/59371

RAGxplorer 🦙🦺

RAGxplorer is an interactive Streamlit tool that supports building Retrieval Augmented Generation (RAG) applications by visualizing document chunks and queries in the embedding space.

Note

This app was built for a Streamlit competition and was put together very scrappily in one evening.

I am refactoring the code into a package. You can view the progress in the experiment branch. Installation and usage details can be found here.

Until then, I appreciate your patience, and suggestions are most welcome here.

Demo 🔎

Streamlit App

⚠️ Due to infrastructure limitations, this freely hosted demo may occasionally go down. For the best experience, clone this repo and run it locally.

Features ✨

  • Document Upload: Users can upload PDF documents.
  • Chunk Configuration: Options to configure the chunk size and overlap.
  • Choice of embedding model: all-MiniLM-L6-v2, text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
  • Vector Database Creation: Builds a vector database using Chroma.
  • Query Expansion: Generates sub-questions and hypothetical answers to enhance the retrieval process.
  • Interactive Visualization: Uses Plotly to visualise the chunks.
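To make the chunk size and overlap options concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: the function name `chunk_text` and the character-based splitting are assumptions, and the app's own splitter may work differently (for example, on tokens or sentences).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that overlap their neighbours.

    Hypothetical helper for illustration; not taken from the RAGxplorer code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

A larger overlap repeats more text between neighbouring chunks, which tends to place them closer together in the embedding space, at the cost of producing more chunks to embed.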

Option 1 💻

Installation ⚙️

To run RAGxplorer, ensure you have Python installed, and then install the necessary dependencies:

pip install -r requirements-local-deployment.txt

Tip

⚠️ Do not use requirements.txt for a local install. That file exists so the free Streamlit deployment can run, and it includes an additional pysqlite3-binary dependency.

⚠️ If it helps with troubleshooting, this application was built using Python 3.11.

Usage 🏎️

  1. Set up OPENAI_API_KEY (required) and ANYSCALE_API_KEY (only if you need Anyscale): copy the .streamlit/secrets.example.toml file to .streamlit/secrets.toml and fill in the values.
  2. To start the application, run:
    streamlit run app.py
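If you prefer to script step 1, the copy can be sketched as below. The helper name `ensure_secrets_file` is hypothetical; only the two file paths come from the instructions above. The sketch refuses to overwrite an existing secrets.toml so that keys you have already filled in are not lost.

```python
import shutil
from pathlib import Path


def ensure_secrets_file(streamlit_dir: str = ".streamlit") -> Path:
    """Copy secrets.example.toml to secrets.toml unless secrets.toml already exists.

    Hypothetical helper for illustration; not part of the RAGxplorer repo.
    """
    directory = Path(streamlit_dir)
    example = directory / "secrets.example.toml"
    target = directory / "secrets.toml"

    if target.exists():
        return target  # never overwrite keys that were already filled in
    if not example.exists():
        raise FileNotFoundError(f"{example} not found; run this from the repo root")

    shutil.copyfile(example, target)
    return target
```

Calling `ensure_secrets_file()` from the repo root returns the path of the file to edit. You still need to fill in the key values by hand.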

Note

This repo is currently linked to the Streamlit demo, and these lines were added because of the runtime in the free Streamlit deployment environment. See here.

Option 2: Docker 🐳

To run the project using Docker, run the following command:

docker-compose up -d

Once the image is built and the container is running, you can access the application at http://localhost:8501.
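Because the container starts in detached mode, it can take a moment before the app responds. A small sketch for checking that the address above is reachable, using only the standard library, might look like this. The function name `app_is_up` is an assumption for illustration, and the check only confirms that an HTTP GET succeeds, not that the app is fully initialised.

```python
import urllib.error
import urllib.request


def app_is_up(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to the URL succeeds with a 2xx status.

    Hypothetical helper for illustration; not part of the RAGxplorer repo.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False
```

If `app_is_up()` returns False for more than a minute or so, inspecting the container output with `docker-compose logs` is a reasonable next step.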

Using the experimental version 🧪

Installation

Enter the following in your terminal:

git clone -b experiment https://github.com/gabrielchua/RAGxplorer.git
cd RAGxplorer
virtualenv venv # create a new virtual env
source venv/bin/activate # activate the virtual env
pip install -r requirements.txt

Usage

from ragxplorer.ragxplorer import Explorer

# OPENAI_API_KEY must be set as an environment variable before the client is created
client = Explorer(embedding_model="text-embedding-ada-002")
client.load_document("presentation.pdf")  # the PDF to explore
client.visualise_query("What are the top revenue drivers for Microsoft?")
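Since the usage above depends on OPENAI_API_KEY being present in the environment, a small guard can fail early with a clear message instead of failing later inside an API call. This is a sketch under assumptions: `require_env` is a hypothetical helper, not part of the ragxplorer package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of the ragxplorer package.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the Explorer")
    return value
```

Calling `require_env("OPENAI_API_KEY")` just before constructing `Explorer(...)` is enough; the returned value does not need to be passed anywhere, because the usage above reads the key from the environment.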

Contributing 👋

Contributions to RAGxplorer are welcome. Please read our contributing guidelines (WIP) for details.

License 👀

This project is licensed under the MIT license - see the LICENSE file for details.

Acknowledgments 💙

  • DeepLearning.AI and Chroma for the inspiration and code labs in their Advanced Retrieval course.
  • The Streamlit community for the support and resources.


