- Conversational Interface: Engage with the system using natural language queries to receive responses directly sourced from the PDFs.
- Direct Citation: Every response from the system includes a direct link to the source PDF page, ensuring traceability and verification.
- PDF Directory: A predefined set of key PDF documents, currently including WHO recommendations on major health topics such as schistosomiasis and malaria.
- ChatGPT-3.5: OpenAI's GPT-3.5 chat model, suited to general-purpose, human-like conversation.
- Llama3-70B-8192: The larger Llama 3 model (70B parameters, 8,192-token context window), suited to complex language tasks.
- Llama3-8B-8192: A smaller, faster Llama 3 model (8B parameters, 8,192-token context window) for a wide range of tasks.
- Mixtral-8x7B-32768: Mistral AI's sparse mixture-of-experts model with a 32,768-token context window, suited to nuanced understanding and long inputs.
- Llama2-70B-4096: The earlier Llama 2 model (70B parameters, 4,096-token context window) for general language processing.
- Gemma-7B-IT: Google's Gemma model with 7B parameters. "IT" stands for instruction-tuned, not information technology.
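The numeric suffix on several of the model names above can be read programmatically. The following is a minimal sketch, assuming the trailing number denotes the context window in tokens; the `context_window` helper is hypothetical and not part of this repository.

```python
import re

# Model names as listed above. The trailing number is ASSUMED to be the
# context window in tokens; names without such a suffix yield None.
MODEL_NAMES = [
    "ChatGPT-3.5",
    "Llama3-70B-8192",
    "Llama3-8B-8192",
    "Mixtral-8x7B-32768",
    "Llama2-70B-4096",
    "Gemma-7B-IT",
]


def context_window(model_name):
    """Return the trailing integer suffix of a model name, or None if absent."""
    match = re.search(r"-(\d+)$", model_name)
    return int(match.group(1)) if match else None


# Build a lookup table for the models that advertise a context window.
CONTEXT_WINDOWS = {
    name: context_window(name)
    for name in MODEL_NAMES
    if context_window(name) is not None
}
```

A lookup like `CONTEXT_WINDOWS` could be used to cap how much retrieved text is passed to a given model.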
The application combines OpenAI embeddings, Pinecone vector search, and a conversational interface to retrieve answers from the indexed PDFs. When a query is made, the system:
- Converts the query into embeddings.
- Searches for the most relevant document sections using Pinecone's vector search.
- Returns the answer along with citations and links to the source documents.
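The three steps above can be illustrated with a small, self-contained sketch. It is not the application's code: a toy word-count embedding stands in for OpenAI embeddings, an in-memory list stands in for the Pinecone index, and the chunk texts, file names, and the `embed`, `cosine`, and `retrieve` helpers are all hypothetical placeholders.

```python
import math
from collections import Counter

# Hypothetical indexed chunks with placeholder text; in the real system
# these would live in a Pinecone index alongside their source metadata.
CHUNKS = [
    {"text": "placeholder text about schistosomiasis treatment",
     "source": "schistosomiasis.pdf", "page": 12},
    {"text": "placeholder text about malaria prevention",
     "source": "malaria.pdf", "page": 4},
]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, top_k=1):
    """Embed the query, rank chunks by similarity, return text plus citation."""
    query_vec = embed(query)
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: cosine(query_vec, embed(chunk["text"])),
        reverse=True,
    )
    return [
        {"text": chunk["text"],
         "citation": f'{chunk["source"]}#page={chunk["page"]}'}
        for chunk in ranked[:top_k]
    ]
```

Calling `retrieve("malaria prevention")` ranks the malaria chunk first and returns it with a page-level citation, mirroring how each answer links back to its source PDF page.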
- Clone the repository: `git clone https://github.com/yourusername/RAG-nificent.git`
- Install dependencies: `pip install -r requirements.txt`
- Set the following environment variables in a `.env` file (see also `.env.example`):
  - `PINECONE_INDEX_NAME`
  - `PINECONE_NAME_SPACE`
  - `OPENAI_API_KEY`
  - `PINECONE_API_KEY`
  - `GROQ_API_KEY`
- Create a Pinecone index with the same name as `PINECONE_INDEX_NAME`. Set it up with `dimensions=1536` and `metric=cosine`.
- Place your PDFs in the `pdf_data` directory and run `data_ingestion.py`.
- Run the application: `chainlit run src/app.py`
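Before running ingestion or the app, it can help to confirm that the environment variables listed in the setup steps are present. The following is a minimal sketch using only the standard library; the `missing_env_vars` helper is hypothetical, and it assumes the variables have already been exported or loaded from `.env` into the process environment.

```python
import os

# Variable names taken from the setup steps in this README.
REQUIRED_VARS = [
    "PINECONE_INDEX_NAME",
    "PINECONE_NAME_SPACE",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "GROQ_API_KEY",
]


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Example usage:
#   missing = missing_env_vars()
#   if missing:
#       raise SystemExit("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.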
The system currently includes guidelines from the following PDFs with direct links to the documents: