denji / Local-RAG-with-Ollama

A local RAG + Chat Application.

Repository on GitHub: https://github.com/denji/Local-RAG-with-Ollama

Local RAG Chat App with Google Gemma 3, LangChain, and Reflex

A Retrieval-Augmented Generation (RAG) chatbot application built with Reflex, LangChain, and Google's Gemma 3 model served locally through Ollama. Users ask questions in a chat interface and receive answers enhanced with context retrieved from a dataset.

This project contains the code samples for the blog posts listed below.

Blogs

  1. Understanding RAG
  2. Building a Local RAG Chatbot with Ollama

(Screenshot: RAG Chat with Gemma)

Features

  • πŸ’¬ Modern chat-like interface for asking questions
  • πŸ” Retrieval-Augmented Generation for more accurate answers
  • 🧠 Uses Gemma 3 4B-IT model via Ollama
  • πŸ“š Built with the neural-bridge/rag-dataset-12000 dataset
  • πŸ› οΈ FAISS vector database for efficient similarity search
  • πŸ”„ Full integration with LangChain for RAG pipeline
  • 🌐 Built with Reflex for a reactive web interface

Prerequisites

  • Python 3.12+
  • Ollama installed and running
  • The Gemma 3 4B model pulled in Ollama: ollama pull gemma3:4b-it-qat

Installation

  1. Clone the repository:

    git clone https://github.com/srbhr/Local-RAG-with-Ollama.git
    cd RAG_Blog
  2. Install dependencies:

    pip install -r requirements.txt
  3. Make sure Ollama is running and you've pulled the required model:

    ollama pull gemma3:4b-it-qat
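
To confirm that Ollama is reachable and that the model has actually been pulled, you can run a small stand-alone check before starting the app. This script is not part of the repository; it simply queries Ollama's /api/tags endpoint (which lists locally available models) using the default host and the model name from this README.

    # check_ollama.py -- optional sanity check (not part of this repository)
    import json
    import urllib.request

    OLLAMA_HOST = "http://localhost:11434"   # default Ollama address
    MODEL_NAME = "gemma3:4b-it-qat"          # model expected by this app

    # /api/tags returns the models that have been pulled locally
    with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags") as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]

    if MODEL_NAME in models:
        print(f"OK: {MODEL_NAME} is available.")
    else:
        print(f"Missing model. Run `ollama pull {MODEL_NAME}` (found: {models})")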

Usage

  1. Start the Reflex development server:

    reflex run
  2. Open your browser and go to http://localhost:3000

(Screenshot: RAG Chat with Gemma)

  3. Start asking questions in the chat interface!

Project Structure

RAG_Blog/
β”œβ”€β”€ assets/                    # Static assets 
β”œβ”€β”€ faiss_index_neural_bridge/ # FAISS vector database for the full dataset
β”‚   β”œβ”€β”€ index.faiss            # FAISS index file
β”‚   └── index.pkl              # Pickle file with metadata
β”œβ”€β”€ faiss_index_subset/        # FAISS vector database for a subset of data
β”œβ”€β”€ rag_gemma_reflex/          # Main application package
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ rag_gemma_reflex.py    # UI components and styling
β”‚   β”œβ”€β”€ rag_logic.py           # Core RAG implementation
β”‚   └── state.py               # Application state management
β”œβ”€β”€ requirements.txt           # Project dependencies
└── rxconfig.py                # Reflex configuration

How It Works

(Screenshot: RAG Chat with Gemma)

This application implements a RAG (Retrieval-Augmented Generation) architecture:

  1. Embedding and Indexing: Documents from the neural-bridge/rag-dataset-12000 dataset are embedded using HuggingFace's all-MiniLM-L6-v2 model and stored in a FAISS vector database.

  2. Retrieval: When a user asks a question, the application converts the question into an embedding and finds the most similar documents in the FAISS index.

  3. Generation: The retrieved documents are sent to the Gemma 3 model (running via Ollama) along with the user's question to generate a contextualized response.

  4. UI: The Reflex framework provides a reactive web interface for the chat application.
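
The sketch below shows how the first three steps fit together with LangChain. It is an illustrative outline, not the exact contents of rag_logic.py: the dataset field name ("context"), the prompt wording, the number of retrieved documents (k=3), and the small dataset slice are assumptions made for brevity.

    # Simplified RAG pipeline sketch (illustrative; see rag_logic.py for the actual code)
    from datasets import load_dataset
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_ollama import ChatOllama

    # 1. Embedding and indexing: embed dataset contexts and store them in FAISS
    dataset = load_dataset("neural-bridge/rag-dataset-12000", split="train[:1000]")
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    vectorstore = FAISS.from_texts([row["context"] for row in dataset], embeddings)

    # 2. Retrieval: find the documents most similar to the user's question
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

    # 3. Generation: send the retrieved context plus the question to Gemma via Ollama
    llm = ChatOllama(model="gemma3:4b-it-qat")
    prompt = ChatPromptTemplate.from_template(
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    )

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

    print(chain.invoke("What is retrieval-augmented generation?"))

Saving the index with vectorstore.save_local(...) and reloading it later with FAISS.load_local(...) is presumably how the faiss_index_* folders in the project structure were produced, so the embedding step only has to run once.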

Customization

You can customize the following aspects of the application:

  • LLM Model: Change the OLLAMA_MODEL environment variable or modify the DEFAULT_OLLAMA_MODEL in rag_logic.py
  • Dataset: Modify the DATASET_NAME in rag_logic.py
  • Embedding Model: Change the EMBEDDING_MODEL_NAME in rag_logic.py
  • UI Styling: Modify the styles in rag_gemma_reflex.py
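
Taken together, the configuration constants mentioned above would look roughly like the following at the top of rag_logic.py. The model and dataset names come from this README; the exact values and layout in the file may differ.

    # rag_logic.py (illustrative excerpt) -- the knobs referenced above
    DEFAULT_OLLAMA_MODEL = "gemma3:4b-it-qat"                        # LLM served by Ollama
    DATASET_NAME = "neural-bridge/rag-dataset-12000"                 # HuggingFace dataset
    EMBEDDING_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"  # embedding model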

Environment Variables

  • OLLAMA_MODEL: Override the default Gemma model
  • OLLAMA_HOST: Specify a custom Ollama host (default: http://localhost:11434)
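
These are typically read with os.environ.get so the defaults apply when the variables are unset; a minimal sketch (the actual handling in rag_logic.py may differ):

    import os

    # Environment overrides, falling back to the documented defaults
    OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma3:4b-it-qat")
    OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")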

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements
