VuBacktracking / bert-faiss-qa-system

Q&A System using BERT and Faiss Vector Database


Overview

This project is a question-answering system that uses DistilBERT to represent text as dense vectors and Faiss (Facebook AI Similarity Search) to search those vectors efficiently. Given a user question, it retrieves the most relevant documents from a large collection and answers from them.

[Figure: workflow diagram]

Features

  • DistilBERT-based Text Representation: Utilizes the DistilBERT model to convert questions and documents into dense vector representations.

  • Faiss Vector Database: Stores the vector representations of the documents for fast similarity search.

  • Efficient Retrieval: Finds the documents most relevant to a question by running a similarity search over the Faiss vector database.
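
The features above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the README does not say how DistilBERT's per-token outputs are pooled or which Faiss index type is used, so mean pooling and an exhaustive squared-L2 search (the computation a flat, brute-force index performs) are assumptions, and `mean_pool`, `squared_l2`, and `search` are hypothetical names.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_pool(token_vectors: Sequence[Vector], attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask is 1 (padding positions are skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index: Sequence[Vector], query: Vector, k: int) -> List[Tuple[float, int]]:
    """Return the k (distance, row id) pairs closest to the query, nearest first."""
    scored = ((squared_l2(vec, query), row) for row, vec in enumerate(index))
    return heapq.nsmallest(k, scored)


# Toy data (assumed): three "documents", each already reduced to a 2-d vector.
doc_vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]

# A "question" with two real tokens and one padding token that the mask removes.
question_tokens = [[1.0, 0.0], [0.8, 0.2], [5.0, 5.0]]
question_vector = mean_pool(question_tokens, attention_mask=[1, 1, 0])

top = search(doc_vectors, question_vector, k=2)
print([row for _, row in top])  # row ids of the two nearest documents
```

Here the pooled question vector coincides with document 2, so that row is returned first, followed by document 1. A real index holds vectors with hundreds of dimensions, which is where a dedicated library such as Faiss pays off.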


Installation

Requirements

  • Python 3.x
  • PyTorch
  • Transformers
  • Faiss
  • Streamlit (for the web-based interface)

Setup

  1. Clone the repository:
git clone https://github.com/VuBacktracking/bert-faiss-qa-system.git
  2. Install the dependencies:
pip install -r requirements.txt
  3. Train the DistilBERT model:
python3 trainer.py

Note: My trained model is available at https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2

  4. Build the Faiss vector database:
python3 faiss_index.py

[Figure: workflow diagram]


Usage

Streamlit Web App Interface

streamlit run app.py

Open your web browser and navigate to http://localhost:8501/ to use the web-based Q&A system.

How it Works

  1. DistilBERT Embeddings:

    • The preprocessed text is converted into vector embeddings using the DistilBERT model.
  2. Faiss Indexing:

    • The DistilBERT embeddings of the documents are indexed in the Faiss vector database.
  3. Query Processing:

    • When a user inputs a question, the question is converted into a DistilBERT embedding.
    • Faiss is used to find the most similar embeddings (i.e., the most relevant documents) to the question embedding.
  4. Answer Extraction:

    • The retrieved documents are ranked, and the best-matching answer passages are extracted from them and presented to the user.
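
The answer extraction step can be sketched as follows. The README does not show how the app extracts answers, so this is an assumption based on the usual approach for SQuAD-style extractive models: the model emits one start logit and one end logit per token, and the answer is the span whose start and end scores sum highest. The function name `best_span`, the `max_answer_len` parameter, and the toy logits are hypothetical, and the no-answer handling that SQuAD 2.0 models need is left out.

```python
from typing import Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    if best is None:
        raise ValueError("no valid span: logits are empty")
    return best


# Toy passage and made-up logits for the question "What does Faiss stand for?"
tokens = ["faiss", "stands", "for", "facebook", "ai", "similarity", "search"]
start_logits = [0.1, 0.0, 0.2, 4.0, 1.0, 0.3, 0.5]
end_logits = [0.0, 0.1, 0.2, 0.4, 1.0, 1.5, 3.5]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start : end + 1])
print(answer)
```

With these logits the selected span runs from token 3 to token 6, so the printed answer is "facebook ai similarity search". When several passages are retrieved, the same score can be used to rank the candidate answers across passages.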

Demo

Extractive Q&A

[Image: extractive Q&A demo]

Closed Generative Q&A

[Image: closed generative Q&A demo]

