
Q&A System using BERT and Faiss Vector Database

Overview

This project is a Question & Answer system implemented using DistilBERT for text representation and Faiss (Facebook AI Similarity Search) for efficient similarity search in a vector database. The system is designed to provide accurate and relevant answers to user queries by searching through a large collection of documents.

Features

DistilBERT-based Text Representation: Utilizes the DistilBERT model to convert questions and documents into dense vector representations.
Faiss Vector Database: Stores the vector representations of the documents for fast similarity search.
Efficient Retrieval: Finds the most relevant documents to a given question by performing efficient similarity searches in the Faiss vector database.
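The retrieval feature can be illustrated with a small, pure-Python sketch of exact nearest-neighbor search. This is not the project's code: the `search` function, the toy vectors, and the choice of squared L2 distance are assumptions made for illustration. A flat Faiss index such as `faiss.IndexFlatL2` performs the same exhaustive comparison in optimized native code, and whether this project uses that index type is not stated here.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index_vectors, query, k):
    """Return (distances, ids) of the k stored vectors closest to the query.

    Hypothetical stand-in for a vector index: it compares the query
    against every stored vector, which is exact but slow at scale.
    """
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(index_vectors))
    top = heapq.nsmallest(k, scored)
    return [d for d, _ in top], [i for _, i in top]


# Toy 3-dimensional "document embeddings" (real DistilBERT vectors have 768 dimensions).
doc_vectors = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [1.0, 0.0, 0.0]

distances, ids = search(doc_vectors, query_vector, k=2)
print(ids)  # positions of the two closest documents, nearest first
```

The returned ids are positions in the stored list, so the caller keeps a parallel list of document texts to map each id back to its passage.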

Installation

Requirements

Python 3.x
PyTorch
Transformers
Faiss
Streamlit (for the web-based interface)

Setup

Clone the repository:

git clone https://github.com/VuBacktracking/bert-faiss-qa-sytem.git

Install the dependencies:

pip install -r requirements.txt

Train and Download the DistilBERT model:

python3 trainer.py

Note: My fine-tuned model is available at https://huggingface.co/vubacktracking/distilbert-base-uncased-finetuned-squad2

Build the Faiss vector database:

python3 faiss_index.py
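The internals of faiss_index.py are not shown in this README. As a rough sketch of what an index-building step needs, each document must first be reduced to a single fixed-size vector. One common way to do this, assumed here and not confirmed for this project, is to mean-pool the per-token vectors while skipping padding positions. The `mean_pool` helper and the toy values are hypothetical.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padding (mask == 0).

    token_vectors: one vector per token position.
    attention_mask: 1 for real tokens, 0 for padding, same length.
    """
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vec):
                total[j] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [value / count for value in total]


# Toy example: two real tokens followed by one padding position.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

embedding = mean_pool(token_vectors, attention_mask)
print(embedding)  # the padding vector [9.0, 9.0] does not affect the result
```

In a real pipeline the token vectors would come from the model's last hidden state, and the pooled vectors for all documents would be stacked and added to the index.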

Usage

Streamlit Web App Interface

streamlit run app.py

Open your web browser and navigate to http://localhost:8501/ to use the web-based Q&A system.

How it Works

DistilBERT Embeddings:
- The preprocessed text of each document is converted into a vector embedding using the DistilBERT model.
Faiss Indexing:
- The DistilBERT embeddings of the documents are indexed in the Faiss vector database.
Query Processing:
- When a user inputs a question, the question is converted into a DistilBERT embedding.
- Faiss is used to find the most similar embeddings (i.e., the most relevant documents) to the question embedding.
Answer Extraction:
- The relevant documents are ranked, and the most relevant answer passages are extracted and presented to the user.
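The answer-extraction step can be sketched as follows, assuming an extractive setup in which a SQuAD-style question-answering model scores each token as a possible start and end of the answer. The `best_span` function, the `max_answer_len` parameter, and the logit values are hypothetical. In practice the scores would come from the model's outputs for a retrieved passage.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick the (start, end) token pair with the highest combined score.

    Only spans with start <= end and at most max_answer_len tokens
    are considered. Returns (score, start, end).
    """
    best = (float("-inf"), 0, 0)
    for s, s_score in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best[0]:
                best = (score, s, e)
    return best


# Toy passage tokens with made-up scores standing in for model output.
tokens = ["faiss", "was", "developed", "by", "facebook", "ai", "research"]
start_logits = [0.1, 0.0, 0.2, 0.3, 4.0, 1.0, 0.5]
end_logits = [0.0, 0.1, 0.2, 0.1, 1.0, 2.0, 3.5]

score, start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

Running this over each retrieved passage and keeping the highest-scoring span is one simple way to rank candidate answers before presenting them.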
