A Python project for performing similarity search on sentences using the SentenceTransformer library and HNSW indexing.
This project demonstrates how to build a sentence similarity search system: a pre-trained transformer model, loaded through the SentenceTransformer library, converts sentences into embedding vectors, and an HNSW index finds the stored vectors closest to the query's embedding. Given a query, it returns the most similar sentences or questions from a dataset.
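HNSW is an approximate nearest-neighbor method. It returns nearly the same results as an exact search over every embedding, but much faster on large datasets. The exact search it approximates can be sketched in NumPy as below. This is not the project's code. It assumes embeddings are rows of a float array (for example, the output of SentenceTransformer.encode) and that similarity is cosine similarity. The helper name exact_cosine_top_k is hypothetical.

```python
import numpy as np


def exact_cosine_top_k(corpus: np.ndarray, query: np.ndarray, k: int = 5):
    """Brute-force top-k search by cosine similarity over every corpus row.

    Hypothetical helper: an exact baseline that an HNSW index approximates.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = corpus_unit @ query_unit  # one similarity score per corpus row
    k = min(k, len(sims))
    top = np.argsort(-sims)[:k]  # row indices, most similar first
    return top, sims[top]
```

Each query here costs time proportional to the dataset size, which is why an index such as HNSW pays off as the dataset grows. On a small sample, comparing both searches is a quick way to check HNSW recall.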
Key features of this project include:
- Sentence embedding using the SentenceTransformer model.
- Efficient similarity search using HNSW indexing.
- Saving the index to disk and reloading it on later runs, so it does not have to be rebuilt every time.
To run this project, you need the following:
- Python 3.9+
- Required Python packages (install via pip install -r requirements.txt):
  - sentence-transformers
  - hnswlib
  - tqdm
  - jsonlines
To set up and run the project:
- Clone this repository:
  git clone https://github.com/yourusername/sentence-similarity-search.git
- Navigate to the project directory:
  cd sentence-similarity-search
- Install the required packages:
  pip install -r requirements.txt
- Download the SentenceTransformer model:
  - You can change the model in the code by modifying the MODEL_NAME constant.
- Prepare your dataset in JSONL format: each line must be a JSON object with 'prompt' and 'completion' fields.
- Run the main script:
  python main.py
- Enter a query; the system returns the most similar questions from your dataset.
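The dataset and query steps above can be sketched as follows: read the JSONL file into parallel lists of prompts and completions, then turn the labels and similarity scores returned by the index into readable lines. The function names load_dataset and format_results are hypothetical. The sketch uses the standard-library json module rather than the jsonlines package listed in the requirements, and it assumes the prompts are what gets embedded and matched while each completion is shown next to its prompt.

```python
import json


def load_dataset(path: str) -> tuple[list[str], list[str]]:
    """Read a JSONL file whose lines are objects with 'prompt' and 'completion'."""
    prompts, completions = [], []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            missing = [key for key in ("prompt", "completion") if key not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing field(s) {missing}")
            prompts.append(record["prompt"])
            completions.append(record["completion"])
    return prompts, completions


def format_results(labels, similarities, prompts, completions) -> list[str]:
    """Render ranked hits as 'rank. (score) prompt -> completion' lines."""
    lines = []
    for rank, (label, sim) in enumerate(zip(labels, similarities), start=1):
        i = int(label)  # index labels may be NumPy integers
        lines.append(f"{rank}. ({sim:.3f}) {prompts[i]} -> {completions[i]}")
    return lines
```

Failing fast on a missing field, with the offending line number, makes a malformed dataset easier to fix than a KeyError raised later during indexing. If the index uses hnswlib's cosine space, each similarity passed to format_results would be 1 minus the returned distance.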
This project is licensed under the MIT License - see the LICENSE file for details.