dipteshkanojia / sentence-similarity

This repository contains various ways to calculate sentence vector similarity using NLP models

Sentence Similarity

This repo contains various ways to calculate the similarity between source and target sentences. You can choose among pre-trained models such as ELMo, BERT, and Universal Sentence Encoder (USE).

You can also choose the method used to compute the similarity:

1. Cosine similarity
2. Euclidean distance
3. Inner product
4. TS-SS score
5. Pairwise-cosine similarity
6. Pairwise-cosine similarity + IDF
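
To make the first few methods concrete, here is a minimal NumPy sketch of cosine similarity, Euclidean distance, and inner product on two sentence vectors, plus a greedy token-matching score in the style of BERTScore for the pairwise variant. The function names are illustrative and are not taken from sensim.py; the pairwise function assumes each sentence is given as a matrix of token embeddings and leaves out IDF weighting, so treat it as an approximation of the idea rather than the repository's implementation.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u, v):
    """Straight-line distance between u and v (0.0 means identical)."""
    return float(np.linalg.norm(u - v))


def inner_product(u, v):
    """Unnormalized dot product; sensitive to vector length."""
    return float(np.dot(u, v))


def pairwise_cosine_f1(src_tokens, tgt_tokens):
    """Greedy token matching: each token is paired with its most similar
    token in the other sentence, then the two directions are combined
    into an F1 score. Inputs are (num_tokens, dim) arrays."""
    src = src_tokens / np.linalg.norm(src_tokens, axis=1, keepdims=True)
    tgt = tgt_tokens / np.linalg.norm(tgt_tokens, axis=1, keepdims=True)
    sim = src @ tgt.T                   # all token-to-token cosines
    recall = sim.max(axis=1).mean()     # best match for each source token
    precision = sim.max(axis=0).mean()  # best match for each target token
    return float(2 * precision * recall / (precision + recall))


# Toy vectors standing in for real sentence embeddings (hypothetical data).
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
print(cosine_similarity(u, v), euclidean_distance(u, v), inner_product(u, v))
```

Note that cosine similarity ignores vector length while the inner product and Euclidean distance do not, which is one reason the methods can rank the same sentence pairs differently.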

Installation

  • After cloning this repository, install all dependencies listed in requirements.txt with pip install -r requirements.txt.
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt

Usage

  • Choose the model and the method used to calculate the similarity between the source and target sentences.
  • Wrap your source and target sentences in double quotation marks (").
python sensim.py \
    --source "SOURCE_SENTENCE" \
    --target "TARGET_SENTENCE" \
    --model  MODEL_NAME \
    --method METHOD_NAME

Examples

  • The following examples show several use cases of sentence-similarity.
  • There is no silver bullet that computes a perfect similarity between sentences, so you should run experiments on your own dataset.
    • Caution: the TS-SS score may not suit short-sentence similarity tasks, since the method was originally devised to measure similarity between documents.
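
As a rough illustration of why TS-SS behaves differently from cosine similarity, the sketch below follows the commonly cited formulation: the area of the triangle spanned by the two vectors multiplied by the area of a sector built from their Euclidean distance and magnitude difference, with a 10-degree offset added to the angle. The function name `ts_ss` is illustrative, and the details are an assumption that may differ from what sensim.py computes.

```python
import math

import numpy as np


def ts_ss(u, v):
    """Triangle-area similarity times sector-area similarity (sketch)."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    # Clip to guard against rounding pushing the cosine outside [-1, 1].
    cos = np.clip(np.dot(u, v) / (norm_u * norm_v), -1.0, 1.0)
    theta = math.degrees(math.acos(cos)) + 10.0  # angle in degrees, offset by 10
    triangle = norm_u * norm_v * math.sin(math.radians(theta)) / 2.0
    euclid = np.linalg.norm(u - v)       # Euclidean distance
    mag_diff = abs(norm_u - norm_v)      # magnitude difference
    sector = math.pi * (euclid + mag_diff) ** 2 * theta / 360.0
    return float(triangle * sector)


# Toy vectors standing in for real sentence embeddings (hypothetical data).
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(ts_ss(a, b), ts_ss(10 * a, 10 * b))
```

Under this formulation identical vectors score 0 and the value grows with both the angle and the vector magnitudes, so the score is unbounded and is not on the same scale as a cosine value.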
> python sensim.py --model use --method cosine
> Similarity using [use] with. [cosine] between
           source> "I ate an apple"
           target> "I went to the Apple" is |  0.76871


> python sensim.py --model elmo --method ts-ss
> Similarity using [elmo] with. [ts-ss] between
           source> "I ate an apple"
           target> "I went to the Apple" is |  32.35986


> python sensim.py --model bert --method pairwise
> Similarity using [bert] with. [pairwise] between
           source> "I ate an apple"
           target> "I went to the Apple" is |  0.91523

Requirements

allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0
