This repo contains various ways to calculate the similarity between source and target sentences. You can choose a pre-trained model such as ELMo, BERT, or Universal Sentence Encoder (USE), as well as the method used to compute the similarity:
1. Cosine similarity
2. Euclidean distance
3. Inner product
4. TS-SS score
5. Pairwise-cosine similarity
6. Pairwise-cosine similarity + IDF
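The first three metrics can be sketched in a few lines of NumPy. This is a toy illustration on hand-made vectors, not the repo's code; in practice the vectors would be sentence embeddings produced by the chosen model.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means identical direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # L2 distance between the two embedding vectors (lower = more similar)
    return float(np.linalg.norm(a - b))

def inner_product(a, b):
    # Unnormalized dot product; sensitive to vector magnitude
    return float(np.dot(a, b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))  # parallel vectors -> 1.0
print(euclidean_distance(a, b))
print(inner_product(a, b))      # 1*2 + 2*4 + 3*6 = 28.0
```

Note that cosine similarity ignores magnitude (here `b = 2a`, so the score is exactly 1.0), while Euclidean distance and the inner product do not.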
- After cloning this repository, you can simply install all the dependent libraries listed in `requirements.txt` with `pip install -r requirements.txt`.
```
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt
```
- You have to choose the model and method used to calculate the similarity between the source and target sentences.
- You should wrap your source and target sentences in double quotation marks (`"`).
```
python sensim.py \
    --source "SOURCE_SENTENCE" \
    --target "TARGET_SENTENCE" \
    --model MODEL_NAME \
    --method METHOD_NAME
```
- The following section shows various use cases of `sentence-similarity`.
- Note that there is no silver bullet that calculates a perfect similarity between sentences, so you should conduct experiments with your own dataset.
- Caution: the TS-SS score may not fit short-sentence similarity tasks, since this method was originally devised to calculate the similarity between documents.
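For reference, TS-SS combines a Triangle Similarity (TS) and a Sector Similarity (SS) term. The sketch below follows the definitions from the hybrid geometric approach paper and is an assumption about the method, not the repo's exact implementation; note it is a distance-like score, so lower values mean more similar (identical vectors yield 0).

```python
import numpy as np

def ts_ss(a, b, eps=1e-10):
    # Angle between the vectors, in degrees; 10 degrees is added so that
    # parallel vectors still form a valid (non-degenerate) triangle.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) + 10.0

    # Triangle Similarity: area of the triangle spanned by the two vectors
    ts = np.linalg.norm(a) * np.linalg.norm(b) * np.sin(np.radians(theta)) / 2

    # Sector Similarity: area of the circular sector built from the
    # Euclidean distance (ED) and the magnitude difference (MD)
    ed = np.linalg.norm(a - b)
    md = abs(np.linalg.norm(a) - np.linalg.norm(b))
    ss = np.pi * (ed + md) ** 2 * theta / 360

    # Identical vectors give 0, since ED and MD are both 0
    return float(ts * ss)
```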
```
> python sensim.py --model use --method cosine
> Similarity using [use] with [cosine] between
source> "I ate an apple"
target> "I went to the Apple" is | 0.76871
```

```
> python sensim.py --model elmo --method ts-ss
> Similarity using [elmo] with [ts-ss] between
source> "I ate an apple"
target> "I went to the Apple" is | 32.35986
```

```
> python sensim.py --model bert --method pairwise
> Similarity using [bert] with [pairwise] between
source> "I ate an apple"
target> "I went to the Apple" is | 0.91523
```
```
allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0
```
- Universal Sentence Encoder
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering