This repo contains various ways to calculate the similarity between source and target sentences. You can choose a pre-trained model such as ELMo, BERT, or Universal Sentence Encoder (USE), as well as the method used to compute the similarity:
1. Cosine similarity
2. Euclidean distance
3. Inner product
4. TS-SS score
5. Pairwise-cosine similarity
6. Pairwise-cosine similarity + IDF
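The first three metrics can be sketched in a few lines of NumPy. This is a toy illustration on hand-made vectors, not the repo's code; in practice the vectors would be sentence embeddings produced by the chosen model.

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means identical direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # L2 distance between the two embedding vectors (lower = more similar)
    return float(np.linalg.norm(a - b))

def inner_product(a, b):
    # Unnormalized dot product; sensitive to vector magnitude
    return float(np.dot(a, b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))  # parallel vectors -> 1.0
print(euclidean_distance(a, b))
print(inner_product(a, b))      # 1*2 + 2*4 + 3*6 = 28.0
```

Note that cosine similarity ignores magnitude (here `b = 2a`, so the score is exactly 1.0), while Euclidean distance and the inner product do not.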
- After cloning this repository, you can simply install all the dependent libraries listed in `requirements.txt` with `pip install -r requirements.txt`.
```
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
pip install -r requirements.txt
```
- You have to choose the model and method used to calculate the similarity between the source and target sentences.
- You should wrap your source and target sentences in double quotation marks (`"`).
```
python sensim.py \
    --source "SOURCE_SENTENCE" \
    --target "TARGET_SENTENCE" \
    --model MODEL_NAME \
    --method METHOD_NAME
```
- The following section shows various use cases of `sentence-similarity`.
- Note that there is no silver bullet that calculates a perfect similarity between sentences, so you should conduct experiments with your own dataset.
- Caution: the TS-SS score may not fit short-sentence similarity tasks, since this method was originally devised to calculate the similarity between documents.
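For reference, TS-SS combines a Triangle Similarity (TS) and a Sector Similarity (SS) term. The sketch below follows the definitions from the hybrid geometric approach paper and is an assumption about the method, not the repo's exact implementation; note it is a distance-like score, so lower values mean more similar (identical vectors yield 0).

```python
import numpy as np

def ts_ss(a, b, eps=1e-10):
    # Angle between the vectors, in degrees; 10 degrees is added so that
    # parallel vectors still form a valid (non-degenerate) triangle.
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    theta = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) + 10.0

    # Triangle Similarity: area of the triangle spanned by the two vectors
    ts = np.linalg.norm(a) * np.linalg.norm(b) * np.sin(np.radians(theta)) / 2

    # Sector Similarity: area of the circular sector built from the
    # Euclidean distance (ED) and the magnitude difference (MD)
    ed = np.linalg.norm(a - b)
    md = abs(np.linalg.norm(a) - np.linalg.norm(b))
    ss = np.pi * (ed + md) ** 2 * theta / 360

    # Identical vectors give 0, since ED and MD are both 0
    return float(ts * ss)
```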
```
> python sensim.py --model use --method cosine
> Similarity using [use] with [cosine] between
source> "I ate an apple"
target> "I went to the Apple" is | 0.76871
```

```
> python sensim.py --model elmo --method ts-ss
> Similarity using [elmo] with [ts-ss] between
source> "I ate an apple"
target> "I went to the Apple" is | 32.35986
```

```
> python sensim.py --model bert --method pairwise
> Similarity using [bert] with [pairwise] between
source> "I ate an apple"
target> "I went to the Apple" is | 0.91523
```
```
allennlp==0.9.0
bert-score==0.2.1
numpy==1.17.3
sentence-transformers==0.2.3
spacy==2.1.9
tensorflow==1.15.0
tensorflow-hub==0.7.0
torch==1.3.0
```
- Universal Sentence Encoder
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering