Embeddings_based_STS_using_LIME_XAI

In this project, we compute the semantic textual similarity (STS) of 5 sentence pairs using 5 different sentence-transformer models. We then use LIME (Local Interpretable Model-agnostic Explanations), an Explainable AI (XAI) technique, to produce easily interpretable visual explanations of the results. The goal is to find out whether LIME is a useful XAI tool for explaining semantic similarity results generated by any model, and whether it can help identify ambiguity errors.
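Assuming the usual setup for sentence-transformers, each sentence is encoded into a fixed-length embedding and a pair is scored by the cosine similarity of its two embeddings. The sketch below shows only that scoring step. The 3-dimensional vectors are made-up stand-ins for real model output; with the `sentence-transformers` library they would come from `SentenceTransformer(model_name).encode(...)`.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings": illustrative numbers only, not produced by any real model.
emb_birds = [0.9, 0.1, 0.3]
emb_peacocks = [0.8, 0.2, 0.35]
emb_watch = [0.1, 0.9, -0.2]

print(round(cosine_similarity(emb_birds, emb_peacocks), 3))
print(round(cosine_similarity(emb_birds, emb_watch), 3))
```

Vectors pointing in similar directions score close to 1, and unrelated ones score near 0. The model only decides where each sentence lands in the embedding space.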

The 5 sentence pairs were chosen to cover 5 different categories of semantic and syntactic similarity. The last pair tests ambiguity: "watch" is a noun in the first sentence and a verb in the second.

Sentence 1	Sentence 2	Semantic Similarity	Syntactic Similarity
I love to watch a lot of movies	I hate to watch movies	No	Yes
I love all animals	Dog is my favorite animal	Yes	No
I love birds	I love peacocks	Yes	Yes
Rini is my childhood friend	I recently met Rini	No	No
I lost my watch	I will watch out for you	No	Yes

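The 5 models were selected for their performance, speed and size on the semantic similarity task. Performance is the average benchmark score and speed is the encoding speed in sentences per second, both as reported by Sentence-Transformers.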

Model	Avg. Performance	Speed (sentences/sec)	Model Size
all-MiniLM-L6-v2	58.80	14200	80 MB
paraphrase-MiniLM-L6-v2	52.56	14200	80 MB
paraphrase-albert-small-v2	52.25	5000	43 MB
all-mpnet-base-v2	63.30	2800	420 MB
all-MiniLM-L12-v2	59.76	7500	120 MB

Reference: https://www.sbert.net/docs/pretrained_models.html
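The experiment is a grid in which every model scores every sentence pair. The sketch below shows that loop, assuming cosine similarity as the score. `similarity_grid` accepts any encoder function, so here it runs with the hypothetical `toy_encode` (word counts over the vocabulary of the five pairs) in place of a real model. It is not the notebooks' code.

```python
import math
from typing import Callable, Sequence

MODEL_NAMES = [
    "all-MiniLM-L6-v2",
    "paraphrase-MiniLM-L6-v2",
    "paraphrase-albert-small-v2",
    "all-mpnet-base-v2",
    "all-MiniLM-L12-v2",
]

SENTENCE_PAIRS = [
    ("I love to watch a lot of movies", "I hate to watch movies"),
    ("I love all animals", "Dog is my favorite animal"),
    ("I love birds", "I love peacocks"),
    ("Rini is my childhood friend", "I recently met Rini"),
    ("I lost my watch", "I will watch out for you"),
]

# An encoder maps a sentence to an embedding vector.
Encoder = Callable[[str], Sequence[float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_grid(encoders: dict[str, Encoder],
                    pairs: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Score every sentence pair with every encoder: one row of scores per model."""
    return {
        name: [cosine_similarity(encode(s1), encode(s2)) for s1, s2 in pairs]
        for name, encode in encoders.items()
    }


# Hypothetical stand-in encoder: word counts over the vocabulary of all pairs.
VOCAB = sorted({w.lower() for pair in SENTENCE_PAIRS for s in pair for w in s.split()})


def toy_encode(sentence: str) -> list[float]:
    words = sentence.lower().split()
    return [float(words.count(w)) for w in VOCAB]


# With sentence-transformers installed, one could instead pass
# {name: SentenceTransformer(name).encode for name in MODEL_NAMES}.
grid = similarity_grid({"toy-bag-of-words": toy_encode}, SENTENCE_PAIRS)
for name, scores in grid.items():
    print(name, [round(s, 2) for s in scores])
```

Keeping the encoder pluggable means the same loop serves all five models. Only the dictionary of encoders changes.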

Result Analysis

The experiments showed that LIME is not an ideal choice for explaining semantic text similarity, because the explanations it generated were of poor quality. However, LIME did successfully identify and highlight ambiguous words, which could help identify ambiguity errors. LIME gave the best results for the paraphrase-MiniLM-L6-v2 model.
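LIME explains a single prediction in three steps. It perturbs the input, re-scores the perturbed versions with the model, and fits a locally weighted linear model whose coefficients become per-word importance scores. The sketch below is a simplified from-scratch version of that idea. It is not the `lime` package's implementation or the notebooks' code. It assumes the second sentence is held fixed while words of the first are dropped. `bow_cosine` is a hypothetical bag-of-words stand-in for a sentence-transformer. The kernel width of 25, the cosine distance scaled by 100 and the ridge penalty of 1 roughly mirror the defaults of `lime`'s text explainer.

```python
import math
import random


def bow_cosine(a: str, b: str) -> float:
    """Hypothetical stand-in for a sentence-transformer: bag-of-words cosine similarity."""
    wa, wb = a.lower().split(), b.lower().split()
    vocab = sorted(set(wa) | set(wb))
    va = [wa.count(w) for w in vocab]
    vb = [wb.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_style_explain(sentence, reference, similarity, num_samples=500,
                       kernel_width=25.0, alpha=1.0, seed=0):
    """Return (word, weight) pairs explaining similarity(sentence, reference)."""
    words = sentence.split()
    d = len(words)
    rng = random.Random(seed)
    # 1. Perturb: random binary masks over the words of `sentence` (1 = keep).
    masks = [[1] * d]  # the unperturbed sentence is always the first sample
    for _ in range(num_samples - 1):
        dropped = set(rng.sample(range(d), rng.randint(1, d)))
        masks.append([0 if i in dropped else 1 for i in range(d)])
    # 2. Score each perturbed sentence and weight it by closeness to the original.
    rows, targets, weights = [], [], []
    for z in masks:
        text = " ".join(w for w, keep in zip(words, z) if keep)
        # Cosine distance between z and the all-ones mask, scaled to 0..100.
        dist = (1 - math.sqrt(sum(z) / d)) * 100
        rows.append([1.0] + [float(v) for v in z])  # leading 1.0 = intercept
        targets.append(similarity(text, reference))
        weights.append(math.sqrt(math.exp(-dist ** 2 / kernel_width ** 2)))
    # 3. Fit a weighted ridge regression via the normal equations.
    dim = d + 1
    a = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(dim)]
         for i in range(dim)]
    b = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights)) for i in range(dim)]
    for i in range(1, dim):  # penalise word weights only, not the intercept
        a[i][i] += alpha
    coef = solve(a, b)
    return sorted(zip(words, coef[1:]), key=lambda p: -abs(p[1]))


for word, weight in lime_style_explain("I lost my watch", "I will watch out for you", bow_cosine):
    print(f"{word:>6} {weight:+.3f}")
```

With this toy model, "I" and "watch" should receive positive weights because they are the tokens shared with the second sentence, while "lost" and "my" should receive negative ones. A word-level explanation like this credits "watch" whether it is used as a noun or a verb. Ambiguous words of this kind are the ones the LIME explanations highlight.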

Here is the output for one of the example sentence pairs, as produced with the best-performing sentence-transformer:

A detailed explanation of the models and the research can be found in the notebooks.

