This project is based on similarity between the input question and all the questions in the data set. In order to compute this similarity we need to choose a similarity measure that would rate the similarity of two sentences, there are a lot of similarity measures for text but we will choose the cosine similarity for this one since it’s one of the most common measures in NLP.