shuchita-rahman/A-Study-on-Text-Similarity-Measuring-Algorithm

tf-idf cosine-similarity euclidean-distance manhattan-distance jaccard-similarity minkowski-distance text-similarity nlp

Natural language processing (NLP) is one of the most active areas of modern computing. We constantly try to improve our systems to minimize human error and achieve high accuracy, and machine learning algorithms serve that purpose in tasks such as plagiarism checking, question answering, and optimized search engines. Over the last decade there have been significant improvements in this area. This paper develops a text similarity measurement system that allows teachers to mark a student's answer script by comparing it with a standard answer provided by the teachers registered in the system. We use five similarity measures: Euclidean, Manhattan, Minkowski, Jaccard, and cosine. The system averages the five values and assigns marks to the student's answer script. We took the standard answers to 100 questions from a "standard answer" document and compared them with 100 answers in a "sample answer" document for three cases. In the best, average, and worst cases the system gives an average of 100%, 84.259%, and 29.586% marks (reported as accuracy), respectively. The objective is not only to reduce marking time but also to eliminate the possibility of bias. The paper describes the algorithm and the accuracy checks with several examples to ensure a clear understanding.
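The averaging step described above can be illustrated with a minimal sketch. It is not the repository's actual code: the function names, the whitespace tokenizer, the raw term-count vectors (rather than tf-idf weights), the Minkowski order `p=3`, and the `1 / (1 + d)` mapping from a distance to a similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real system would clean more.
    return text.lower().split()


def count_vectors(a, b):
    # Build term-count vectors for both texts over their combined vocabulary.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = sorted(set(ca) | set(cb))
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def minkowski(u, v, p):
    # p=1 gives the Manhattan distance, p=2 gives the Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(u, v)) ** (1.0 / p)


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def to_similarity(distance):
    # Assumption: map a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + distance)


def average_score(standard, answer, p=3):
    # Average the five measures and express the result as a percentage mark.
    u, v = count_vectors(standard, answer)
    scores = [
        to_similarity(minkowski(u, v, 2)),   # Euclidean
        to_similarity(minkowski(u, v, 1)),   # Manhattan
        to_similarity(minkowski(u, v, p)),   # Minkowski with assumed order p
        jaccard_similarity(standard, answer),
        cosine_similarity(u, v),
    ]
    return 100.0 * sum(scores) / len(scores)
```

Under these assumptions, calling `average_score(standard, answer)` with two identical texts yields full marks, because every distance is zero and both Jaccard and cosine similarity equal one. How the original system normalizes distances before averaging is not stated here, so scores for partially matching answers will likely differ from the figures reported in the paper.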

About

This work is part of my thesis. The paper was published at I-IKM-2019.


MIT License

Languages

Python 100.0%