TF-IDF stands for "Term Frequency-Inverse Document Frequency". It is a technique for quantifying how important a word is: each word is given a weight that reflects its importance both within a document and across the corpus. Here I have applied TF-IDF to Sinhala documents to identify the similarity between two sets of documents. The final output shows the query document, its highest similarity score against the given documents, and the number of the most similar document.
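To make the idea concrete, here is a minimal sketch of the weighting and matching steps using only the Python standard library. It is not the project's actual code. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `most_similar`) are hypothetical, whitespace tokenization is assumed to be good enough for Sinhala text, the IDF uses a smoothed logarithm, and cosine similarity is assumed as the similarity measure. The sample sentences are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: Sinhala words are separated by whitespace.
    return text.split()


def build_idf(corpus_tokens):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    # Term frequency (count / document length) times IDF.
    # Terms never seen in the corpus are ignored.
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus):
    # Returns (index of the best matching document, its similarity score).
    corpus_tokens = [tokenize(doc) for doc in corpus]
    idf = build_idf(corpus_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in corpus_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [cosine(query_vector, vec) for vec in doc_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Made-up example sentences, not data from the project.
    corpus = ["මම ගෙදර යනවා", "ඔහු පොතක් කියවනවා", "අපි පාසල් යනවා"]
    query = "මම ගෙදර ඉන්නවා"
    index, score = most_similar(query, corpus)
    print("query:", query)
    print("most similar document number:", index)
    print("similarity:", round(score, 4))
```

The document number returned here is a zero-based index into the corpus list. Words that appear in many documents receive a lower IDF, so they contribute less to the similarity score than rarer words do.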