Bhaarat-Pachori / Data-and-Information-Retrieval-

Java implementation of data and information retrieval algorithms and applications.

Knowledge Processing Technology

  1. Inverted_Index: This directory contains code showing how unstructured (text) data is retrieved using an inverted index. For example, for the query "Github is cool", this method returns the documents that contain any of the three words or all three words.

  2. Positional Index: This code works the same way as the Inverted Index technique, but it answers queries exactly as written (phrase queries). For example, for the query "Github is cool", this method returns only those documents that contain the whole phrase.

  3. BTreeInverted Index: This directory contains code that works the same way as the Inverted Index; the only difference is that the data is stored in a balanced binary tree.

  4. Naive Bayes Classifier:

This project contains a basic implementation of a Naive Bayes classifier in Java. You can use this code to understand how a Naive Bayes classifier works. A data.zip file is also attached; it contains positive and negative reviews in train and test folders. This implementation achieves an accuracy of around 83-84% when classifying document labels (whether a given document is positive or negative).
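To illustrate the idea, here is a minimal sketch of a Naive Bayes text classifier. It is not the code from this repository: the class name `NaiveBayesSketch`, the methods `train` and `classify`, the tokenization rule, and the choice of a multinomial model with add-one smoothing are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: multinomial Naive Bayes with add-one (Laplace) smoothing.
// Names and design are illustrative assumptions, not the repository's code.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Count one labeled training document.
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        // Assumed tokenization: lowercase, split on non-alphanumeric characters.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log prior plus summed log likelihoods.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            double denom = tokensPerLabel.getOrDefault(label, 0) + vocabulary.size();
            for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
                if (!vocabulary.contains(token)) continue; // skip words never seen in training
                score += Math.log((counts.getOrDefault(token, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("positive", "great movie, loved it");
        nb.train("positive", "a great and fun film");
        nb.train("negative", "boring movie, hated it");
        nb.train("negative", "a dull and boring film");
        System.out.println(nb.classify("loved this great film")); // positive
        System.out.println(nb.classify("boring and dull"));       // negative
    }
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and add-one smoothing keeps a word that never appeared under a label from forcing that label's score to negative infinity.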

NOTE: To run the code successfully, download the data.zip file to your system and provide the path to its storage location as a command-line argument.
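The three index variants in items 1 to 3 above can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the repository's code: the class `IndexSketch` and its methods `add`, `matchAny`, `matchAll`, and `matchPhrase` are hypothetical names, the tokenization rule is assumed, and `java.util.TreeMap` (a red-black tree) stands in for the balanced tree, which may differ from the structure the repository uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; names and design are illustrative assumptions.
class IndexSketch {
    // term -> (docId -> positions of the term in that document).
    // TreeMap is a red-black tree, i.e. a balanced binary search tree.
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Assumed tokenization: lowercase, split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Record every token of the document together with its position.
    void add(int docId, String text) {
        List<String> tokens = tokenize(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Inverted-index lookup: documents containing at least one query term.
    Set<Integer> matchAny(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String term : tokenize(query)) {
            result.addAll(postings.getOrDefault(term, Collections.emptyMap()).keySet());
        }
        return result;
    }

    // Inverted-index lookup: documents containing every query term.
    Set<Integer> matchAll(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptyMap()).keySet();
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    // Positional lookup: documents where the query terms occur consecutively.
    Set<Integer> matchPhrase(String query) {
        List<String> terms = tokenize(query);
        Set<Integer> result = new TreeSet<>();
        for (int docId : matchAll(query)) {
            for (int start : postings.get(terms.get(0)).get(docId)) {
                boolean found = true;
                for (int i = 1; i < terms.size() && found; i++) {
                    found = postings.get(terms.get(i)).get(docId).contains(start + i);
                }
                if (found) {
                    result.add(docId);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.add(1, "Github is cool");
        index.add(2, "Cool is what Github is");
        index.add(3, "Java is verbose");
        System.out.println(index.matchAny("Github is cool"));    // [1, 2, 3]
        System.out.println(index.matchAll("Github is cool"));    // [1, 2]
        System.out.println(index.matchPhrase("Github is cool")); // [1]
    }
}
```

Document 2 contains all three words but not in sequence, so it matches the plain inverted-index lookups and is excluded by the positional one. Storing positions in the postings is what makes the phrase check possible.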

Languages

Java 80.9%, Python 19.1%