Sambit's repositories
hyperlink_crawler
This will traverse the Web as a linked graph starting from --url, finding all outgoing links (<a> tags). It stores the outgoing links of each URL and then repeats the process for each of them, until --limit URLs have been traversed. The output is a JSON file containing all incoming and outgoing link information.
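A minimal sketch of the traversal described above. It assumes breadth-first order (the description does not say BFS or DFS), maps --url and --limit to the start_url and limit parameters, and takes a hypothetical fetch callback so it runs without network access. The repository's actual code may work differently.

```python
import json
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute href of every <a> tag on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def crawl(start_url, limit, fetch):
    """Breadth-first traversal (assumed order) that stops after `limit` URLs.

    `fetch(url) -> html` is a hypothetical callback; a real crawler would
    issue an HTTP request here.
    """
    outgoing, incoming = {}, {}
    queue, seen = deque([start_url]), {start_url}
    while queue and len(outgoing) < limit:
        url = queue.popleft()
        parser = LinkExtractor(url)
        parser.feed(fetch(url))
        outgoing[url] = parser.links
        for link in parser.links:
            incoming.setdefault(link, []).append(url)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {"outgoing": outgoing, "incoming": incoming}


# Usage with an in-memory stand-in for the Web (hypothetical pages).
pages = {
    "http://a.test/": '<a href="/b">B</a> <a href="http://c.test/">C</a>',
    "http://a.test/b": '<a href="/">home</a>',
    "http://c.test/": "",
}
graph = crawl("http://a.test/", limit=2, fetch=lambda u: pages.get(u, ""))
print(json.dumps(graph, indent=2))
```

With limit=2 only the first two pages are expanded, but the incoming map still records that http://c.test/ was linked from the start page.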
RPC_Password_Cracker
This application can crack passwords made up of lowercase letters [a-z]. It has a server, clients, and workers: clients issue crack requests to the server, which assigns each request to an available worker. Multiple clients can issue requests at the same time, and the server distributes these requests so that the load is balanced across the workers. All communication is carried out over RPC; we used the rpcgen compiler to generate the stub code.
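One way the cracking itself could work, sketched under stated assumptions: the password arrives as an MD5 hash, its length is known, and the server splits the 26^length candidates into one contiguous range per worker. The RPC layer generated by rpcgen is omitted, and the workers are simulated by a plain loop.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase  # passwords use only [a-z]


def index_to_password(index, length):
    """Map an integer in [0, 26**length) to a password (base-26 digits)."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def split_keyspace(length, num_workers):
    """Divide the 26**length candidates into contiguous [start, end) ranges."""
    total = len(ALPHABET) ** length
    step = -(-total // num_workers)  # ceiling division
    return [(s, min(s + step, total)) for s in range(0, total, step)]


def crack_range(target_hash, length, start, end):
    """Work done by one worker: test every candidate in [start, end).

    MD5 is an assumed hash; the project description does not name one.
    """
    for i in range(start, end):
        candidate = index_to_password(i, length)
        if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None


# Usage: four simulated workers search a 3-letter keyspace.
target = hashlib.md5(b"dog").hexdigest()
ranges = split_keyspace(length=3, num_workers=4)
results = [crack_range(target, 3, s, e) for s, e in ranges]
found = next(r for r in results if r is not None)
print(ranges)  # [(0, 4394), (4394, 8788), (8788, 13182), (13182, 17576)]
print(found)   # dog
```

Contiguous ranges keep each worker's assignment down to two integers, which is cheap to send in an RPC call.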
Text_Search_Engine
This is a search engine that can answer Boolean queries, phrase queries, and wildcard queries.
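A small sketch of how the three query types can be served from a positional inverted index. The tokenizer, the class and method names, and the linear scan of the vocabulary for wildcards are assumptions. A real engine might use a permuterm or k-gram index for wildcards instead, and this is not necessarily the project's design.

```python
import fnmatch
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    def __init__(self, docs):
        # term -> {doc_id: [positions of the term in that document]}
        self.postings = defaultdict(dict)
        self.doc_ids = set(range(len(docs)))
        for doc_id, text in enumerate(docs):
            for pos, term in enumerate(tokenize(text)):
                self.postings[term].setdefault(doc_id, []).append(pos)

    def docs_for(self, term):
        return set(self.postings.get(term, {}))

    # Boolean queries: set operations over posting lists.
    def boolean_and(self, *terms):
        result = self.doc_ids.copy()
        for term in terms:
            result &= self.docs_for(term)
        return result

    def boolean_or(self, *terms):
        return set().union(*(self.docs_for(t) for t in terms))

    def boolean_not(self, term):
        return self.doc_ids - self.docs_for(term)

    def phrase(self, text):
        """Documents where the terms appear at consecutive positions."""
        terms = tokenize(text)
        if not terms:
            return set()
        matches = set()
        for doc_id in self.boolean_and(*terms):
            for start in self.postings[terms[0]][doc_id]:
                if all(start + k in self.postings[t][doc_id] for k, t in enumerate(terms)):
                    matches.add(doc_id)
                    break
        return matches

    def wildcard(self, pattern):
        """Expand a pattern such as 'ret*' over the vocabulary, then OR."""
        terms = [t for t in self.postings if fnmatch.fnmatchcase(t, pattern.lower())]
        return self.boolean_or(*terms)


docs = [
    "Information retrieval is fun",
    "Retrieval of information from text",
    "Text search engines answer queries",
]
index = PositionalIndex(docs)
print(index.boolean_and("information", "retrieval"))  # {0, 1}
print(index.phrase("information retrieval"))          # {0}
print(index.wildcard("quer*"))                        # {2}
```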
cluster_classifier
This project clusters data sets retrieved from the Bing API using the k-means algorithm. The second part of the project was to build a Naive Bayes classifier, trained on a training data set; given new data, it determines which class the input belongs to. The third part of the project was to build a composite classifier and clusterer.
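A minimal pure-Python sketch of the first two parts. It assumes the Bing results have already been turned into numeric feature vectors (for k-means) and whitespace-tokenized text (for Naive Bayes). The multinomial model with add-one smoothing and the toy data are assumptions; the composite third part is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


class NaiveBayes:
    """Multinomial Naive Bayes over words with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(class) + sum of log P(word | class), in log space to avoid underflow
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)

nb = NaiveBayes().fit(
    ["cheap flights hotel", "hotel booking deals", "football score match", "match goal league"],
    ["travel", "travel", "sports", "sports"],
)
print(nb.predict("hotel deals"))  # travel
```

Cluster IDs from k-means are arbitrary, so only the grouping (first three points together, last three together) is meaningful.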
Distributed_Password_Cracker
This application can crack passwords made up of lowercase letters [a-z]. It has a server, clients, and workers: clients issue crack requests to the server, which assigns each request to an available worker. Multiple clients can issue requests at the same time, and the server distributes these requests so that the load is balanced across the workers.
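A sketch of one possible load-balancing policy: send each new request to the worker with the fewest outstanding requests. The least-loaded policy, the request IDs, and the LoadBalancer class are hypothetical; the description only says that the load is balanced.

```python
class LoadBalancer:
    """Hypothetical server-side bookkeeping for a least-loaded policy."""

    def __init__(self, workers):
        self.load = {w: 0 for w in workers}  # outstanding requests per worker
        self.assignments = {}                # request id -> worker

    def assign(self, request_id):
        # Pick the least-loaded worker; ties go to the worker listed first.
        worker = min(self.load, key=self.load.get)
        self.load[worker] += 1
        self.assignments[request_id] = worker
        return worker

    def complete(self, request_id):
        # Called when a worker reports a result (found or exhausted).
        worker = self.assignments.pop(request_id)
        self.load[worker] -= 1


lb = LoadBalancer(["w1", "w2", "w3"])
first = [lb.assign(r) for r in ["c1-a", "c1-b", "c2-a", "c3-a"]]
print(first)              # ['w1', 'w2', 'w3', 'w1']
lb.complete("c1-b")       # w2 finishes its request
print(lb.assign("c2-b"))  # w2
```

Scanning all workers with min() is O(workers) per request, which is simple and adequate for a small worker pool.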
dotfiles
Contains files to set up your development environment.
Dynamic_Block_Replication
HDFS-782 (open Apache issue).
FileProcessor
Processes files in a directory.
TodoApp
An Android app for creating a to-do list.
Tweet_Search_Ranker
Tweet Search and Ranker takes a tweet corpus as input and displays the best tweets for a given query, with the highest tf-idf scores shown at the top. The other part of the project ranks the users in the corpus using the classic PageRank algorithm and displays the 50 users with the highest PageRank.
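A compact sketch of both parts. Several details are assumptions: the tf-idf variant (raw term frequency times log(N/df)), the tokenizer, the damping factor of 0.85, and how the user graph is built. Here the graph is a hand-written dict standing in for, e.g., retweet or mention links.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9#@]+", text.lower())


def rank_tweets(tweets, query, top_k=3):
    """Score each tweet by the sum of tf * idf over the query terms."""
    docs = [Counter(tokenize(t)) for t in tweets]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    scores = []
    for i, doc in enumerate(docs):
        score = sum(doc[t] * math.log(n / df[t]) for t in tokenize(query) if df[t])
        scores.append((score, i))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [tweets[i] for score, i in scores[:top_k] if score > 0]


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; edges maps each user to the users they link to."""
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    n = len(nodes)
    rank = {u: 1 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not edges.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in edges.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank


tweets = [
    "New phone launch today #tech",
    "Watching the game tonight",
    "Phone battery review: the new phone lasts two days",
    "Tech conference keynote on phone cameras",
]
print(rank_tweets(tweets, "new phone", top_k=2))

# Hypothetical user graph (who retweets or mentions whom).
links = {"alice": ["bob"], "carol": ["bob", "alice"], "dave": ["bob"], "bob": ["alice"]}
ranks = pagerank(links)
top_users = sorted(ranks, key=ranks.get, reverse=True)[:50]
print(top_users)
```

In this toy corpus the tweet that mentions "phone" twice outranks the one that mentions it once, because term frequency multiplies the idf weight.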
unstack_the_stack
IR project for a course.