Shashank Gupta's repositories
Annotated-WikiExtractor
Simple Wikipedia plain text extractor with article link annotations and Hadoop support.
arxiv-sanity-preserver
Web interface for browsing, searching, and filtering recent arXiv submissions
bat-framework
A framework to compare entity annotation systems.
clinton-email-cruncher
Download Hillary Clinton's emails and query them with SQLite
Cloud9
Cloud9 is a Hadoop toolkit for working with big data
dkpro-jwpl
DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that provides access to all information in Wikipedia.
figment
Fine-grained embedding-based entity typing
FOX
Federated Knowledge Extraction Framework
gerbil
GERBIL - General Entity annotatoR Benchmark
hillary-clinton-emails
Code to transform Hillary's emails from raw PDF documents to a SQLite database
hobbs
Flexible Unsupervised Entity Type and Resolution System
java-libpst
A library to read PST files with Java, without the need for external libraries.
pr-toolkit
Automatically exported from code.google.com/p/pr-toolkit
vector-entailment
A suite of representation learning models for sentence embedding, and some tasks to evaluate them on.
word2vec
This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
Word2VecJava
Word2Vec Java Port