Compare and decide on a word embedding implementation
Tilana opened this issue
Gensim Word2Vec: http://radimrehurek.com/gensim/models/word2vec.html
Loading data: Gensim only requires that the input provide sentences sequentially when iterated over. There is no need to keep everything in RAM: we can provide one sentence, process it, forget it, load another sentence…
https://rare-technologies.com/word2vec-tutorial/
Also data streaming in python: https://rare-technologies.com/data-streaming-in-python-generators-iterators-iterables/
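Here is a minimal sketch of such a streaming corpus, in the style of the RaRe Technologies tutorial linked above. The class name `StreamedSentences` and the corpus layout (a directory of UTF-8 text files, one sentence per line, whitespace tokenization) are assumptions for illustration. Only the current line is held in memory, and each new iteration reopens the files, so the corpus can be traversed more than once.

```python
import os
import tempfile
from typing import Iterator, List


class StreamedSentences:
    """Restartable iterable that yields one tokenized sentence at a time.

    Hypothetical layout: a directory of UTF-8 text files, one sentence per line.
    """

    def __init__(self, dirname: str) -> None:
        self.dirname = dirname

    def __iter__(self) -> Iterator[List[str]]:
        # Each call to iter() starts a fresh pass over every file.
        for fname in sorted(os.listdir(self.dirname)):
            path = os.path.join(self.dirname, fname)
            with open(path, encoding="utf-8") as handle:
                for line in handle:  # reads lazily, one line at a time
                    tokens = line.lower().split()
                    if tokens:  # skip blank lines
                        yield tokens


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "part1.txt"), "w", encoding="utf-8") as f:
            f.write("The cat sat\n\nthe dog barked\n")
        corpus = StreamedSentences(tmp)
        first_pass = list(corpus)
        second_pass = list(corpus)  # works again: an iterable, not a one-shot generator
        print(first_pass)  # [['the', 'cat', 'sat'], ['the', 'dog', 'barked']]
        print(first_pass == second_pass)  # True
```

With Gensim installed, an instance can be passed directly, e.g. `Word2Vec(sentences=StreamedSentences("corpus/"), vector_size=100, min_count=5)` in Gensim 4.x (3.x names the dimension parameter `size`). A plain generator would not be enough, because Gensim makes several passes over the corpus: one to build the vocabulary, then one per training epoch.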
TensorFlow Word2Vec: https://www.tensorflow.org/tutorials/word2vec
No data streaming possible?
Deeplearning4j Word2Vec: https://deeplearning4j.org/word2vec#just
Java implementation.
SentenceIterator/DocumentIterator: used to iterate over a dataset. A SentenceIterator returns strings, while a DocumentIterator works with input streams.
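For comparison with the Gensim approach, the two DL4J iteration styles can be mimicked in Python. This is only an analogue, not the DL4J API: the class and function names below are hypothetical, and converting documents to sentences assumes UTF-8 text with one sentence per line.

```python
import io
from typing import BinaryIO, Iterator, List


class ListSentenceIterator:
    """Sentence-style iteration: each step yields one sentence as a str."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = sentences

    def __iter__(self) -> Iterator[str]:
        for sentence in self.sentences:
            yield sentence


class BytesDocumentIterator:
    """Document-style iteration: each step yields a whole document as a
    binary input stream, which the caller must read and split itself."""

    def __init__(self, documents: List[bytes]) -> None:
        self.documents = documents

    def __iter__(self) -> Iterator[BinaryIO]:
        for raw in self.documents:
            yield io.BytesIO(raw)  # a fresh stream on every pass


def sentences_from_documents(docs: BytesDocumentIterator) -> Iterator[str]:
    """Adapt stream-based documents into sentence strings (one per line)."""
    for stream in docs:
        with io.TextIOWrapper(stream, encoding="utf-8") as text:
            for line in text:
                line = line.strip()
                if line:  # skip blank lines
                    yield line
```

The difference is where the reading happens: a sentence iterator hands over ready-made strings, while a document iterator hands over raw streams and leaves decoding and splitting to the consumer.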
Shihao Ji's WordRank: https://bitbucket.org/shihaoji/wordrank
Gensim wrapper (quickstart notebook): https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/WordRank_wrapper_quickstart.ipynb