compare and decide for word embedding implementation

Question

Tilana opened this issue 7 years ago

Based on the literature research on general word embeddings (#2), WordRank and word2vec are the interesting candidates to investigate and compare. Depending on which one is chosen, the way of storing and reading the data (#4) might differ...

Tilana · Answer 1 · Sun Jul 02 2017 20:39:36 GMT+0800 (China Standard Time)

Gensim Word2Vec: http://radimrehurek.com/gensim/models/word2vec.html

Loading data: Gensim only requires that the input provide sentences sequentially when iterated over. There is no need to keep everything in RAM: we can provide one sentence, process it, forget it, load another sentence…
https://rare-technologies.com/word2vec-tutorial/

Also data streaming in python: https://rare-technologies.com/data-streaming-in-python-generators-iterators-iterables/
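A minimal sketch of the streaming idea described above, in plain Python. The class name `SentenceStream`, the helper `write_demo_corpus`, and the one-sentence-per-line file layout are assumptions made for illustration; they are not part of Gensim. The point is that the object re-opens the file on every pass, so only one line is held in memory at a time and the corpus can be iterated more than once.

```python
import os
import tempfile


class SentenceStream:
    """Restartable iterable yielding one tokenized sentence per line of a file.

    Hypothetical helper: assumes a plain-text corpus with one sentence per line.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Re-open the file on each pass so the corpus can be iterated repeatedly.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                # Naive whitespace tokenization; real preprocessing will differ.
                tokens = line.lower().split()
                if tokens:  # skip blank lines
                    yield tokens


def write_demo_corpus():
    """Write a tiny made-up corpus to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("The first sentence\n\nA second sentence here\n")
    return path
```

An instance such as `SentenceStream("corpus.txt")` could then be passed as the `sentences` argument of `gensim.models.Word2Vec`. A bare generator would probably not be enough here, because it is exhausted after a single pass, while training typically iterates over the corpus several times.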

Tilana · Answer 2 · Sun Jul 02 2017 20:41:48 GMT+0800 (China Standard Time)

Tensorflow Word2Vec: https://www.tensorflow.org/tutorials/word2vec
Is data streaming not possible here?

Tilana · Answer 3 · Sun Jul 02 2017 20:44:54 GMT+0800 (China Standard Time)

DeepLearning4j Word2Vec: https://deeplearning4j.org/word2vec#just
Implementation for Java...
SentenceIterator/DocumentIterator: used to iterate over a dataset. A SentenceIterator returns strings, while a DocumentIterator works with input streams.

Tilana · Answer 4 · Sun Jul 02 2017 20:49:23 GMT+0800 (China Standard Time)

Shihao Ji's WordRank: https://bitbucket.org/shihaoji/wordrank
With wrapper for Gensim: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/WordRank_wrapper_quickstart.ipynb

Tilana · Answer 5 · Thu Oct 26 2017 21:52:44 GMT+0800 (China Standard Time)

http://multithreaded.stitchfix.com/blog/2017/10/18/stop-using-word2vec/

Tilana · Answer 6 · Thu Oct 26 2017 21:56:31 GMT+0800 (China Standard Time)

http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.782