Embedding

bash data/sogou_news_big/extract_file.sh

python word2vec/main.py --model SG --train_algo HS [Skip-gram with Hierarchical Softmax]
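As a rough illustration of what the Skip-gram (SG) option trains on, the sketch below builds (center, context) pairs from a token list. It is not the code in word2vec/main.py; the function name `skipgram_pairs` and the `window` parameter are hypothetical, and the hierarchical-softmax output layer is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper for illustration only; not part of this repository.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge of the context window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    for center, context in skipgram_pairs(["the", "quick", "fox"], window=1):
        print(center, "->", context)
```

Each pair becomes one training example: the model predicts the context word given the center word.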

python data/quora_fasttext/data_preprocess.py

python fasttext/main.py --gpu 1 [enable GPU use in tf.estimator]

bash data/sogou_news_big/extract_file.sh

bash doc2vec/model_run.sh

Comparison: doc2vec/doc2vec_vs_word2vec_sogou.ipynb

bash data/bookcorpus/run.sh

bash skip_thought/download_pretrain.sh

python skip_thought/main.py --clear_model 0 --gpu 1 --model [skip_thought, quick_thought] --cell_type [gru_gru, cnn_gru, cnn_lstm]

[Word2Vec] Distributed Representations of Words and Phrases and their Compositionality (Google 2013)
[Word2Vec] Efficient Estimation of Word Representations in Vector Space (Google 2013)
[Word2Vec] word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method (2014)
[Word2Vec] word2vec Parameter Learning Explained (2016)
[Fasttext] Enriching Word Vectors with Subword Information (Facebook 2017)
[Fasttext] Bag of Tricks for Efficient Text Classification (Facebook 2016)
[Glove] Global Vectors for Word Representation (2014)
[ELMo] Deep contextualized word representations (2018)

[Doc2vec] Distributed Representations of Sentences and Documents (Google 2014)
[Doc2vec] A Simple but Tough-to-Beat Baseline for Sentence Embeddings (2017)
[Encoder-Decoder: Skip-Thought] Skip-Thought Vectors (2015)
[Encoder-Decoder: Skip-Thought] Rethinking Skip-thought: A Neighborhood based Approach (2017)
[Encoder-Decoder: CNN-LSTM] Learning Generic Sentence Representations Using Convolutional Neural Networks (2017)
[Encoder-Decoder: Quick-Thought] Quick-Thought: An Efficient Framework for Learning Sentence Representations (Google 2018)
[Transformer] Attention is all you need (2017)
[FastSent|SDAE] Learning Distributed Representations of Sentences from Unlabelled Data (2016)
[Siamese] Learning Text Similarity with Siamese Recurrent Networks (2016)
[InferSent] Supervised Learning of Universal Sentence Representations from Natural Language Inference Data (2018)
[GenSen] Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning (2018)
[USE] Universal Sentence Encoder (Google 2018)
[ULMFit] Universal Language Model Fine-tuning for Text Classification (fastai 2018)
[GPT] Improving Language Understanding by Generative Pre-Training (openai 2018)
[Bert] Pre-training of Deep Bidirectional Transformers for Language Understanding (Google 2019)
[Sentence-BERT] Sentence Embeddings using Siamese BERT-Networks (2019)

[Item2Vec] Item2Vec: Neural Item Embedding for Collaborative Filtering (Microsoft 2016)
[Airbnb] Real-time Personalization using Embeddings for Search Ranking at Airbnb (Airbnb 2018)
[DeepWalk] DeepWalk: Online Learning of Social Representations (SBU 2014)
[Node2vec] Node2vec: Scalable Feature Learning for Networks (Stanford 2016)
[Alibaba] Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Alibaba 2018)

[LSH] Locality-Sensitive Hashing for Finding Nearest Neighbors (2008)
[HNSW] Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (2016)
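
To make the nearest-neighbor entries more concrete, here is a minimal sketch of one common LSH family, random hyperplanes for cosine similarity. It is an illustration under assumptions, not code from this repository and not necessarily the exact scheme in the cited paper; `make_hyperplanes`, `lsh_signature` and `build_index` are hypothetical names.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw `n_bits` random Gaussian hyperplane normals of dimension `dim`."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes
    )


def build_index(vectors, planes):
    """Group vector keys by signature; similar vectors tend to share a bucket."""
    buckets = {}
    for key, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, planes), []).append(key)
    return buckets


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=8)
    embeddings = {"a": [1.0, -2.0, 0.5], "b": [2.0, -4.0, 1.0], "c": [-1.0, 2.0, 3.0]}
    index = build_index(embeddings, planes)
    # Candidates for a query are the keys stored in the query's own bucket.
    print(index.get(lsh_signature([1.0, -2.0, 0.5], planes), []))
```

Because each bit depends only on the sign of a dot product, vectors pointing in the same direction always share a signature, and vectors at a small angle collide with high probability. Candidates from the matching bucket would then be re-ranked by exact cosine similarity.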
