
awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models

About This Repo

Well, there are some awesome lists for word embeddings and sentence embeddings, but all of them are outdated and, more importantly, incomplete.
This repo will also be incomplete, but I'll try my best to find and include all the papers with pretrained models.
This is not a typical awesome list because it has tables, but I guess it's OK and much better than just a huge list.
If you find any mistakes, or find another paper or anything else, please send a pull request and help me keep this list up to date.
Enjoy!

General Framework

Almost all sentence embeddings work like this:
Given some sort of word embeddings and an optional encoder (for example, an LSTM), they obtain contextualized word embeddings.
Then they define some sort of pooling (it can be as simple as last pooling).
Based on that, they either use the pooled vector directly for a supervised classification task (like InferSent) or use it to generate a target sequence (like Skip-Thought).
So, in general, there are many sentence embeddings that you have never heard of: you can simply do mean pooling over any word embedding, and it's a sentence embedding!
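As a minimal sketch of that last point, the snippet below builds a sentence vector by pooling plain word vectors. It assumes the embedding table is a Python dict mapping each token to a list of floats. The toy table and the function names (`mean_pool`, `max_pool`, `last_pool`, `sentence_embedding`) are hypothetical. The optional encoder is left out, so "last pooling" here just returns the last word's vector, and OOV tokens are dropped.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    # Dimension-wise average of the word vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    # Dimension-wise maximum of the word vectors.
    dim = len(vectors[0])
    return [max(v[i] for v in vectors) for i in range(dim)]


def last_pool(vectors: Sequence[Vector]) -> Vector:
    # Keep only the final vector (with an encoder, this would be its last output).
    return list(vectors[-1])


def sentence_embedding(
    tokens: Sequence[str],
    table: Dict[str, Vector],
    pool: Callable[[Sequence[Vector]], Vector] = mean_pool,
) -> Vector:
    # Look up each token; this sketch simply drops out-of-vocabulary tokens.
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        raise ValueError("no in-vocabulary tokens to pool")
    return pool(vectors)


# Hypothetical 2-d toy table standing in for real pretrained vectors.
toy_table = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0]}
print(sentence_embedding(["the", "cat", "sat"], toy_table))           # [0.6666666666666666, 0.6666666666666666]
print(sentence_embedding(["the", "cat"], toy_table, pool=last_pool))  # [0.0, 1.0]
```

Swapping `pool` is all it takes to move between these baselines. Adding an encoder between the lookup and the pooling step gives the general framework described above.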

Word Embeddings

Note: don't worry about the language of the code. You can almost always (except for the subword models) just use the pretrained embedding table in the framework of your choice and ignore the training code.
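For example, many of the released tables below are plain text files with one `word v1 v2 ... vd` entry per line. GloVe-style files have no header. word2vec/fastText-style `.vec` files start with a `<vocab_size> <dim>` header line. The sketch below assumes that text layout, so binary files such as the original word2vec release need a different reader (e.g. gensim's `KeyedVectors.load_word2vec_format(..., binary=True)`). The helper names `load_text_embeddings` and `to_matrix` are hypothetical.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple


def load_text_embeddings(f: TextIO) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ... vd' lines into a dict, skipping an optional header."""
    table: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for line_no, line in enumerate(f):
        parts = line.rstrip().split(" ")
        # Skip the optional "<vocab_size> <dim>" header on the first line.
        if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # blank or malformed line
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"line {line_no + 1}: expected {dim} values, got {len(values)}")
        table[word] = values
    return table


def to_matrix(table: Dict[str, List[float]]) -> Tuple[Dict[str, int], List[List[float]]]:
    # Row i of the matrix holds the vector of the word mapped to index i,
    # the layout that framework embedding layers typically expect.
    vocab = {word: i for i, word in enumerate(table)}
    matrix = [table[word] for word in vocab]
    return vocab, matrix


sample = io.StringIO("3 2\nthe 1.0 0.0\ncat 0.0 1.0\nsat 1.0 1.0\n")
vocab, matrix = to_matrix(load_text_embeddings(sample))
print(vocab)   # {'the': 0, 'cat': 1, 'sat': 2}
print(matrix)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

From here, the matrix can be converted to a tensor and used to initialize an embedding layer (in PyTorch, for instance, via `torch.nn.Embedding.from_pretrained`), with `vocab` used to map tokens to row indices.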

date	paper	citation count	training code	pretrained models
-	WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models	N/A	-	RusVectōrēs
2013/01	Efficient Estimation of Word Representations in Vector Space	999+	C	Word2Vec
2014/12	Word Representations via Gaussian Embedding	221	Cython	-
2014/??	A Probabilistic Model for Learning Multi-Prototype Word Embeddings	127	DMTK	-
2014/??	Dependency-Based Word Embeddings	719	C++	word2vecf
2014/??	GloVe: Global Vectors for Word Representation	999+	C	GloVe
2015/06	Sparse Overcomplete Word Vector Representations	129	C++	-
2015/06	From Paraphrase Database to Compositional Paraphrase Model and Back	3	Theano	PARAGRAM
2015/06	Non-distributional Word Vector Representations	68	Python	WordFeat
2015/??	Joint Learning of Character and Word Embeddings	195	C	-
2015/??	SensEmbed: Learning Sense Embeddings for Word and Relational Similarity	249	-	SensEmbed
2015/??	Topical Word Embeddings	292	Cython
2016/02	Swivel: Improving Embeddings by Noticing What's Missing	61	TF	-
2016/03	Counter-fitting Word Vectors to Linguistic Constraints	232	Python	counter-fitting(broken)
2016/05	Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec	91	Chainer	-
2016/06	Siamese CBOW: Optimizing Word Embeddings for Sentence Representations	166	Theano	Siamese CBOW
2016/06	Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations	58	Go	lexvec
2016/07	Enriching Word Vectors with Subword Information	999+	C++	fastText
2016/08	Morphological Priors for Probabilistic Neural Word Embeddings	34	Theano	-
2016/11	A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks	359	C++	charNgram2vec
2016/12	ConceptNet 5.5: An Open Multilingual Graph of General Knowledge	604	Python	Numberbatch
2016/??	Learning Word Meta-Embeddings	58	-	Meta-Emb(broken)
2017/02	Offline bilingual word vectors, orthogonal transformations and the inverted softmax	336	Python	-
2017/04	Multimodal Word Distributions	57	TF	word2gm
2017/05	Poincaré Embeddings for Learning Hierarchical Representations	413	Pytorch	-
2017/06	Context encoders as a simple but powerful extension of word2vec	13	Python	-
2017/06	Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints	99	TF	Attract-Repel
2017/08	Learning Chinese Word Representations From Glyphs Of Characters	44	C	-
2017/08	Making Sense of Word Embeddings	92	Python	sensegram
2017/09	Hash Embeddings for Efficient Word Representations	25	Keras	-
2017/10	BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages	91	Gensim	BPEmb
2017/11	SPINE: SParse Interpretable Neural Embeddings	48	Pytorch	SPINE
2017/??	AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP	161	Gensim	AraVec
2017/??	Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics	25	C	-
2017/??	Dict2vec : Learning Word Embeddings using Lexical Dictionaries	49	C++	Dict2vec
2017/??	Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components	63	C	-
2018/04	Representation Tradeoffs for Hyperbolic Embeddings	120	Pytorch	h-MDS
2018/04	Dynamic Meta-Embeddings for Improved Sentence Representations	60	Pytorch	DME/CDME
2018/05	Analogical Reasoning on Chinese Morphological and Semantic Relations	128	-	ChineseWordVectors
2018/06	Probabilistic FastText for Multi-Sense Word Embeddings	39	C++	Probabilistic FastText
2018/09	Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks	3	TF	SynGCN
2018/09	FRAGE: Frequency-Agnostic Word Representation	64	Pytorch	-
2018/12	Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia	17	Cython	Wikipedia2Vec
2018/??	Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings	106	-	ChineseEmbedding
2018/??	cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information	45	C++	-
2019/02	VCWE: Visual Character-Enhanced Word Embeddings	5	Pytorch	VCWE
2019/05	Learning Cross-lingual Embeddings from Twitter via Distant Supervision	2	Text	-
2019/08	An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning	5	TF	-
2019/08	ViCo: Word Embeddings from Visual Co-occurrences	7	Pytorch	ViCo
2019/11	Spherical Text Embedding	25	C	-
2019/??	Unsupervised word embeddings capture latent knowledge from materials science literature	150	Gensim	-

OOV Handling

Drop OOV words!
One OOV vector (unk vector)
Use subword models (n-gram, BPE, char)
ALaCarte: A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Mimick: Mimicking Word Embeddings using Subword RNNs
CompactReconstruction: Subword-based Compact Reconstruction of Word Embeddings
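The simpler strategies above can be chained: use the word's own vector if it exists, otherwise back off to subword pieces, and finally to a single unk vector. The sketch below assumes a plain dict of character n-gram vectors and uses fastText-style `<`/`>` boundary markers with n from 3 to 6. It is only an illustration of the idea, not fastText's exact lookup: fastText hashes n-grams into a fixed number of buckets and also mixes n-gram vectors into in-vocabulary words. The names `char_ngrams`, `embed_word`, and the toy tables are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    # Pad with boundary markers so prefixes and suffixes get their own n-grams.
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def embed_word(
    word: str,
    table: Dict[str, Vector],
    ngram_table: Dict[str, Vector],
    unk: Vector,
) -> Vector:
    # 1. Known word: use its pretrained vector directly.
    if word in table:
        return table[word]
    # 2. OOV word: average the vectors of whichever of its n-grams are known.
    hits = [ngram_table[g] for g in char_ngrams(word) if g in ngram_table]
    if hits:
        dim = len(hits[0])
        return [sum(v[i] for v in hits) / len(hits) for i in range(dim)]
    # 3. Nothing matched: fall back to the single shared unk vector.
    return unk


# Hypothetical toy tables; real subword models learn these n-gram vectors.
words = {"cat": [1.0, 0.0]}
ngrams = {"<ca": [0.0, 2.0], "ats": [2.0, 0.0]}
unk_vec = [0.0, 0.0]
print(embed_word("cats", words, ngrams, unk_vec))  # [1.0, 1.0]
print(embed_word("xyz", words, ngrams, unk_vec))   # [0.0, 0.0]
```

Dropping step 2 gives the "one unk vector" strategy. Skipping the word entirely when `embed_word` would return `unk` gives "drop OOV words".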

Contextualized Word Embeddings

Note: all the unofficial models can load the official pretrained models

date	paper	citation count	code	pretrained models
-	Language Models are Unsupervised Multitask Learners	N/A	TF Pytorch, TF2.0 Keras	GPT-2(117M, 124M, 345M, 355M, 774M, 1558M)
2017/08	Learned in Translation: Contextualized Word Vectors	524	Pytorch Keras	CoVe
2018/01	Universal Language Model Fine-tuning for Text Classification	167	Pytorch	ULMFit(English, Zoo)
2018/02	Deep contextualized word representations	999+	Pytorch TF	ELMo(AllenNLP, TF-Hub)
2018/04	Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling	26	Pytorch	LD-Net
2018/07	Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation	120	Pytorch	ELMo
2018/08	Direct Output Connection for a High-Rank Language Model	24	Pytorch	DOC
2018/10	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	999+	TF Keras Pytorch, TF2.0 MXNet PaddlePaddle TF Keras	BERT(BERT, ERNIE, KoBERT)
2018/??	Contextual String Embeddings for Sequence Labeling	486	Pytorch	Flair
2018/??	Improving Language Understanding by Generative Pre-Training	999+	TF Keras Pytorch, TF2.0	GPT
2019/01	Multi-Task Deep Neural Networks for Natural Language Understanding	364	Pytorch	MT-DNN
2019/01	BioBERT: pre-trained biomedical language representation model for biomedical text mining	634	TF	BioBERT
2019/01	Cross-lingual Language Model Pretraining	639	Pytorch Pytorch, TF2.0	XLM
2019/01	Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context	754	TF Pytorch Pytorch, TF2.0	Transformer-XL
2019/02	Efficient Contextual Representation Learning Without Softmax Layer	2	Pytorch	-
2019/03	SciBERT: Pretrained Contextualized Embeddings for Scientific Text	124	Pytorch, TF	SciBERT
2019/04	Publicly Available Clinical BERT Embeddings	229	Text	clinicalBERT
2019/04	ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission	84	Pytorch	ClinicalBERT
2019/05	ERNIE: Enhanced Language Representation with Informative Entities	210	Pytorch	ERNIE
2019/05	Unified Language Model Pre-training for Natural Language Understanding and Generation	278	Pytorch	UniLMv1(unilm1-large-cased, unilm1-base-cased)
2019/05	HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization	81		-
2019/06	Pre-Training with Whole Word Masking for Chinese BERT	98	Pytorch, TF	BERT-wwm
2019/06	XLNet: Generalized Autoregressive Pretraining for Language Understanding	999+	TF Pytorch, TF2.0	XLNet
2019/07	ERNIE 2.0: A Continual Pre-training Framework for Language Understanding	107	PaddlePaddle	ERNIE 2.0
2019/07	SpanBERT: Improving Pre-training by Representing and Predicting Spans	282	Pytorch	SpanBERT
2019/07	RoBERTa: A Robustly Optimized BERT Pretraining Approach	999+	Pytorch Pytorch, TF2.0	RoBERTa
2019/09	Subword ELMo	1	Pytorch	-
2019/09	Knowledge Enhanced Contextual Word Representations	115		-
2019/09	TinyBERT: Distilling BERT for Natural Language Understanding	129		-
2019/09	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism	136	Pytorch	Megatron-LM(BERT-345M, GPT-2-345M)
2019/09	MultiFiT: Efficient Multi-lingual Language Model Fine-tuning	29	Pytorch	-
2019/09	Extreme Language Model Compression with Optimal Subwords and Shared Projections	32		-
2019/09	MULE: Multimodal Universal Language Embedding	5		-
2019/09	Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks	51		-
2019/09	K-BERT: Enabling Language Representation with Knowledge Graph	59		-
2019/09	UNITER: Learning UNiversal Image-TExt Representations	60		-
2019/09	ALBERT: A Lite BERT for Self-supervised Learning of Language Representations	803	TF	-
2019/10	BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension	349	Pytorch	BART(bart.base, bart.large, bart.large.mnli, bart.large.cnn, bart.large.xsum)
2019/10	DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	481	Pytorch, TF2.0	DistilBERT
2019/10	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	696	TF	T5
2019/11	CamemBERT: a Tasty French Language Model	102	-	CamemBERT
2019/11	ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations	15	Pytorch	-
2019/11	Unsupervised Cross-lingual Representation Learning at Scale	319	Pytorch	XLM-R (XLM-RoBERTa)(xlmr.large, xlmr.base)
2020/01	ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training	35	Pytorch	ProphetNet(ProphetNet-large-16GB, ProphetNet-large-160GB)
2020/02	CodeBERT: A Pre-Trained Model for Programming and Natural Languages	25	Pytorch	CodeBERT
2020/02	UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training	33	Pytorch	-
2020/03	ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators	203	TF	ELECTRA(ELECTRA-Small, ELECTRA-Base, ELECTRA-Large)
2020/04	MPNet: Masked and Permuted Pre-training for Language Understanding	5	Pytorch	MPNet
2020/05	ParsBERT: Transformer-based Model for Persian Language Understanding	1	Pytorch	ParsBERT
2020/05	Language Models are Few-Shot Learners	382	-	-
2020/07	InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training	12	Pytorch	-

Pooling Methods

Encoders

date	paper	citation count	code	model name
-	Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings	N/A	Python	AraSIF
2014/05	Distributed Representations of Sentences and Documents	999+	Pytorch Python	Doc2Vec
2014/11	Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models	849	Theano Pytorch	VSE
2015/06	Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books	795	Theano TF Pytorch, Torch	SkipThought
2015/11	Order-Embeddings of Images and Language	354	Theano	order-embedding
2015/11	Towards Universal Paraphrastic Sentence Embeddings	411	Theano	ParagramPhrase
2015/??	From Word Embeddings to Document Distances	999+	C, Python	Word Mover's Distance
2016/02	Learning Distributed Representations of Sentences from Unlabelled Data	363	Python	FastSent
2016/07	Charagram: Embedding Words and Sentences via Character n-grams	144	Theano	Charagram
2016/11	Learning Generic Sentence Representations Using Convolutional Neural Networks	76	Theano	ConvSent
2017/03	Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features	319	C++	Sent2Vec
2017/04	Learning to Generate Reviews and Discovering Sentiment	293	TF Pytorch Pytorch	Sentiment Neuron
2017/05	Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings	60	Theano	GRAN
2017/05	Supervised Learning of Universal Sentence Representations from Natural Language Inference Data	999+	Pytorch	InferSent
2017/07	VSE++: Improving Visual-Semantic Embeddings with Hard Negatives	132	Pytorch	VSE++
2017/08	Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm	357	Keras Pytorch	DeepMoji
2017/09	StarSpace: Embed All The Things!	129	C++	StarSpace
2017/10	DisSent: Learning Sentence Representations from Explicit Discourse Relations	47	Pytorch	DisSent
2017/11	Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations	128	Theano	para-nmt
2017/11	Dual-Path Convolutional Image-Text Embedding with Instance Loss	44	Matlab	Image-Text-Embedding
2018/03	An efficient framework for learning sentence representations	183	TF	Quick-Thought
2018/03	Universal Sentence Encoder	564	TF-Hub	USE
2018/04	End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions	14	Theano	DEISTE
2018/04	Learning general purpose distributed sentence representations via large scale multi-task learning	198	Pytorch	GenSen
2018/06	Embedding Text in Hyperbolic Spaces	50	TF	HyperText
2018/07	Representation Learning with Contrastive Predictive Coding	736	Keras	CPC
2018/08	Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations	8	Python	CMD
2018/09	Learning Universal Sentence Representations with Mean-Max Attention Autoencoder	14	TF	Mean-MaxAAE
2018/10	Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model	35	TF-Hub	USE-xling
2018/10	Improving Sentence Representations with Consensus Maximisation	4	-	Multi-view
2018/10	BioSentVec: creating sentence embeddings for biomedical texts	70	Python	BioSentVec
2018/11	Word Mover's Embedding: From Word2Vec to Document Embedding	47	C, Python	WordMoversEmbeddings
2018/11	A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks	76	Pytorch	HMTL
2018/12	Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond	238	Pytorch	LASER
2018/??	Convolutional Neural Network for Universal Sentence Embeddings	6	Theano	CSE
2019/01	No Training Required: Exploring Random Encoders for Sentence Classification	54	Pytorch	randsent
2019/02	CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model	4	Pytorch	CMOW
2019/07	GLOSS: Generative Latent Optimization of Sentence Representations	1	-	GLOSS
2019/07	Multilingual Universal Sentence Encoder	52	TF-Hub	MultilingualUSE
2019/08	Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks	261	Pytorch	Sentence-BERT
2020/02	SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models	11	Pytorch	SBERT-WK
2020/06	DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations	4	Pytorch	DeCLUTR
2020/07	Language-agnostic BERT Sentence Embedding	5	TF-Hub	LaBSE
2020/11	On the Sentence Embeddings from Pre-trained Language Models	0	TF	BERT-flow