Must-Read NLP Papers (up to 2020)

This repository collects important NLP papers and well-explained materials that everyone working in the field should know about and read.

I have also implemented several state-of-the-art NLP models. You can find them in my repositories:

  • Neural Network Language Model (NNLM)
  • Attention Is All You Need (Transformer)

Highlights of this repo:

  • NLP: Pretrained Language Models, Machine Translation, Text Summarization
  • CV: Image-to-image Translation
  • Learning Algorithm: Meta Learning

Index

Overview

  • Yongjun Hong, et al. How Generative Adversarial Networks and Their Variants Work: An Overview. ACM 2019. [ACM]

  • Samuel L. Smith, et al. Don't Decay the Learning Rate, Increase the Batch Size. ICLR 2018. [ICLR]

Clustering & Word Embeddings

  • Peter F Brown, et al. Class-Based n-gram Models of Natural Language. 1992. [ACL Anthology]

  • Tomas Mikolov, et al. Efficient Estimation of Word Representations in Vector Space. 2013. [ArXiv]

  • Tomas Mikolov, et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013. [ArXiv]

  • Quoc V. Le and Tomas Mikolov. Distributed Representations of Sentences and Documents. 2014. [ArXiv]

  • Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation. 2014. [ACL Anthology]

  • Piotr Bojanowski, et al. Enriching Word Vectors with Subword Information. 2017. [ACL Anthology]

Cross-lingual Learning

  • Junjie Hu, et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization. 2020. [ArXiv]

Evaluation Metric

  • Kishore Papineni, et al. BLEU: a Method for Automatic Evaluation of Machine Translation. ACL 2002. [CiteSeer]

  • Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. ACL 2004. [ACL Anthology]

Event Recognition

  • Amosse Edouard. Event Detection and Analysis On Short Text Messages. 2018. [ResearchGate]

  • Deepayan Chakrabarti and Kunal Punera. Event Summarization Using Tweets. ICWSM 2011. [ResearchGate]

  • Maria Vargas-Vera and David Celjuska. Event Recognition on News Stories and Semi-Automatic Population of an Ontology. Web Intelligence 2004. [ResearchGate]

Gated Recurrent Unit

  • Junyoung Chung, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. CoRR 2014. [ArXiv]

Image Captioning

  • Steven J. Rennie, et al. Self-critical Sequence Training for Image Captioning. CVPR 2017. [ArXiv]

Image Recognition

  • Alexey Dosovitskiy, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021. [ICLR]

Image-to-Image Translation

  • Jun-Yan Zhu, et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017. [ArXiv]

  • Yunjey Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018. [ArXiv]

  • Taesung Park, et al. Contrastive Learning for Unpaired Image-to-Image Translation. ECCV 2020. [ArXiv]

Language Modeling

  • Yoshua Bengio, et al. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003. [ACM DL]

  • Rafal Jozefowicz, et al. Exploring the Limits of Language Modeling. 2016. [ArXiv]

  • Matthew Peters, et al. Semi-supervised sequence tagging with bidirectional language models. ACL 2017. [ArXiv]

  • Matthew Peters, et al. Deep contextualized word representations. NAACL 2018. [ArXiv]

  • Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. [ArXiv]

  • Jeremy Howard and Sebastian Ruder. Universal Language Model Fine-tuning for Text Classification. ACL 2018. [ArXiv]

  • Alec Radford, et al. Improving Language Understanding by Generative Pre-Training. 2018. [OpenAI]

  • Alec Radford, et al. Language Models are Unsupervised Multitask Learners. 2019. [OpenAI]

  • Zhenzhong Lan, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. ICLR 2020. [OpenReview]

  • Zihang Dai, et al. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. ACL 2019. [ArXiv]

  • Zhilin Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NIPS 2019. [ArXiv]

  • Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. 2019. [ArXiv]

  • Nikita Kitaev, et al. Reformer: The Efficient Transformer. ICLR 2020. [ArXiv]

  • Kevin Clark, et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ICLR 2020. [ArXiv]

  • Tom B. Brown, et al. Language Models are Few-Shot Learners. 2020. [ArXiv]

  • Louis Martin, et al. CamemBERT: a Tasty French Language Model. ACL 2020. [ArXiv]

Machine Translation

  • Dzmitry Bahdanau, et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015. [ArXiv]

  • Minh-Thang Luong, et al. Effective Approaches to Attention-based Neural Machine Translation. EMNLP 2015. [ArXiv]

  • Denny Britz, et al. Massive Exploration of Neural Machine Translation Architectures. ACL 2017. [ArXiv]

  • Yun Chen, et al. A Teacher-Student Framework for Zero-Resource Neural Machine Translation. ACL 2017. [ArXiv]

  • Ashish Vaswani, et al. Attention Is All You Need. 2017. [ArXiv]

  • Guillaume Lample and Alexis Conneau. Cross-lingual Language Model Pretraining. 2019. [ArXiv]

  • Alexis Conneau et al. Unsupervised Cross-lingual Representation Learning at Scale. ACL 2020. [ArXiv]

  • Christos Baziotis et al. Language Model Prior for Low-Resource Neural Machine Translation. EMNLP 2020. [ArXiv]

Meta Learning

  • Chelsea Finn, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017. [ArXiv]

  • Sachin Ravi and Hugo Larochelle. Optimization as a Model for Few-Shot Learning. ICLR 2017. [OpenReview]

  • Andrei A. Rusu, et al. Meta-Learning with Latent Embedding Optimization. ICLR 2019. [ArXiv]

  • Aravind Rajeswaran, et al. Meta-Learning with Implicit Gradients. NIPS 2019. [ArXiv]

Multi-Task Learning

  • Victor Sanh, et al. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. AAAI 2019. [ArXiv]

Named Entity Recognition

  • Guillaume Lample, et al. Neural Architectures for Named Entity Recognition. ACL 2016. [ArXiv]

  • Xuezhe Ma and Eduard Hovy. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. ACL 2016. [ArXiv]

  • Matthew Peters, et al. Semi-Supervised Sequence Tagging With Bidirectional Language Models. ACL 2017. [ArXiv]

  • Kevin Clark, et al. Semi-Supervised Sequence Modeling with Cross-View Training. EMNLP 2018. [ArXiv]

  • Matthew Peters, et al. Deep Contextualized Word Representations. NAACL 2018. [ArXiv]

  • Abbas Ghaddar and Philippe Langlais. Robust Lexical Features for Improved Neural Network Named-Entity Recognition. COLING 2018. [ACL Anthology]

  • Alan Akbik, et al. Contextual String Embeddings for Sequence Labeling. COLING 2018. [ResearchGate]

  • Alexei Baevski, et al. Cloze-driven Pretraining of Self-attention Networks. 2019. [ArXiv]

Probabilistic Graphical Models

  • John Lafferty, Andrew McCallum, and Fernando C.N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML 2001. [ACM DL]

Reinforcement Learning

  • Kristopher De Asis, et al. Multi-Step Reinforcement Learning: A Unifying Algorithm. AAAI 2018. [ArXiv]

Sentence Compression

  • Thibault Fevry and Jason Phang. Unsupervised Sentence Compression using Denoising Auto-Encoders. CoNLL 2018. [ACL Anthology]

Sequence Models

  • Ilya Sutskever, et al. Sequence to Sequence Learning with Neural Networks. 2014. [ArXiv]

Text Classification

  • Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification. EMNLP 2014. [ArXiv]

  • Xiang Zhang, et al. Character-Level Convolutional Networks For Text Classification. NIPS 2015. [ArXiv]

  • Yoon Kim, et al. Character-Aware Neural Language Models. AAAI 2016. [ArXiv]

  • Zichao Yang, et al. Hierarchical Attention Networks for Document Classification. NAACL 2016. [ACL Anthology]

  • Alon Jacovi, et al. Understanding Convolutional Neural Networks for Text Classification. EMNLP 2018. [ACL Anthology]

Text Generation

  • Lantao Yu, et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017. [ArXiv]

  • William Fedus, et al. MaskGAN: Better Text Generation via Filling in the______. ICLR 2018. [ArXiv]

  • Weili Nie, et al. RelGAN: Relational Generative Adversarial Networks for Text Generation. ICLR 2019. [ICLR]

  • Kaitao Song, et al. MASS: Masked Sequence to Sequence Pre-Training for Language Generation. ICML 2019. [ArXiv]

  • Mike Lewis, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. ACL 2020. [ArXiv]

Text Style Transfer

  • Zichao Yang, et al. Unsupervised Text Style Transfer using Language Models as Discriminators. NIPS 2018. [ArXiv]

  • Sandeep Subramanian, et al. Multiple-Attribute Text Style Transfer. ICLR 2019. [ArXiv]

Text Summarization

  • Romain Paulus, et al. A Deep Reinforced Model for Abstractive Summarization. ICLR 2018. [ArXiv]

  • Angela Fan, et al. Controllable Abstractive Summarization. ACL 2018. [ArXiv]

  • Yaushian Wang and Hung-Yi Lee. Learning to Encode Text as Human-Readable Summaries using Generative Adversarial Networks. EMNLP 2018. [ACL Anthology]

  • Peter J. Liu, et al. SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders. 2019. [ArXiv]

  • Christos Baziotis, et al. SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression. NAACL 2019. [ArXiv]

  • Jingqing Zhang, et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. ICML 2020. [ArXiv]
