Ailln / nlp-roadmap

Natural Language Processing Roadmap

🗺️ A learning roadmap for Natural Language Processing

⚠️ Note:

  1. This project includes a small experiment called PCB. Here PCB does not stand for Printed Circuit Board, nor for Process Control Block; it is short for Paper, Code, Blog. I believe that papers, code, and blogs together let us balance theory and practice while quickly mastering the key ideas!

  2. The number of stars after each paper indicates its importance (a subjective opinion, for reference only).

    1. 🌟: ordinary;
    2. 🌟🌟: important;
    3. 🌟🌟🌟: very important.

1 Word Segmentation

A word is the smallest linguistic unit that can stand on its own. In natural language processing, the word is usually the basic unit of processing. English has a natural advantage here, since words are delimited by spaces. Chinese has no explicit boundary markers between words, so the first task before any Chinese language processing is to split a continuous sentence into a sequence of words. This splitting process is called word segmentation. Learn more
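
The classical dictionary-based approach covered in the surveys below can be sketched as forward maximum matching: at each position, greedily take the longest dictionary word. The tiny dictionary here is made up for illustration; real systems use large lexicons or the neural models discussed in these papers.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Segment text by forward maximum matching against a word dictionary."""
    words = []
    i = 0
    while i < len(text):
        # try the longest candidate first, shrinking down to one character
        for j in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + j]
            # single characters are kept as a fallback for unknown words
            if j == 1 or candidate in dictionary:
                words.append(candidate)
                i += j
                break
    return words

vocab = {"自然语言", "处理", "学习", "路线图"}
print(fmm_segment("自然语言处理学习路线图", vocab))
# → ['自然语言', '处理', '学习', '路线图']
```

Forward maximum matching fails on genuinely ambiguous strings, which is exactly the difficulty the statistical and neural methods below address.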

Survey

  • 汉语分词技术综述 {Paper} 🌟
  • 国内中文自动分词技术研究综述 {Paper} 🌟
  • 汉语自动分词的研究现状与困难 {Paper} 🌟🌟
  • 汉语自动分词研究评述 {Paper} 🌟🌟
  • 中文分词十年又回顾: 2007-2017 {Paper} 🌟🌟🌟
  • chinese-word-segmentation {Code}
  • 深度学习中文分词调研 {Blog}

2 Word Embedding

Word embedding means finding a mapping or function that produces a representation of each word in a new vector space; this representation is called a "word representation". Learn more
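
The "new space" idea can be illustrated with a toy lookup table plus cosine similarity: geometric closeness stands in for semantic similarity. The vectors below are invented for demonstration; real embeddings are learned by models such as word2vec or GloVe from the lists below.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-made for illustration only.
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
```

In a trained model the table is a learned matrix, but lookup plus vector arithmetic is still the interface.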

Survey

  • Word Embeddings: A Survey {Paper} 🌟🌟🌟
  • Visualizing Attention in Transformer-Based Language Representation Models {Paper} 🌟🌟
  • PTMs: Pre-trained Models for Natural Language Processing: A Survey {Paper} {Blog} 🌟🌟🌟
  • Efficient Transformers: A Survey {Paper} 🌟🌟
  • A Survey of Transformers {Paper} 🌟🌟
  • Pre-Trained Models: Past, Present and Future {Paper} 🌟🌟
  • Pretrained Language Models for Text Generation: A Survey {Paper} 🌟
  • A Practical Survey on Faster and Lighter Transformers {Paper} 🌟
  • The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures {Paper} 🌟🌟

Core

  • NNLM: A Neural Probabilistic Language Model {Paper} {Code} {Blog} 🌟
  • W2V: Efficient Estimation of Word Representations in Vector Space {Paper} 🌟🌟
  • Glove: Global Vectors for Word Representation {Paper} 🌟🌟
  • CharCNN: Character-level Convolutional Networks for Text Classification {Paper} {Blog} 🌟
  • ULMFiT: Universal Language Model Fine-tuning for Text Classification {Paper} 🌟
  • SiATL: An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models {Paper} 🌟
  • FastText: Bag of Tricks for Efficient Text Classification {Paper} 🌟🌟
  • CoVe: Learned in Translation: Contextualized Word Vectors {Paper} 🌟
  • ELMo: Deep contextualized word representations {Paper} 🌟🌟
  • Transformer: Attention is All you Need {Paper} {Code} {Blog} 🌟🌟🌟
  • GPT: Improving Language Understanding by Generative Pre-Training {Paper} 🌟
  • GPT2: Language Models are Unsupervised Multitask Learners {Paper} {Code} {Blog} 🌟🌟
  • GPT3: Language Models are Few-Shot Learners {Paper} {Code} 🌟🌟🌟
  • GPT4: GPT-4 Technical Report {Paper} 🌟🌟🌟
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding {Paper} {Code} {Blog} 🌟🌟🌟
  • UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation {Paper} {Code} {Blog} 🌟🌟
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer {Paper} {Code} {Blog} 🌟
  • ERNIE(Baidu): Enhanced Representation through Knowledge Integration {Paper} {Code} 🌟
  • ERNIE(Tsinghua): Enhanced Language Representation with Informative Entities {Paper} {Code} 🌟
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach {Paper} 🌟
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations {Paper} {Code} 🌟🌟
  • TinyBERT: Distilling BERT for Natural Language Understanding {Paper} 🌟🌟
  • FastFormers: Highly Efficient Transformer Models for Natural Language Understanding {Paper} {Code} 🌟🌟
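
As a companion to the Transformer entry above, here is a pure-Python sketch of its central operation, scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Matrices are lists of row vectors with toy shapes for readability.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over tiny list-of-lists matrices."""
    d_k = len(K[0])
    out = []
    for q in Q:  # one output row per query
        # similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)  # attention distribution over keys
        # weighted average of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the output leans toward the first value row
```

The real models add learned projections, multiple heads, and masking, but this weighted-average core is unchanged from GPT to BERT.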

Others

  • word2vec Parameter Learning Explained {Paper} 🌟🌟
  • Semi-supervised Sequence Learning {Paper} 🌟🌟
  • BERT Rediscovers the Classical NLP Pipeline {Paper} 🌟
  • Pre-trained Language Model Papers {Blog}
  • HuggingFace Transformers {Code}
  • Fudan FastNLP {Code}

3 Text Classification

Survey

  • A Survey on Text Classification: From Shallow to Deep Learning {Paper} 🌟🌟🌟
  • Deep Learning Based Text Classification: A Comprehensive Review {Paper} 🌟🌟

CNN

  • TextCNN: Convolutional Neural Networks for Sentence Classification {Paper} {Code} 🌟🌟🌟
  • Convolutional Neural Networks for Text Categorization: Shallow Word-level vs. Deep Character-level {Paper} 🌟
  • DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization {Paper} {Code} 🌟🌟
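
The TextCNN recipe can be sketched in miniature: slide a filter of width n over the sequence of word vectors, then max-pool over time to get one feature per filter. The word vectors and filter weights below are toy values; trained models learn many such filters and feed the pooled features to a classifier.

```python
def conv1d_max_pool(word_vectors, filt):
    """Apply one convolutional filter (len(filt) words wide), then max-pool."""
    n = len(filt)  # filter width in words
    feats = []
    for i in range(len(word_vectors) - n + 1):
        window = word_vectors[i:i + n]
        # dot product between the flattened window and the filter weights
        score = sum(w * f
                    for row_w, row_f in zip(window, filt)
                    for w, f in zip(row_w, row_f))
        feats.append(score)
    return max(feats)  # max-over-time pooling

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 words, dimension 2
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]          # one width-2 filter
print(conv1d_max_pool(sentence, bigram_filter))
# → 2.0
```

Max-over-time pooling is what makes the feature position-invariant: the filter fires wherever its n-gram pattern appears in the sentence.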

4 Sequence Labeling

Survey

  • The Evolution of Sequence Labeling (DNNs + CRF) {Blog}

Bi-LSTM + CRF

  • End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF {Paper} 🌟🌟

  • pytorch_NER_BiLSTM_CNN_CRF {Code}

  • NN_NER_tensorFlow {Code}

  • End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial {Code}

  • Bi-directional LSTM-CNNs-CRF {Code}
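
In the BiLSTM-CRF models above, the CRF layer is decoded with the Viterbi algorithm: given per-position emission scores (from the BiLSTM) and a tag-transition matrix, find the highest-scoring tag path. A minimal sketch with toy scores:

```python
def viterbi(emissions, transitions):
    """emissions[t][i]: score of tag i at step t; transitions[i][j]: i -> j."""
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each tag
    back = []                   # backpointers, one list per later step
    for em in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # best previous tag to transition into tag j
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + em[j])
            ptrs.append(best_i)
        score = new_score
        back.append(ptrs)
    # follow backpointers from the best final tag
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# 2 tags, 3 steps: emissions favour tag 0, then tag 1 twice
emissions = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
transitions = [[0.0, 0.0], [0.0, 0.0]]
print(viterbi(emissions, transitions))
# → [0, 1, 1]
```

With nonzero transition scores the decoder can overrule locally best tags, which is exactly what lets the CRF enforce label constraints such as "I-PER must follow B-PER".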

Others

  • Sequence to Sequence Learning with Neural Networks {Paper} 🌟
  • Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks {Paper} 🌟

5 Dialogue Systems

Survey

  • A Survey on Dialogue Systems: Recent Advances and New Frontiers {Paper} {Blog} 🌟🌟
  • A Friendly Introduction to Retrieval-Based Chatbots {Blog} 🌟🌟🌟
  • Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey {Paper} 🌟🌟

Open Domain Dialogue Systems

  • HRED: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models {Paper} {Code} 🌟🌟
  • Adversarial Learning for Neural Dialogue Generation {Paper} {Code} {Blog} 🌟🌟

Task Oriented Dialogue Systems

  • Joint NLU: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling {Paper} {Code} 🌟🌟
  • BERT for Joint Intent Classification and Slot Filling {Paper} 🌟
  • Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures {Paper} {Code} 🌟🌟
  • Attention with Intention for a Neural Network Conversation Model {Paper} 🌟
  • REDP: Few-Shot Generalization Across Dialogue Tasks {Paper} {Blog} 🌟🌟
  • TEDP: Dialogue Transformers {Paper} {Code} {Blog} 🌟🌟🌟

Conversational Response Selection

  • Multi-view Response Selection for Human-Computer Conversation {Paper} 🌟🌟
  • SMN: Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟🌟
  • DUA: Modeling Multi-turn Conversation with Deep Utterance Aggregation {Paper} {Code} {Blog} 🌟🌟
  • DAM: Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network {Paper} {Code} {Blog} 🌟🌟🌟
  • IMN: Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟
  • Dialogue Transformers {Paper} 🌟🌟
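
The retrieval-based setting reduces to scoring each candidate response against the dialogue context and picking the best. The deliberately naive word-overlap (Jaccard) matcher below only sketches that interface; SMN, DUA, DAM, and IMN replace the scoring function with learned multi-turn matching networks.

```python
def score(context, response):
    """Jaccard overlap between context and response word sets (toy matcher)."""
    ctx_words = set(context.lower().split())
    resp_words = set(response.lower().split())
    if not ctx_words or not resp_words:
        return 0.0
    return len(ctx_words & resp_words) / len(ctx_words | resp_words)

def select(context, candidates):
    """Return the candidate response with the highest matching score."""
    return max(candidates, key=lambda r: score(context, r))

context = "what time does the train to boston leave"
candidates = [
    "i like pizza",
    "the train to boston leaves at noon",
    "tickets are sold online",
]
print(select(context, candidates))
# → the train to boston leaves at noon
```

The hard part the papers tackle is the scorer itself: modeling multiple turns, word order, and cross-attention between context and response, none of which bag-of-words overlap can see.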

6 Topic Model

LDA

7 Knowledge Graph

Survey

  • Towards a Definition of Knowledge Graphs {Paper} 🌟🌟🌟

8 Prompt Learning

Survey

  • PPP: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing {Paper} {Blog} 🌟🌟🌟

9 Graph Neural Network

Survey

  • Graph Neural Networks for Natural Language Processing: A Survey {Paper} 🌟🌟

10 Sentence Embedding

Core

  • InferSent: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data {Paper} {Code} 🌟🌟
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks {Paper} {Code} 🌟🌟🌟
  • BERT-flow: On the Sentence Embeddings from Pre-trained Language Models {Paper} {Code} {Blog} 🌟🌟
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings {Paper} {Code} 🌟🌟🌟



License: MIT License