- 卖萌屋学术站
- The NLP Index
- Nature Language Tech | Synced
- Natural Language Processing | Papers With Code
- Synced (机器之心) SOTA
- Chinese GLUE
- GLUE Benchmark
- Google Research
- Facebook Research
- microsoft/unilm: UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond
- Tencent Technology Engineering | Synced (机器之心)
- Meituan Technical Team (美团技术团队)
- pytorch
- tensorflow
- facebookresearch/pytext: A natural language modeling framework based on PyTorch
- deep learning NLP with PyTorch
- Text classifiers, Sequence taggers, Joint intent-slot model and Contextual intent-slot models
- C++ server example
- zalandoresearch/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
- NER, POS, sense disambiguation and classification
- on top of PyTorch
- pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
- Seq2Seq modeling
- on top of PyTorch
- BrikerMan/Kashgari: Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
- Text labeling, classification, Pre-trained
- on top of TensorFlow
- asyml/texar: Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
- NLP Toolkit
- on top of TensorFlow
- stanfordnlp/stanza: Official Stanford NLP Python Library for Many Human Languages
- on top of PyTorch
- speed, production system use
- nltk/nltk: NLTK Source
- education and research tool
- learning and exploring NLP concepts
- sloria/TextBlob: Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
- on top of NLTK
- fast prototyping
- applications that don't require high performance
- spaCy · Industrial-strength Natural Language Processing in Python
- fast
- streamlined
- production-ready
- chartbeat-labs/textacy: NLP, before and after spaCy
- OpenNMT/Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
- google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
- huggingface/tokenizers: 💥Fast State-of-the-Art Tokenizers optimized for Research and Production
- OpenNMT - Open-Source Neural Machine Translation
- google-research/text-to-text-transfer-transformer: Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
- tensorflow/tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
- pytorch/text: Data loaders and abstractions for text and NLP
- tensorflow/text: Making text a first-class citizen in TensorFlow.
- textpipe/textpipe: Textpipe: clean and extract metadata from text
2020 Chinese-Bert
CLUEbenchmark/CLUEPretrainedModels
2019 GPT2+Chinese
Morizeyao/GPT2-Chinese: Chinese version of GPT2 training code, using BERT tokenizer.
2019 Bert-wwm
ymcui/Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
2019 Toolkit
huggingface/pytorch-transformers: 👾 A library of state-of-the-art pretrained models for Natural Language Processing (NLP)
2019 MASK
google-research/bert: TensorFlow code and pre-trained models for BERT
2019 Permutation
zihangdai/xlnet: XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019 MultiTask
PaddlePaddle/ERNIE: An Implementation of ERNIE For Language Understanding
2019 Attention
kimiyoung/transformer-xl
2019 LM
openai/gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners"
2018 TwoLMs
ELMo: Deep contextualized word representations
2018 Co-occurrence
stanfordnlp/GloVe: GloVe model for distributed word representation
2019
facebookresearch/fastText: Library for fast text representation and classification.
2019 Word2vec
Embedding/Chinese-Word-Vectors: 100+ Chinese Word Vectors (pre-trained)
2018
Word2vec Chinese-Word-Vectors
2018 LSTM
Recurrent Neural Networks | TensorFlow
2013
Google Code Archive - Long-term storage for Google Code Project Hosting.
2020 Toolkit
RUCAIBox/TextBox: TextBox is an open-source library for building text generation system.
2020 Awesome
tokenmill/awesome-nlg: A curated list of resources dedicated to Natural Language Generation (NLG)
2018 BenchMark
geek-ai/Texygen: A text generation benchmarking platform
2018 RNN
docs/text_generation.ipynb at master · tensorflow/docs
2019 Toolkit on top of TF
asyml/texar: Toolkit for Text Generation and Beyond
Collection
brightmart/text_classification: all kinds of text classification models and more with deep learning
2019 Framework
RasaHQ/rasa_nlu: 💬 Open source library for natural language understanding and machine learning-based dialogue management. - All things around intent classification, entity extraction and action predictions - DIY NLP and chatbot framework.
2018 Chi
crownpku/Rasa_NLU_Chi: Turn Chinese natural language into structured data (Chinese natural language understanding)
2019 Toolkit
snipsco/snips-nlu: Snips Python library to extract meaning from text
2020 Toolkit
RUCAIBox/CRSLab: CRSLab is an open-source toolkit for building Conversational Recommender System (CRS).
2018
5hirish/adam_qas: ADAM - A Question Answering System. Inspired from IBM Watson
2019 Sentence
UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet
2019 Sentence
hanxiao/bert-as-service: Mapping a variable-length sentence to a fixed-length vector using BERT model
2018 Sentence
explosion/sense2vec: 🦆 Use NLP to go beyond vanilla word2vec
2019 Sentence
gensim: models.doc2vec – Doc2vec paragraph embeddings
2014 Sentence
klb3713/sentence2vec: Tools for mapping a sentence with arbitrary length to vector space
2019 Doc+Sentence+Word
gensim: Topic modelling for humans
2019 MinHash
ekzhu/datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++
2019 LevenshteinDistance
ztane/python-Levenshtein: The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
2018 Graph
caesar0301/graphsim: Graph similarity algorithms based on NetworkX.
2019 Pinyin
mozillazg/python-pinyin: Convert Chinese characters to pinyin (pypinyin)
2020 Explain
jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).
2019 Word
JasonKessler/scattertext: Beautiful visualizations of how language differs among document types.
2019 Bert GPT
jessevig/bertviz: Tool for visualizing attention in the Transformer model (BERT and OpenAI GPT-2)
2019 MLC
marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier
2019 Graph Visualization Framework
antvis/G6: ♾ A Graph Visualization Framework in JavaScript
2017 Neo4j D3
eisman/neo4jd3: Neo4j graph visualization using D3.js
2019 Neo4j browser
neo4j-contrib/neovis.js: Neo4j + vis.js = neovis.js. Graph visualizations in the browser with data from Neo4j.
2019 Neo4j 3D
jexp/neo4j-3d-force-graph: Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph
2019 Interactive Graphviz
magjac/graphviz-visual-editor: A web application for interactive visual editing of Graphviz graphs described in the DOT language.
2019 graphviz Python
mapio/GraphvizAnim: A tool to create animated graph visualizations, based on graphviz.
2019 Kinds of indexes
shivam5992/textstat: python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
2019 in spaCy
mholtzscher/spacy_readability: spaCy pipeline component for adding text readability meta data to Doc objects.
2019 XLM
facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretraining.
2018 Microsoft Based on Phrase
Microsoft/NPMT: Towards Neural Phrase-based Machine Translation
2019 Google Based on Seq2Seq and Attention
tensorflow/nmt: TensorFlow Neural Machine Translation Tutorial
2019 Google Based on Pure Attention
models/official/transformer at master · tensorflow/models
2019 Facebook Based on CNN
pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
2019 Facebook Based on Unsupervised
facebookresearch/UnsupervisedMT: Phrase-Based & Neural Unsupervised Machine Translation
2019 DeepL Based on CNN (Not Open Source)
DeepL Translator: DeepL's CNN-based translation tool
2019 OpenNMT
OpenNMT/OpenNMT: Open Source Neural Machine Translation
- datasets
- Chinese task benchmark evaluation
- Chinese pre-training corpora
- cluebenchmarks.com/dataSet_search.html
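- Offline Baidu Baike download (2012 illustrated edition)
- Baidu Baike, 2012 illustrated edition
- The most complete database of classical Chinese poetry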
- Kinds of Resources
- Chinese diachronic corpus
- Chinese natural language processing datasets
- google/trax: Trax — your path to advanced deep learning
- tensorflow/models
- Transformers
- OpenNMT/OpenNMT-py
- OpenNMT/OpenNMT-tf
- microsoft/nlp-recipes: Natural Language Processing Best Practices & Examples
- Michael Collins, Michael Collins - Google Scholar Citations ☆
- Jason Eisner - Home Page (JHU), Jason Eisner - Google Scholar Citations ☆
- David Yarowsky, David Yarowsky - Google Scholar Citations
- Dan Jurafsky - Home Page, Dan Jurafsky - Google Scholar Citations ☆
- Christopher Manning, Stanford NLP, Christopher D Manning - Google Scholar Citations ☆
- Dan Klein's Home Page, The Berkeley NLP Group ☆
- Dan Roth - Main Page, Dan Roth - Google Scholar Citations ☆
- ChengXiang Zhai - Home Page, ChengXiang Zhai - Google Scholar Citations
- Eugene Charniak's Home Page, Eugene Charniak - Google Scholar Citations
- Joakim Nivre's Home Page, Joakim Nivre - Google Scholar Citations ☆
- Philipp Koehn, Philipp Koehn - Google Scholar Citations
- James H. Martin, James H. Martin - Google Scholar Citations
- Julia Hirschberg, Julia Hirschberg - Google Scholar Citations
- Fernando Pereira – Google AI, Fernando Pereira - Google Scholar Citations ☆
- Ryan McDonald, Ryan McDonald - Google Scholar Citations
- Slav Petrov - Слав Петров, Slav Petrov - Google Scholar Citations ☆
- Kenneth Church HomePage, Kenneth Ward Church - Google Scholar Citations