Juan Diego Rodriguez's repositories
entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
EMNLP-2020
Selections from EMNLP 2020
EMNLP-2019
Notes on selected papers from EMNLP 2019
naacl-2019-notes
Notes on NAACL 2019
asset
A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
github-markdown-toc
Easy TOC creation for GitHub README.md
Linguistic_and_Stylistic_Complexity
Linguistic and stylistic complexity measures for (literary) texts
NL-Augmenter
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations
simple_stories_generate2
Dataset Generation Code for SimpleStories
tiny_tokenizer
A word-level tokenizer for TinyStories data
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.