There are 9 repositories under the nlp-resources topic.
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Portuguese pre-trained BERT models
The hands-on NLTK tutorial for NLP in Python
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
A curated list of beginner resources in Natural Language Processing
Projects and useful articles / links
Chinese NLP corpus of Chinese science fiction: about 4675 Chinese science fiction novels. A natural language processing corpus, text corpus, and text database of Chinese science fiction.
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
Resource NLP & Bahasa
A lexicon for Sudachi
A curated list of NLP resources for Hungarian
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula Science Fiction Website (乌拉科幻小说网). A natural language processing corpus, text corpus, and text database of Chinese science fiction.
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
A Python module that fetches a page of a word/phrase from the Online Indonesian Dictionary (https://kbbi.kemdikbud.go.id).
Resources to go with the Indic NLP Library
Linguistic Datasets for Portuguese: a list of linguistic datasets for the Portuguese language with flexible licenses: databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation
Python library for selecting text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
A curated list of resources dedicated to Knowledge Distillation and Recommendation Systems, especially for Natural Language Processing (NLP).
Natural Language Processing
A Python package for removing duplicate text in clinical notes or other documents
Natural Language Processing Courses with Resources
A list of Romanian NLP Datasets
Scripts for preprocessing the CoNLL-2005 SRL dataset.
📖 A curated list of resources dedicated to Natural Language Processing (NLP) in the Yoruba Language.
A collection of resources for natural language processing in Indonesian (Bahasa Indonesia). Contributions of any kind are very welcome.
A collection of natural language processing notebooks.