There are 9 repositories under the nlp-resources topic.
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
This is a continuously updated handbook that helps readers track the latest Text-to-SQL techniques in the literature and provides practical guidance for researchers and practitioners.
Portuguese pre-trained BERT models
The hands-on NLTK tutorial for NLP in Python
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
A Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels. A Chinese science fiction text corpus and text database for natural language processing.
Projects and useful articles / links
A curated list of beginner resources in Natural Language Processing
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
Resource NLP & Bahasa
A lexicon for Sudachi
A curated list of NLP resources for Hungarian
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
A Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula Science Fiction Website (乌拉科幻小说网方舟计划). A Chinese science fiction text corpus and text database for natural language processing.
Natural Language Processing (NLP), covering topics such as tokenization, part-of-speech (POS) tagging, machine translation, named entity recognition (NER), classification, and sentiment analysis.
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
A Python module that fetches a page of a word/phrase from the Online Indonesian Dictionary (https://kbbi.kemdikbud.go.id).
Linguistic Datasets for Portuguese: a list of linguistic datasets for the Portuguese language with flexible licenses: databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation.
Resources to go with the Indic NLP Library
A list of Romanian NLP Datasets
A Python library for feature selection on text features. It provides filter methods, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
Natural Language Processing Courses with Resources
Assignment solutions for CS224N: Natural Language Processing with Deep Learning - Stanford / Winter 2023
A Python package for removing duplicate text in clinical notes or other documents
Natural Language Processing
A curated list of resources dedicated to knowledge distillation and recommendation systems, especially in Natural Language Processing (NLP).
Dive into the world of Arabic NLP with this extensive collection of resources, tools, datasets, and best practices tailored for the Arabic language.
Scripts for preprocessing the CoNLL-2005 SRL dataset.