There are 20 repositories under the corpus topic.
Large Scale Chinese Corpus for NLP
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
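Search all Chinese NLP datasets, with commonly used English NLP datasets included
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Chinese personal name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.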
Deep learning and deep reinforcement learning research papers and some code
Final Weibo Crawler: scrape anything from Weibo, including comments, Weibo content, followers, anything. The Terminator
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
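Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese
Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.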
🚁 Insurance industry corpus, chatbot
Large-scale Pre-training Corpus for Chinese: 100G of Chinese pre-training data
ChatGPT Chinese corpus: dialogue corpus, fiction corpus, and customer service corpus for training large models
Chatbot in 200 lines of code using TensorLayer
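A collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models
Chinese medical information processing benchmark CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark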
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus
WeChat official account corpus
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Chinese science fiction NLP corpus, text corpus, and text database: about 4675 Chinese science fiction novels
A dataset of millions of news articles scraped from a curated list of data sources.
We gather Malaysian datasets! https://malaysian-dataset.readthedocs.io/
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.