Gege Sun's repositories
TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
-
Search all Chinese NLP datasets, with commonly used English NLP datasets included
AISystem
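AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks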
awesome-chatgpt-prompts
This repo includes ChatGPT prompt curation to use ChatGPT better.
Awesome-Chinese-NLP
A curated list of resources for Chinese NLP
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app with one click.
COLDataset
The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection
competition-baseline
Knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender system competitions
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FlagEmbedding
Retrieval and Retrieval-augmented LLMs
HarvestText
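Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods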
HFL-Anthology
Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)
JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
LIT
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework-agnostic interface.
LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
Mli-paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
MNBVC
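MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, targeting the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even data written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.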
NLP
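Chinese and English sensitive words, language detection, location and carrier lookup for Chinese and foreign mobile/phone numbers, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional-Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English segmentation, various Chinese word vectors, company name list, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automobile lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology series…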
nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
NLP_all_tasks
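[NLP Beginner's Comeback] Sharing hands-on practice and experience in natural language processing (text classification, information extraction, knowledge graphs, machine translation, question answering systems, text generation, Text-to-SQL, text error correction, text mining, knowledge distillation, model acceleration, OCR, TTS, Prompt, embedding, etc.).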
OpenCC
Conversion between Traditional and Simplified Chinese
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
promptsource
Toolkit for creating, sharing and using natural language prompts.
pycorrector
pycorrector is a toolkit for text error correction. It implements models including Kenlm, ConvSeq2Seq, BERT, MacBERT, ELECTRA, ERNIE, Transformer, and T5, ready to use out of the box.
Python-
All Algorithms implemented in Python
Sentiment_Analysis_Imdb
Sentiment analysis on the IMDb dataset using BERT/RoBERTa + LSTM/GRU/BiLSTM/TextCNN.
speech_dataset
A dataset for speech recognition