Gege Sun's repositories
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
awesome-chatgpt-prompts
This repo includes ChatGPT prompt curation to use ChatGPT better.
FlagEmbedding
Retrieval and Retrieval-augmented LLMs
AISystem
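AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.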
LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.
KnowledgeGraphCourse
Graduate course on Knowledge Graphs at Southeast University
Python-
All Algorithms implemented in Python
JioNLP
A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com
NLP_all_tasks
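[NLP Rookie's Comeback] Hands-on practice and experience in natural language processing (text classification, information extraction, knowledge graphs, machine translation, question answering systems, text generation, Text-to-SQL, text error correction, text mining, knowledge distillation, model acceleration, OCR, TTS, prompts, embeddings, etc.).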
promptsource
Toolkit for creating, sharing and using natural language prompts.
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
MNBVC
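MNBVC (Massive Never-ending BT Vast Chinese corpus) is an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche cultures and even "Martian script" data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, posts, wikis, classical poems, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.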
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
competition-baseline
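Knowledge, code, and approaches for competitions in data mining, computer vision, natural language processing, and recommender systems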
Awesome-Chinese-NLP
A curated list of resources for Chinese NLP
pycorrector
pycorrector is a toolkit for text error correction. It implements models including Kenlm, ConvSeq2Seq, BERT, MacBERT, ELECTRA, ERNIE, Transformer, and T5, and works out of the box.
LIT
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
OpenCC
Conversion between Traditional and Simplified Chinese
Sentiment_Analysis_Imdb
Sentiment analysis on the IMDb dataset using BERT/RoBERTa + LSTM/GRU/BiLSTM/TextCNN.
Awesome-LLM
Awesome-LLM: a curated list of Large Language Models
TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Mli-paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
COLDataset
The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection
HFL-Anthology
Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)
speech_dataset
A dataset for speech recognition
ml-visuals
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
NLP
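Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment scores, stop words, reactionary word list, violence and terrorism word list, Traditional-Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: a Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology series...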