Gege Sun's repositories

HanLP

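Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and Simplified/Traditional Chinese conversion, natural language processing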

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0

TextBrewer

A PyTorch-based knowledge distillation toolkit for natural language processing

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0

-

Search all Chinese NLP datasets, with commonly used English NLP datasets included

Language: Python | Stargazers: 0 | Issues: 0

AISystem

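AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks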

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 0

awesome-chatgpt-prompts

This repo includes ChatGPT prompt curation to use ChatGPT better.

Language: HTML | License: CC0-1.0 | Stargazers: 0 | Issues: 0

Awesome-Chinese-NLP

A curated list of resources for Chinese natural language processing (NLP)

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Awesome-LLM

Awesome-LLM: a curated list of Large Language Model resources

License: CC0-1.0 | Stargazers: 0 | Issues: 0

ChatGPT-Next-Web

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.

Language: TypeScript | License: MIT | Stargazers: 0 | Issues: 0

COLDataset

The official repository of the paper: COLD: A Benchmark for Chinese Offensive Language Detection

License: Apache-2.0 | Stargazers: 0 | Issues: 0

competition-baseline

Knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender system competitions

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 0 | Issues: 0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

License: MIT | Stargazers: 0 | Issues: 0

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

License: MIT | Stargazers: 0 | Issues: 0

HarvestText

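Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods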

License: MIT | Stargazers: 0 | Issues: 0

HFL-Anthology

Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)

License: CC-BY-SA-4.0 | Stargazers: 0 | Issues: 0

JioNLP

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

License: Apache-2.0 | Stargazers: 0 | Issues: 0

LIT

The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

Stargazers: 0 | Issues: 0

ml-visuals

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

License: MIT | Stargazers: 0 | Issues: 0

Mli-paper-reading

Paragraph-by-paragraph close reading of classic and new deep learning papers

License: Apache-2.0 | Stargazers: 0 | Issues: 0

MNBVC

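MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripted dialogue, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.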

License: MIT | Stargazers: 0 | Issues: 0

NLP

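Chinese and English sensitive words, language detection, domestic and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment scores, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, comprehensive company name list, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figure lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regex matching for Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology series…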

Stargazers: 0 | Issues: 0

nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers

License: MIT | Stargazers: 0 | Issues: 0

NLP_all_tasks

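[NLP Rookie Comeback] Sharing hands-on practice and experience in natural language processing (text classification, information extraction, knowledge graphs, machine translation, question answering systems, text generation, Text-to-SQL, text error correction, text mining, knowledge distillation, model acceleration, OCR, TTS, prompts, embeddings, etc.).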

Stargazers: 0 | Issues: 0

OpenCC

Conversion between Traditional and Simplified Chinese

License: Apache-2.0 | Stargazers: 0 | Issues: 0

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

promptsource

Toolkit for creating, sharing and using natural language prompts.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

pycorrector

pycorrector is a toolkit for text error correction. It implements Kenlm, ConvSeq2Seq, BERT, MacBERT, ELECTRA, ERNIE, Transformer, T5, and other models, ready to use out of the box.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Python-

All Algorithms implemented in Python

License: MIT | Stargazers: 0 | Issues: 0

Sentiment_Analysis_Imdb

Sentiment analysis on the IMDb dataset using BERT/RoBERTa + LSTM/GRU/BiLSTM/TextCNN.

Stargazers: 0 | Issues: 0

speech_dataset

Datasets for speech recognition

License: Apache-2.0 | Stargazers: 0 | Issues: 0