Ming Xu (徐明) (shibing624)

Company: @tencent

Location: Beijing, China

Home Page: https://blog.csdn.net/mingzai624


Organizations
NLPchina

Ming Xu (徐明)'s repositories

pycorrector

pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 5410 | Issues: 85 | Issues: 459

text2vec

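text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models including Word2Vec, RankBM25, Sentence-BERT, and CoSENT, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 4349 | Issues: 30 | Issues: 146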

MedicalGPT

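MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Language: Python | License: Apache-2.0 | Stargazers: 3107 | Issues: 35 | Issues: 372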

similarity

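similarity: Text similarity calculation toolkit for Java. Written in Java, it can be used for text similarity calculation, sentiment analysis, and similar tasks, and works out of the box.

Language: Java | License: Apache-2.0 | Stargazers: 1390 | Issues: 40 | Issues: 39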

textgen

TextGen: Implementation of text generation models, including LLaMA, BLOOM, GPT2, BART, T5, SongNet and so on. Implements training and prediction for models including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, and UDA, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 918 | Issues: 11 | Issues: 52

similarities

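Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over datasets on the scale of hundreds of millions of items. Developed in Python 3 and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 715 | Issues: 9 | Issues: 35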

ChatPDF

RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF

Language: Python | License: Apache-2.0 | Stargazers: 542 | Issues: 6 | Issues: 28

pytextclassifier

pytextclassifier is a toolkit for text classification. It implements classification models including LR, Xgboost, TextCNN, FastText, TextRNN, and BERT, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 478 | Issues: 11 | Issues: 16

parrots

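Automatic Speech Recognition (ASR), Text-To-Speech (TTS) engine. Chinese and English speech recognition and multi-speaker speech synthesis, with multilingual support and high accuracy.

Language: Python | License: Apache-2.0 | Stargazers: 456 | Issues: 12 | Issues: 27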

ChatPilot

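ChatPilot: Chat Agent Web UI. Implements a chat front end that supports Google search, chatting with files and URLs (RAG), and a code interpreter. It reproduces Kimi Chat (drag in files, send URLs).

Language: Svelte | License: Apache-2.0 | Stargazers: 443 | Issues: 4 | Issues: 13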

dialogbot

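dialogbot provides search-based dialogue, task-based dialogue, and generative dialogue models. A dialogue robot built on question-answering, task-oriented, and chit-chat dialogue models. It supports web-retrieval QA, domain-knowledge QA, task-guided QA, and chit-chat QA, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 323 | Issues: 6 | Issues: 7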

addressparser

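Chinese address extraction tool. Supports extraction and mapping of three-level administrative division addresses (province, city, district), and supports drawing address heatmaps.

Language: Python | License: MIT | Stargazers: 196 | Issues: 4 | Issues: 1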

pke_zh

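pke_zh, Python keyphrase extraction for Chinese (zh). A tool for extracting Chinese keywords and key sentences. It implements algorithms such as KeyBert, PositionRank, TopicRank, and TextRank, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 169 | Issues: 4 | Issues: 7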

nerpy

🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition tool that supports models such as BertSoftmax and BertSpan, and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 109 | Issues: 4 | Issues: 8

pysenti

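Chinese Sentiment Classification Tool. Sentiment polarity classification based on the HowNet, Tsinghua, and BosonNLP sentiment lexicons. A baseline method that is easy to extend and works out of the box.

Language: Python | License: Apache-2.0 | Stargazers: 83 | Issues: 4 | Issues: 2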

chatgpt-webui

ChatGPT WebUI using gradio. Provides a simple, easy-to-use Web UI for LLM chat and retrieval-based knowledge QA (RAG).

Language: Python | License: Apache-2.0 | Stargazers: 79 | Issues: 3 | Issues: 1

CodeAssist

CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++ and so on.

Language: Python | License: Apache-2.0 | Stargazers: 54 | Issues: 3 | Issues: 4

agentica

Agentica: Build Multi-Agent Workflow with 10 lines of code.

Language: Python | License: Apache-2.0 | Stargazers: 44 | Issues: 4 | Issues: 3

SmartSearch

SmartSearch: Building a quick conversation-based search engine with LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 42 | Issues: 1 | Issues: 0

github-hot

Automatically tracks hot Github repos and updates daily.

Language: Python | License: Apache-2.0 | Stargazers: 38 | Issues: 7 | Issues: 1

pinyin-tokenizer

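pinyintokenizer, a pinyin tokenizer that splits a continuous pinyin string into a list of single-character pinyin syllables.

Language: Python | License: Apache-2.0 | Stargazers: 23 | Issues: 2 | Issues: 1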

zh-normalization

Chinese (zh) sentence NSW (Non-Standard-Word) Normalization

Language: Python | License: Apache-2.0 | Stargazers: 8 | Issues: 2 | Issues: 0

ChuanhuChatGPT

GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.

Language: Python | License: GPL-3.0 | Stargazers: 6 | Issues: 1 | Issues: 0

ChatGPT-Next-Web

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.

Language: TypeScript | License: MIT | Stargazers: 4 | Issues: 1 | Issues: 0

Diffusion-Tuning

Diffusion-Tuning: Training Your Own Diffusion model with custom dataset.

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 3 | Issues: 0

tools

tools

Language: JavaScript | License: Apache-2.0 | Stargazers: 2 | Issues: 2 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0