youngfire's starred repositories

ngram_train

A Python implementation of n-gram language model training; the trained model can compute sentence perplexity, scores, and more.

Language: Python | Stargazers: 7 | Issues: 0

llm_corpus_quality

Cleaning and quality evaluation of Chinese corpora for large-model pre-training.

Language: Java | Stargazers: 18 | Issues: 0

YAYI

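YaYi large models: secure, reliable, customized LLMs for clients. A series of models based on LLaMA 2 and BLOOM, trained on large-scale Chinese and English multi-domain instruction data and developed by the Zhongke Wenge (中科闻歌) algorithm team. (Repo for YaYi Chinese LLMs based on LLaMA 2 & BLOOM)

Language: Python | License: Apache-2.0 | Stargazers: 3245 | Issues: 0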

china_area_mysql

MySQL database of China's five-level administrative divisions.

Stargazers: 1 | Issues: 0

KeywordProcesser

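A simple trie implemented in Python that supports adding, looking up, and deleting keywords; useful for keyword matching, stop-word removal, and similar tasks on Chinese text.

Language: Python | Stargazers: 65 | Issues: 0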

awesome-english-ebooks

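Free downloads of English-language magazines, including The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats; updated weekly.

Language: CSS | Stargazers: 20085 | Issues: 0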

py-trie

Python library which implements the Ethereum Trie structure.

Language: Python | License: MIT | Stargazers: 104 | Issues: 0

chinese-xinhua

:orange_book: Chinese Xinhua dictionary database, covering xiehouyu (two-part allegorical sayings), idioms, words, and characters.

Language: Python | License: MIT | Stargazers: 10787 | Issues: 0

alpaca-chinese-dataset

Alpaca Chinese Dataset -- a Chinese instruction fine-tuning dataset (continuously updated)

Language: Python | Stargazers: 130 | Issues: 0

attu

The GUI for Milvus

Language: TypeScript | License: Apache-2.0 | Stargazers: 1080 | Issues: 0

EasySpider

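A visual no-code/code-free web crawler/spider. EasySpider (易采集) is a visual tool for browser automation testing, data collection, and crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.

Language: JavaScript | License: NOASSERTION | Stargazers: 30627 | Issues: 0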

GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Language: Python | License: Apache-2.0 | Stargazers: 3911 | Issues: 0

InternLM

Official release of InternLM2.5 7B base and chat models. 1M context support

Language: Python | License: Apache-2.0 | Stargazers: 5884 | Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3449 | Issues: 0

FinQwen

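FinQwen: an open, stable, high-quality financial large-model project that builds intelligent question-answering systems for financial scenarios on top of large models, using open source to advance "AI + finance".

Language: Jupyter Notebook | Stargazers: 208 | Issues: 0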

Legal-Eagle-InternLM

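Legal-Eagle-InternLM is a legal question-answering bot based on InternLM (书生浦语), the large model released by SenseTime and Shanghai AI Laboratory. It aims to be a legal-domain large model that provides professional, intelligent, and comprehensive legal services in line with the 3H principles (Helpful, Honest, Harmless).

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 0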

Linly

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

Language: Python | Stargazers: 3016 | Issues: 0

CSL

[COLING 2022] CSL: A Large-scale Chinese Scientific Literature Dataset

Language: Python | Stargazers: 548 | Issues: 0

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: AGPL-3.0 | Stargazers: 28253 | Issues: 0

S-Eval

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

License: NOASSERTION | Stargazers: 26 | Issues: 0
Stargazers: 25 | Issues: 0

nb_http_client

pip install nb_http_client. nb_http_client describes itself as the highest-performance HTTP client in Python's history, many times faster than any other request package.

Language: Python | Stargazers: 33 | Issues: 0

ChatGPT_DAN

ChatGPT DAN, Jailbreaks prompt

Stargazers: 6174 | Issues: 0

error_text_gen

Generates the large volumes of data needed by text-correction models such as GECToR.

Language: Python | License: MIT | Stargazers: 14 | Issues: 0

paper_checking_system

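A text and paper duplicate-checking system built with C# and C++; checks against a paper library on the order of 100 million characters in seconds. Related: duplicate-checking algorithms, data deduplication, document duplicate checking, text deduplication, bid-document duplicate checking, help preventing bid collusion, homework duplicate checking, duplicate check

Language: C# | License: GPL-2.0 | Stargazers: 398 | Issues: 0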

LERT

LERT: A Linguistically-motivated Pre-trained Language Model

Language: Python | License: Apache-2.0 | Stargazers: 190 | Issues: 0

ChineseTextClassification

Chinese text classification in natural language processing (using spam SMS detection as an example)

Language: Python | Stargazers: 16 | Issues: 0

Chinese-text-correction-papers

text correction papers

Stargazers: 281 | Issues: 0

MediaCrawler

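Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments

Language: Python | License: NOASSERTION | Stargazers: 15423 | Issues: 0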