MrSeven77's starred repositories

NeteaseCloudMusicApi

NetEase Cloud Music (网易云音乐) Node.js API service

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 12853 | Issues: 99 | Issues: 1031

nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | Stargazers: 6393 | Issues: 60 | Issues: 78

DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Language: Python | License: MIT | Stargazers: 6208 | Issues: 68 | Issues: 151

AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.

Language: JavaScript | License: Apache-2.0 | Stargazers: 3925 | Issues: 57 | Issues: 76

Luotuo-Chinese-LLM

Luotuo (骆驼): Open-source Chinese language models. Developed by 陈启源 @ Central China Normal University, 李鲁鲁 @ SenseTime, and 冷子昂 @ SenseTime.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3625 | Issues: 55 | Issues: 44

MNBVC

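MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki articles, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.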

MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

Language: Python | License: Apache-2.0 | Stargazers: 3033 | Issues: 33 | Issues: 370

modelscope-agent

ModelScope-Agent: An agent framework connecting models in ModelScope with the world

Language: Python | License: Apache-2.0 | Stargazers: 2324 | Issues: 33 | Issues: 192

AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Language: Python | License: Apache-2.0 | Stargazers: 2041 | Issues: 29 | Issues: 133

chinese-llm-benchmark

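Chinese LLM capability evaluation leaderboard: currently covers 106 large models, including commercial models such as chatgpt, gpt4o, Baidu ERNIE Bot (文心一言), Alibaba Tongyi Qianwen (通义千问), iFlytek Spark (讯飞星火), SenseTime senseChat, and minimax, as well as open-source models such as Baichuan (百川), qwen2, glm4, yi, internLM2 (书生), and llama3, evaluated across multiple capability dimensions. It provides not only a capability score leaderboard but also the raw outputs of all models.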

Chat-Haruhi-Suzumiya

Chat-Haruhi-Suzumiya (Chat凉宫春日): an open-source role-playing chatbot by Cheng Li, Ziang Leng, and others.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1723 | Issues: 15 | Issues: 59

LLMTest_NeedleInAHaystack

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 1357 | Issues: 12 | Issues: 25

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 1319 | Issues: 12 | Issues: 118

WebCPM

Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"

Language: HTML | License: Apache-2.0 | Stargazers: 967 | Issues: 24 | Issues: 26

DeepSeek-MoE

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Language: Python | License: MIT | Stargazers: 934 | Issues: 15 | Issues: 35

massive

Tools and Modeling Code for the MASSIVE dataset

Language: Python | License: NOASSERTION | Stargazers: 536 | Issues: 17 | Issues: 24

progress-bar

📊 Flask API for SVG progress badges

Language: Python | License: MIT | Stargazers: 488 | Issues: 8 | Issues: 9

CValues

Research on evaluating and aligning the values of Chinese large language models

Language: Python | License: Apache-2.0 | Stargazers: 446 | Issues: 1 | Issues: 7

BLoRA

Batched LoRAs

QQMusicSpider

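A Scrapy-based QQ Music crawler (QQ Music Spider) that scrapes song information, lyrics, featured comments, and more. It also shares a music corpus of 490,000+ entries from the top 6,400 mainland Chinese and Hong Kong/Taiwan artists on QQ Music.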

WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents

Language: Python | License: MIT | Stargazers: 233 | Issues: 12 | Issues: 25

QMSum

Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization"

Language: Jupyter Notebook | License: MIT | Stargazers: 104 | Issues: 12 | Issues: 13

openapi-schemas

OpenAPI 3.0 JSON schemas. Files are automatically synced to the VTEX Developer Portal.

MediaSum

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

DialFact

We construct and introduce DIALFACT, a testing benchmark dataset of crowd-annotated conversational claims, paired with pieces of evidence from Wikipedia.

Language: Python | License: BSD-3-Clause | Stargazers: 41 | Issues: 6 | Issues: 2

progressed-py

Progressbar microservice written in 🐍 Python

Language: Python | License: AGPL-3.0 | Stargazers: 1 | Issues: 3 | Issues: 0