Tom pei's starred repositories

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 124897 | Issues: 1092 | Issues: 14605

faiss

A library for efficient similarity search and clustering of dense vectors.

DB-GPT

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Language: Python | License: MIT | Stargazers: 10957 | Issues: 92 | Issues: 764

continue

⏩ Open-source VS Code and JetBrains extensions that enable you to easily create your own modular AI software development system

Language: TypeScript | License: Apache-2.0 | Stargazers: 10615 | Issues: 58 | Issues: 830

starcoder

Home of StarCoder: fine-tuning & inference!

Language: Python | License: Apache-2.0 | Stargazers: 7104 | Issues: 70 | Issues: 140

flax

Flax is a neural network library for JAX that is designed for flexibility.

Language: Python | License: Apache-2.0 | Stargazers: 5510 | Issues: 82 | Issues: 827

BigDL

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4866 | Issues: 240 | Issues: 2134

ToolGood.Words

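A high-performance component for detecting and filtering sensitive words (illegal words and profanity), with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and other features.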

Language: JavaScript | License: Apache-2.0 | Stargazers: 4488 | Issues: 103 | Issues: 98

nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Language: Python | License: Apache-2.0 | Stargazers: 3947 | Issues: 21 | Issues: 76

mergekit

Tools for merging pretrained large language models.

Language: Python | License: LGPL-3.0 | Stargazers: 1964 | Issues: 24 | Issues: 109

magicoder

Magicoder: Source Code Is All You Need

Language: Python | License: MIT | Stargazers: 1865 | Issues: 26 | Issues: 35

codeshell

A series of code large language models developed by PKU-KCL

Language: Python | License: NOASSERTION | Stargazers: 1555 | Issues: 22 | Issues: 73

Project_CodeNet

This repository supports contributions of tools for the Project CodeNet dataset hosted in DAX.

Language: Python | License: Apache-2.0 | Stargazers: 1485 | Issues: 54 | Issues: 32
Language: Python | License: Apache-2.0 | Stargazers: 1372 | Issues: 21 | Issues: 19

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1239 | Issues: 24 | Issues: 142

sample-app-aoai-chatGPT

Sample code for a simple web chat experience through Azure OpenAI, including Azure OpenAI On Your Data.

Language: Python | License: MIT | Stargazers: 1207 | Issues: 40 | Issues: 295

DeepSeek-LLM

DeepSeek LLM: Let there be answers

Language: Makefile | License: MIT | Stargazers: 1119 | Issues: 19 | Issues: 31
Language: Python | License: Apache-2.0 | Stargazers: 848 | Issues: 22 | Issues: 13
Language: Python | License: Apache-2.0 | Stargazers: 765 | Issues: 12 | Issues: 34

MFTCoder

High-accuracy, high-efficiency multi-task fine-tuning framework for Code LLMs

Language: Python | License: NOASSERTION | Stargazers: 564 | Issues: 8 | Issues: 40

CSGHub

CSGHub is an open-source, trustworthy large-model asset management platform, similar to an on-premise Hugging Face, that helps users govern the assets involved in the lifecycle of LLMs and LLM applications (datasets, model files, code, and more). It manages LLM assets in the way OpenStack Glance manages virtual machine images, Harbor manages container images, and Sonatype Nexus manages artifacts. Feedback and stars ⭐️ are welcome.

Language: Vue | License: Apache-2.0 | Stargazers: 397 | Issues: 11 | Issues: 7

llm-autoeval

Automatically evaluate your LLMs in Google Colab

Language: Python | License: MIT | Stargazers: 377 | Issues: 7 | Issues: 14

CodeFuse-Query

Query-Based Code Analysis Engine

Language: Go | License: Apache-2.0 | Stargazers: 145 | Issues: 11 | Issues: 9

evol-teacher

Open Source WizardCoder Dataset

Language: Python | License: Apache-2.0 | Stargazers: 123 | Issues: 2 | Issues: 4

codefuse-evaluation

Industrial-level evaluation benchmarks for coding LLMs across the full life cycle of AI-native software development. An enterprise-grade evaluation suite for code LLMs, with more being released over time.

Language: Python | License: NOASSERTION | Stargazers: 57 | Issues: 3 | Issues: 0

crystalcoder-data-prep

Data preparation code for CrystalCoder 7B LLM

Language: Python | Stargazers: 38 | Issues: 0 | Issues: 0

csghub-server

CSGHub Server is the open-source backend of CSGHub, the open-source large-model asset management platform. It provides REST API-based management of models, datasets, code, and other large-model assets. Feedback and stars ⭐️ are welcome.

Language: Go | License: Apache-2.0 | Stargazers: 20 | Issues: 0 | Issues: 0

NanoPhi

Code for NanoPhi

Language: Python | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0