ChrisGao001's repositories
ChatGLM-6B
ChatGLM-6B: an open bilingual (Chinese–English) dialogue language model
ChatGLM-MNN
Pure C++ implementation of ChatGLM-6B for easy deployment.
euler
A distributed graph deep learning framework.
fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices.
flash_attention_inference
Performance benchmarks of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
ggml
Tensor library for machine learning
gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
graph-learn
An Industrial Graph Neural Network Framework
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
LLM_Notes
Notes on large language models (LLMs).
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
MedicalGPT
MedicalGPT: training your own medical GPT model with a ChatGPT-style training pipeline, covering continued pretraining, supervised fine-tuning, reward modeling, and reinforcement learning.
MNN
MNN is a blazing-fast, lightweight deep learning framework, battle-tested in business-critical use cases at Alibaba.
nann
A flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms.
Needle
An imperative deep learning framework with customized GPU and CPU backends.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
onnxconverter-common
Common utilities for ONNX converters
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
ppl.nn
A primitive library for neural networks.
robin-hood-hashing
Fast and memory-efficient hash table based on robin hood hashing, for C++11/14/17/20.
text-generation-inference
Large language model text generation inference.
Torch2TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
torchrec
PyTorch domain library for recommendation systems.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
transformers
🤗 Transformers: state-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs