ChrisGao001's repositories
flash_attention_inference
Performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
LLM_Notes
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
text-generation-inference
Large Language Model Text Generation Inference
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
MedicalGPT
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT-style Training Pipeline. Trains a medical large language model, covering continued pretraining, supervised fine-tuning, reward modeling, and reinforcement-learning training.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
ggml
Tensor library for machine learning
fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices
torchrec
PyTorch domain library for recommendation systems
Torch2TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
onnxconverter-common
Common utilities for ONNX converters
pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
ChatGLM-MNN
A pure C++ implementation of ChatGLM-6B for easy deployment.
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Needle
An imperative deep learning framework with customized GPU and CPU backends
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
ppl.nn
A primitive library for neural networks
nann
A flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
robin-hood-hashing
Fast & memory-efficient hash table based on Robin Hood hashing for C++11/14/17/20
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms
MNN
MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba
graph-learn
An Industrial Graph Neural Network Framework
euler
A distributed graph deep learning framework.