DXHPC's repositories
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
DecryptPrompt
A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications
MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
onnx
Open standard for machine learning interoperability
brpc
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" means "better RPC".
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
muduo
An event-driven network library for multi-threaded Linux servers in C++11
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
cutlass
CUDA Templates for Linear Algebra Subroutines
netron
Visualizer for neural network, deep learning, and machine learning models
protobuf
Protocol Buffers - Google's data interchange format
FasterTransformer
Transformer-related optimization, including BERT and GPT
pouch
An Efficient Enterprise-class Container Engine
blade-build
Blade is a powerful build system from Tencent that supports many mainstream programming languages, such as C/C++, Java, Scala, Python, Protobuf, and more.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
ColossalAI
Making big AI models cheaper, easier, and more scalable
lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
EasyNLP
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance.
byteps
A high performance and generic framework for distributed DNN training
recommenders-addons
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
nlp_paper_study
This repository mainly records paper-reading notes on top-conference papers relevant to NLP algorithm engineers
onnxruntime
ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator
inference
Reference implementations of MLPerf™ inference benchmarks
onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.