DXHPC's repositories
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
DecryptPrompt
A summary of Prompt & LLM papers, open-source datasets & models, and AIGC applications
MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
onnx
Open standard for machine learning interoperability
brpc
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" means "better RPC".
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
muduo
An event-driven network library for multi-threaded Linux servers in C++11
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
cutlass
CUDA Templates for Linear Algebra Subroutines
netron
Visualizer for neural network, deep learning, and machine learning models
protobuf
Protocol Buffers - Google's data interchange format
FasterTransformer
Transformer-related optimization, including BERT and GPT
pouch
An Efficient Enterprise-class Container Engine
blade-build
Blade is a powerful build system from Tencent that supports many mainstream programming languages, such as C/C++, Java, Scala, Python, Protobuf, and more.
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
ColossalAI
Making big AI models cheaper, easier, and more scalable
lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
EasyNLP
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit
TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance.
byteps
A high performance and generic framework for distributed DNN training
recommenders-addons
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
nlp_paper_study
This repository mainly records paper-reading notes on top-conference papers relevant to NLP algorithm engineers
onnxruntime
ONNX Runtime: a cross-platform, high-performance ML inferencing and training accelerator
inference
Reference implementations of MLPerf™ inference benchmarks
onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.