gitover22

CUDA-Learn-Notes

🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.

GPL-3.0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Apache-2.0

glake

GLake: optimizing GPU memory management and IO transmission.

Apache-2.0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Apache-2.0

nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.

MIT

sglang

SGLang is a fast serving framework for large language models and vision language models.

Apache-2.0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Apache-2.0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs.

Apache-2.0

cambricon-pytorch

Build Cambricon PyTorch from source.



huafeng's repositories

CaptchaNetTrainer

gitover22

the-Congestion-Control-Process-in-TCP-in-NS-3

DSF_CNCL

HF_DFS

SuperServer

Go_notes

Linux_syscall_demo

miniCache

MiniGPT4-on-MLU

ChatStoreHub

cuda-samples

LLaMA-infer

llama-study

CUDA-Learn-Notes

DeepSpeed

glake

lmdeploy

nnfusion

sglang

sinfer-project

TensorRT-LLM

vllm

cambricon-pytorch