LiuWei's repositories
CrossStackProfiler
A CrossStackProfiler for PaddlePaddle to Train SuperErnie
cutlass
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
incubator-brpc
Industrial-grade RPC framework used throughout Baidu, with 1,000,000+ instances and thousands of kinds of services. "brpc" means "better RPC".
leetcode-master
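《代码随想录》 LeetCode practice guide: a recommended order for 200 classic problems, 600,000 characters of detailed illustrated explanations, video walkthroughs of the hard parts, and 50+ mind maps, with versions in C++, Java, Python, Go, JavaScript, and other languages, so that learning algorithms is no longer bewildering! 🔥🔥 Take a look and you will wish you had found it sooner! 🚀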
nccl
Optimized primitives for collective multi-GPU communication
package
Some common packages
Paddle
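PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』 core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)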
Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs