butterluo's repositories
autogen
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
CoFiPruning
ACL'22: Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
FasterTransformer
Transformer-related optimization, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
hfai-models
HFAI deep learning models
How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
kubernetes
Production-Grade Container Scheduling and Management
nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
tensorflow
An Open Source Machine Learning Framework for Everyone
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
YHs_Sample
Yinghan's Code Sample