ZSL98's repositories
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
cuda_hook
Hooks CUDA-related dynamic libraries using automated code generation tools.
DeepLearningExamples
Deep Learning Examples
FedML
A research-oriented federated learning library, supporting distributed computing, mobile/IoT on-device training, and standalone simulation. A short version of our white paper has been accepted by a NeurIPS 2020 workshop.
FedML-Server
FedML-Server: Federated Learning Server for FedML-IoT and FedML-Mobile
gdev
First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
Megatron-LM
Ongoing research training transformer models at scale
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
nvsci
Linux kernel modules for secure sharing of memory buffers
orion
An interference-aware scheduler for fine-grained GPU sharing
Shallow-Deep-Networks
Source Code for ICML 2019 Paper "Shallow-Deep Networks: Understanding and Mitigating Network Overthinking"
Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Syte2
Syte2 is a personal website with interactive social integrations.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
TGS
Artifacts for our NSDI'23 paper TGS
tSparse
A GPU algorithm for sparse matrix-matrix multiplication
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
zsl98.github.io
Shulai Zhang's Homepage