ZHENG, Zhen's repositories
jamesthez.github.io
Website of Zhen Zheng.
Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
persistVGG
Pure CUDA implementation of VGG net.
shell_script
One-click installation of shadowsocks, with support for the chacha20-ietf-poly1305 encryption method.
SyncMicrobenchmark
This work aims at characterizing the synchronization methods in CUDA.
tensorflow-internals
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
unlock-music
Unlock encrypted music files in the browser.