JamesTheZ

Alibaba Group

https://jamesthez.github.io/

ZHENG, Zhen's repositories

VersaPipe

A framework for pipelined computing on GPU

Language: C++ · 29 5 1

CudaProf

A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.

Language: C · 4 20

jamesthez.github.io

Website of Zhen Zheng.

Language: JavaScript · License: MIT · 200

BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

Language: C++ · License: Apache-2.0 · 100

flash-llm

Language: Cuda · License: Apache-2.0 · 100

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda · 000

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · 000

awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

000

cuda_image_filtering_constant

Language: C++ · License: GPL-3.0 · 010

cuda_image_filtering_shared

Language: C++ · License: GPL-3.0 · 010

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python · License: Apache-2.0 · 000

fp6_llm

Efficient GPU support for LLM inference with 6-bit quantization (FP6).

Language: Cuda · License: Apache-2.0 · 000

peizhishi.github.io

Language: SCSS · License: MIT · 000

persistVGG

Pure CUDA implementation of VGG net.

020

shell_script

One-click shadowsocks installer, with support for the chacha20-ietf-poly1305 encryption method.

Language: Shell · 010

SyncMicrobenchmark

This work aims at characterizing the synchronization methods in CUDA.

Language: C · 000

tensorflow

Language: C++ · License: Apache-2.0 · 000

recom

Language: C++ · License: Apache-2.0 · 000

tensorflow-internals

An open-source ebook about the TensorFlow kernel and its implementation mechanisms.

Language: TeX · 000

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · 000

unlock-music

Unlock encrypted music files in the browser.

License: MIT · 000

xla_hlo_dump_parse

Language: Python · 000