ZHENG, Zhen (JamesTheZ)

JamesTheZ


Company: Alibaba Group

Home Page: https://jamesthez.github.io/


ZHENG, Zhen's repositories

VersaPipe

A framework for pipelined computing on GPUs.

CudaProf

A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.

Language: C | Stargazers: 4 | Issues: 2 | Issues: 0

jamesthez.github.io

Website of Zhen Zheng.

Language: JavaScript | License: MIT | Stargazers: 2 | Issues: 0 | Issues: 0

BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

Language: C++ | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0
Language: Cuda | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda | Stargazers: 0 | Issues: 0 | Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | License: GPL-3.0 | Stargazers: 0 | Issues: 1 | Issues: 0
Language: C++ | License: GPL-3.0 | Stargazers: 0 | Issues: 1 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

fp6_llm

Efficient GPU support for LLM inference with 6-bit quantization (FP6).

Language: Cuda | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: SCSS | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

persistVGG

Pure CUDA implementation of VGG net.

Stargazers: 0 | Issues: 2 | Issues: 0

shell_script

One-click installation of shadowsocks, with support for the chacha20-ietf-poly1305 encryption method.

Language: Shell | Stargazers: 0 | Issues: 1 | Issues: 0

SyncMicrobenchmark

This work aims at characterizing the synchronization methods in CUDA.

Language: C | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

tensorflow-internals

An open-source ebook about the TensorFlow kernel and its implementation mechanisms.

Language: TeX | Stargazers: 0 | Issues: 0 | Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

unlock-music

Unlock encrypted music files in the browser.

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0