Mengchi Zhang's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
CHIPKIT
CHIPKIT: An agile, reusable open-source framework for rapid test chip development
ck-artifact-evaluation
Public CK repository with materials and workflows to reproduce results from published papers or open competitions at ACM, IEEE and NeurIPS conferences and journals
cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated (and validated) energy model, GPUWattch.
fairscale
PyTorch extensions for high performance and large scale training.
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Galois
Galois: C++ library for multi-core and multi-node parallelization
gpgpu-sim_simulations
A repository that complements gpgpu-sim, providing automated regression scripts, simulation launching utilities and the code + arguments for simulations that complete in a reasonable amount of time on GPGPU-Sim.
gpufs
GPUfs - File system support for NVIDIA GPUs
ISCA-2021-Script
A collection of redistributable Python scripts to help organize ISCA 2021 (The 48th International Symposium on Computer Architecture).
llvm-pass-skeleton
Example LLVM pass
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
MightyPC
Mighty toolkit for conference Program Chairs.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
sst-gpgpusim
SST GPGPU Simulation Components
thrust
The C++ parallel algorithms library.
tinygrad
You like pytorch? You like micrograd? You love tinygrad! ❤️
torchrec
PyTorch domain library for recommendation systems
triton
Development repository for the Triton language and compiler