Xinfeng's repositories
MPU-ASPLOS-2021
Source code of the MPU simulator and compiler for the ASPLOS 2021 submission.
cudnn-tuning
Code for auto-tuning cuDNN convolution forward implementations
mkldnn-perf
Testing the performance of MKL-DNN
caffe
Caffe: a fast open framework for deep learning.
caffe-tensorflow
Caffe models in TensorFlow
cublas_perf
Testing the performance of cuBLAS
cuda-convnet2
Automatically exported from code.google.com/p/cuda-convnet2
fathom
Reference workloads for modern deep learning methods.
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
flash-attention
Fast and memory-efficient exact attention
GD06.github.io
Homepage
gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Halide
A language for fast, portable data-parallel computation
leveldb
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
models
Models and examples built with TensorFlow
mpu-homepage
Homepage of the MPU project based on the Cayman theme.
NiftyRec
NiftyRec is a software toolbox for tomographic image reconstruction. NiftyRec is written in C, and its computationally intensive functions have a GPU-accelerated version based on NVIDIA CUDA. NiftyRec includes a MATLAB toolbox and a Python package that access the low-level routines, hiding the complexity of the GPU-accelerated algorithms.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-cifar
95.16% on CIFAR10 with PyTorch
torchrec
PyTorch domain library for recommendation systems