ZZK's repositories
CacheLib
Pluggable in-process caching engine to build and scale high performance services
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
YHs_Sample
Yinghan's Code Sample
FBTT-Embedding
This is a Tensor Train based compression library for the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-source DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding tables on the fly, so they do not reduce memory use at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000 times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
Cpp-Concurrency-in-Action-2ed
C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.
powersgd
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
DeepRec
DeepRec is a recommendation engine based on TensorFlow.
cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
tensorflow-internals
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
AI-System
System for AI Education Resource.
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DesignPattern
A full implementation of the design patterns in C++11, covering 23 kinds of pointer usage