Pluggable in-process caching engine to build and scale high-performance services
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Yinghan's Code Sample
This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they provide no memory reduction at training runtime. Our library decompresses only the requested rows and can therefore provide a 10,000x memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
Tutorials for writing high-performance GPU operators in AI frameworks.
C++11/14/17/20 multithreading, covering operating system principles and concurrent programming techniques.
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
DeepRec is a recommendation engine based on TensorFlow.
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Step-by-step optimization of CUDA SGEMM
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
System for AI Education Resource.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Tensors and Dynamic neural networks in Python with strong GPU acceleration
A full set of design patterns in C++11, with 23 kinds of pointer usage (a full design pattern implementation in C++11)
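
The Tensor Train compression entry above describes decompressing only the requested rows and keeping some decompressed rows in a software cache. The following is a minimal NumPy sketch of that idea, not the library's actual code: the class name `TTEmbedding`, its methods, the fixed three-core factorization, the random initialization, and the LRU eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict

import numpy as np


class TTEmbedding:
    """Hypothetical three-core Tensor Train embedding table (illustration only)."""

    def __init__(self, row_factors, dim_factors, rank, cache_size=2, seed=0):
        # The table has n1*n2*n3 rows and d1*d2*d3 columns, but only the
        # three small cores below are ever stored.
        n1, n2, n3 = row_factors
        d1, d2, d3 = dim_factors
        rng = np.random.default_rng(seed)
        self.row_factors = (n1, n2, n3)
        self.num_rows = n1 * n2 * n3
        self.dim = d1 * d2 * d3
        self.g1 = rng.standard_normal((n1, d1, rank))
        self.g2 = rng.standard_normal((n2, rank, d2, rank))
        self.g3 = rng.standard_normal((n3, rank, d3))
        # Assumed LRU cache of decompressed rows.
        self.cache_size = cache_size
        self.cache = OrderedDict()
        self.hits = 0

    def compressed_params(self):
        return self.g1.size + self.g2.size + self.g3.size

    def _decompress_row(self, idx):
        # Split the flat row index into one index per core (row-major order).
        i1, i2, i3 = np.unravel_index(idx, self.row_factors)
        # Contract the three selected slices over the shared rank axes.
        left = np.einsum("ar,rbs->abs", self.g1[i1], self.g2[i2])
        row = np.einsum("abs,sc->abc", left, self.g3[i3])
        return row.reshape(self.dim)

    def lookup(self, idx):
        # Serve from the cache when possible, otherwise decompress one row.
        if idx in self.cache:
            self.hits += 1
            self.cache.move_to_end(idx)
            return self.cache[idx]
        row = self._decompress_row(idx)
        self.cache[idx] = row
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used row
        return row

    def full_table(self):
        # Materializes the whole table; used here only to check the sketch.
        t = np.einsum("iar,jrbs,ksc->ijkabc", self.g1, self.g2, self.g3)
        return t.reshape(self.num_rows, self.dim)


emb = TTEmbedding(row_factors=(4, 5, 6), dim_factors=(2, 3, 4), rank=3)
row = emb.lookup(7)      # decompresses a single row
row_again = emb.lookup(7)  # served from the cache
```

In this sketch a lookup touches only one slice of each core, so the working memory per lookup is independent of the number of rows; `full_table` exists only to verify that the row-wise result matches the fully decompressed table.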