Shintaro Iwasaki's repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
argobots
Copy of Argobots Repository
FBTT-Embedding
This is a Tensor Train-based compression library for the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-source DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress whole embedding tables on the fly, so they do not reduce memory use at runtime during training. Our library decompresses only the requested rows and can therefore provide a 10,000x reduction in memory footprint per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed form for faster lookup and processing.
folly
An open-source C++ library developed and used at Facebook.
jekyll-action
A GitHub Action to publish Jekyll-based content as a GitHub Pages site
kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
mpich
Official MPICH Repository
ompi
Open MPI main development repository
optimizers
For optimization algorithm research and development.
p2s2-www
International Workshop on Parallel Programming Models and Systems Software for High-End Computing Website
ppopp21-preemption-artifact
Artifact of the paper "Lightweight Preemptive User-Level Threads" in PPoPP'21
pytorch
Tensors and dynamic neural networks in Python with strong GPU acceleration
rccl-tests
RCCL Performance Benchmark Tests
spack
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
triton-shared
Shared Middle-Layer for Triton Compilation
yaksa-www
Yaksa: High-performance Noncontiguous Data Management