Jaemin Choi's repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
changa
Mirror of UIUC/PPL version of ChaNGa
codes
The Co-Design of Exascale Storage Architectures (CODES) simulation framework builds on the ROSS parallel discrete-event simulation engine to provide high-performance simulation utilities and models for building scalable distributed-systems simulations.
dlrm
An implementation of a deep learning recommendation model (DLRM)
dumpi-cortex
A fork of https://xgitlab.cels.anl.gov/mdorier/dumpi-cortex
gpu
Contains pieces of GPU-related research that are too small to warrant a separate repository.
gpuroofperf-toolkit
A GPU performance prediction toolkit for CUDA programs
kokkos-tutorials
Tutorials for the Kokkos C++ Performance Portability Programming EcoSystem
Megatron-LM
Ongoing research training transformer models at scale
miniMD
MiniMD Molecular Dynamics Mini-App
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
NeMo
NeMo: a toolkit for conversational AI
ompi
Open MPI main development repository
sst-dumpi
SST DUMPI Trace Library
sw4lite
Testing numerical kernels in SW4
TraceR
Trace Replay and Network Simulation Framework
training
Reference implementations of MLPerf™ training benchmarks
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler