Jaemin Choi's repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
changa
Mirror of UIUC/PPL version of ChaNGa
codes
The Co-Design of Exascale Storage Architectures (CODES) simulation framework builds on the ROSS parallel discrete-event simulation engine to provide high-performance simulation utilities and models for building scalable distributed-systems simulations.
dlrm
An implementation of a deep learning recommendation model (DLRM)
dumpi-cortex
A fork of https://xgitlab.cels.anl.gov/mdorier/dumpi-cortex
gpu
Contains pieces of GPU-related research that are too small to warrant a separate repository.
gpuroofperf-toolkit
A GPU performance prediction toolkit for CUDA programs
kokkos-tutorials
Tutorials for the Kokkos C++ Performance Portability Programming EcoSystem
Megatron-LM
Ongoing research training transformer models at scale
miniMD
MiniMD Molecular Dynamics Mini-App
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
NeMo
NeMo: a toolkit for conversational AI
ompi
Open MPI main development repository
sst-dumpi
SST DUMPI Trace Library
sw4lite
Testing numerical kernels in SW4
TraceR
Trace Replay and Network Simulation Framework
training
Reference implementations of MLPerf™ training benchmarks
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, for better performance and lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler