jundaf2

Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.

Language: Cuda | License: MIT | 1500

multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Language: Cuda | License: BSD-3-Clause | 49000

tutorial-multi-gpu

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Language: Cuda | License: MIT | 15600

bcl

The Berkeley Container Library

Language: C++ | License: BSD-3-Clause | 11700

gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

Language: C++ | License: MIT | 82100

monolith

ByteDance's Recommendation System

Language: Python | License: NOASSERTION | 83800

mlir-air

Language: C++ | License: MIT | 7100

ByteMLPerfWeb

Language: TypeScript | 700

cuda_scheduling_examiner_mirror

A tool for examining GPU scheduling behavior.

Language: Cuda | License: NOASSERTION | 6300

awesome-courses

:books: List of awesome university courses for learning Computer Science!

5559200

NCCL

Examples of how to call collective operation functions in multi-GPU environments, including simple uses of the broadcast, reduce, allGather, reduceScatter, and sendRecv operations.

2100

open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Language: C | License: NOASSERTION | 1427700

gpumembench

A GPU benchmark suite for assessing on-chip GPU memory bandwidth

Language: C++ | License: GPL-2.0 | 9100

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | 2945200