Simple OpenACC Fortran Examples
Author: Jeng Bai-Cheng (rjeng@nvidia.com)
A code example is worth a thousand words. This repository hosts fundamental but useful examples, each just a few dozen lines of code. Most of them come from my past experience in HPC projects, but readers do not need an HPC background to understand them.
Examples
Basic
- acc_async - faster way to enqueue GPU routines (kernels)
- access_efficiency - faster way to access a GPU array
- alternative_nested_parallelism - alternative to nested parallelism on the GPU
- array_of_pointers - array of pointers usage
- array_setting - faster way to initialize a GPU array
- atomic_op - use atomic operation to maximize parallelism
- auto_depend - automatic dependency solution
- cuda_c_binding - call CUDA C from Fortran
- cuda_graph - faster way to launch GPU kernels
- device_routine - usage of a GPU routine: call other routines from within a GPU kernel
- device_variable - usage of a GPU variable: access a global variable from another module within a GPU kernel
- do_concurrent - DO CONCURRENT construct (Fortran 2008)
- hybrid_omp_acc - usage of OpenMP and OpenACC
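To give a flavor of the Basic examples, here is a minimal sketch of the idea behind atomic_op: a histogram in which many loop iterations may increment the same bin. It is not the code from the repository. The program name, the array names (samples, hist), and the sizes n and nbins are assumptions made for illustration only.

```fortran
! Hypothetical sketch: not taken from the repository's atomic_op example.
program atomic_histogram
  implicit none
  integer, parameter :: n = 1000000   ! number of samples (assumed value)
  integer, parameter :: nbins = 8     ! number of histogram bins (assumed value)
  integer :: i
  integer :: hist(nbins)
  integer, allocatable :: samples(:)

  ! Fill the input on the host with bin indices in the range 1..nbins.
  allocate(samples(n))
  do i = 1, n
     samples(i) = mod(i, nbins) + 1
  end do
  hist = 0

  ! All iterations may run in parallel. The atomic update protects only
  ! the increment, so iterations that hit different bins do not wait on
  ! each other.
  !$acc parallel loop copyin(samples) copy(hist)
  do i = 1, n
     !$acc atomic update
     hist(samples(i)) = hist(samples(i)) + 1
  end do

  ! Every sample lands in exactly one bin, so the total should equal n.
  print *, 'total count =', sum(hist), ' expected =', n

  deallocate(samples)
end program atomic_histogram
```

With the NVIDIA HPC SDK this should build with something like nvfortran -acc. Because the directives are ordinary comments to a compiler without OpenACC enabled, the same source also runs serially on the CPU and gives the same counts.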
MPI
- cuda_mpi_sendrecv - CUDA-Aware MPI, faster way to use MPI on GPU
- cuda_unified_memory_mpi_bcast - usage of CUDA Unified Memory and MPI, more convenient way to use MPI on GPU
- nccl_alltoall - faster Alltoall on GPU
- nccl_alltoallv - faster Alltoallv on GPU
Profiling
- auto_nvtx - use compiler to insert CPU profiling routines automatically
- profiling_range - demonstration of focused profiling via profiling tool
Requirement
- NVIDIA HPC SDK 21.3
To install the HPC SDK via Docker, visit NVIDIA GPU Cloud: https://ngc.nvidia.com/catalog/containers/nvidia:nvhpc/tags
Or download the HPC SDK from the official website: https://developer.nvidia.com/hpc-sdk
Build
$ cd <folder>
$ make
Run
$ cd <folder>
$ ./<executable>