There are 8 repositories under the openacc topic.
This repository contains GPU bootcamp material for HPC and AI
Abstraction Library for Parallel Kernel Acceleration 🦙
STREAM, for lots of devices written in many programming models
Training materials provided by OpenACC.org.
CLAW Compiler for Performance Portability
N-Ways to GPU Programming Bootcamp
OpenACC* to OpenMP* API assisting migration tool
The sources for the OpenACC Programming and Best Practices Guide.
Case studies are a modern, interdisciplinary teaching practice that plays a fundamental role in developing new skills and forming new knowledge. This research studies the behavior and performance of two widely adopted, interdisciplinary scientific kernels: a Fast Fourier Transform and Matrix Multiplication. Both routines are implemented in two of the currently most popular many-core programming models, CUDA and OpenACC. A Fast Fourier Transform (FFT) takes a signal sampled over a period of time and decomposes it into its frequency components by computing the Discrete Fourier Transform (DFT) of the sequence. Unlike the direct approach to computing a DFT, FFT algorithms reduce the complexity of the problem from $O(n^2)$ to $O(n \log_2 n)$. Matrix multiplication is a cornerstone routine in Mathematics, Artificial Intelligence and Machine Learning. This research also shows that the nature of the problem plays a crucial role in determining which many-core model provides the highest performance benefit.
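For readers unfamiliar with the directive-based style, the following is a minimal sketch of a dense matrix multiplication annotated with OpenACC directives. It is not taken from the repository described above: the function name `matmul`, the row-major layout, and the choice of clauses are illustrative assumptions. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
/* Hypothetical example: c = a * b for n x n row-major matrices.
 * The name and signature are assumptions made for illustration. */
void matmul(int n, const double *a, const double *b, double *c)
{
    /* Offload the two outer loops; copy inputs to the device and the
     * result back. Without OpenACC support the pragma is ignored. */
    #pragma acc parallel loop collapse(2) copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            /* Each (i, j) pair accumulates its dot product sequentially. */
            #pragma acc loop seq
            for (int k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

An equivalent CUDA version would need an explicit kernel, launch configuration, and memory transfers, which is the kind of trade-off between the two models that such a comparison examines.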
The repository containing everything you need to compete in the IHPCSS 2019 programming challenge.
jacobi - a benchmark that solves the 2D Laplace equation with the Jacobi iterative method. It can run on a GPU or a Xeon Phi.
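As a rough illustration of the method named above, here is a sketch of a single Jacobi sweep for the 2D Laplace equation on a square grid. It is not the benchmark's actual code: the function name `jacobi_sweep`, the flat row-major grid, and the OpenACC clauses are assumptions. Each interior point is replaced by the average of its four neighbours, and the largest change is returned so that a caller can test for convergence.

```c
/* Hypothetical example: one Jacobi sweep on an n x n row-major grid.
 * Reads u, writes interior points of u_new, and returns the largest
 * absolute change. Boundary entries of u_new are left untouched, so
 * the caller is assumed to have filled them in beforehand. */
double jacobi_sweep(int n, const double *u, double *u_new)
{
    double max_diff = 0.0;

    /* copy (not copyout) keeps the caller's boundary values in u_new. */
    #pragma acc parallel loop collapse(2) reduction(max:max_diff) copyin(u[0:n*n]) copy(u_new[0:n*n])
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            /* Average of the four nearest neighbours. */
            double v = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                             + u[i * n + (j - 1)] + u[i * n + (j + 1)]);
            double d = v - u[i * n + j];
            if (d < 0.0) d = -d;           /* absolute value without libm */
            if (d > max_diff) max_diff = d;
            u_new[i * n + j] = v;
        }
    }
    return max_diff;
}
```

A driver would typically call this in a loop, swapping the `u` and `u_new` pointers after each sweep, until the returned change drops below a chosen tolerance.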
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
Materials for "Differences between OpenACC and OpenMP offloading models" tutorial.
Various benchmarks used to inform PSyclone optimisations
OpenMP programming tips for GPU offloading
Interoperability examples for OpenACC.
Kinetic plasma simulation code parallelized with C++ parallel algorithms