The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method. SPH discretizes a fluid in a series of interpolation points (SPH particles) whose distribution follows the mass density of the fluid and their evolution relies on a weighted interpolation over close neighboring particles.
SPH simulations represent computationally demanding calculations. Therefore, trade-offs are made between temporal and spatial scales, resolution, dimensionality (3-D or 2-D), and approximated versions of the physics involved. The parallelization of SPH codes is not trivial due to their boundless nature and the absence of a structured grid. SPHYNX, ChaNGa, and SPH-flow are the three SPH codes selected in the PASC SPH-EXA project to act as parent and reference codes to SPH-EXA. The performance of these three codes is negatively impacted by factors such as imbalanced multi-scale physics, individual time-stepping, halos exchange, and long-range forces. Therefore, the goal is to extrapolate their common basic SPH features, and consolidate them in a fully optimized, Exascale-ready, MPI+X, SPH code: SPH-EXA.
SPH-EXA is a C++17 simulation code, parallelized with MPI, OpenMP, CUDA and HIP.
Check our wiki for more details
SPH-EXA
├── README.md
├── docs
├── domain - Cornerstone library: octree building and domain decomposition
│ ├── include
│ │ └── cstone
│ │ ├── CMakeLists.txt
│ │ ├── cuda
│ │ ├── domain
│ │ ├── findneighbors.hpp
│ │ ├── halos
│ │ ├── primitives
│ │ ├── sfc
│ │ ├── tree
│ │ └── util
│ └── test - Cornerstone unit- performance-
│ ├── integration_mpi and integration tests
│ ├── performance
│ ├── unit
│ └── unit_cuda
├── ryoanji - Ryoanji: N-body solver for gravity
│ ├─── src
│ └─── test - demonstrator app and unit tests
│
├── sph - SPH implementation
│ ├─── include
│ │ └── sph
│ └─── test - SPH kernel unit tests
│
└── src
├── init - initial conditions for test cases
├── io - file output functionality
└── sphexa - SPH main application front-end
Use the following commands to compile the main SPH-EXA application:
Minimal CMake configuration:
mkdir build
cd build
cmake <GIT_SOURCE_DIR>
Recommended CMake configuration on Piz Daint, using the (default) Cray Clang compiler for CPU code (.cpp) and nvcc/g++ for GPU code (.cu):
module load daint-gpu
module load cudatoolkit/11.2.0_3.39-2.1__gf93aa1c # or newer
module load CMake/3.22.1 # or newer
module load cray-hdf5-parallel
module load gcc/9.3.0 # nvcc uses gcc as the default host compiler,
# but the system version is too old
export GCC_X86_64=/opt/gcc/9.3.0/snos # system header versions are too old, applies to cray-clang too
mkdir build
cd build
cmake -DCMAKE_CXX_COMPILER=CC <GIT_SOURCE_DIR>
- Build everything:
make -j
The main sphexa
application can either start a simulation by reading initial conditions
from a file or generate an initial configuration for a named test case.
Self-gravity will be activated automatically based on named test-case choice or if the HDF5 initial
configuration file has an HDF5 attribute with a non-zero value for the gravitational constant.
Arguments:
--init CASE/FILE
:sedov
for simulation the Sedov blast wave,noh
for the Noh implosion,evrard
for the Evrard collapse or provide an HDF5 file with valid input data-n NUM
: Run the simulation with NUM^3 (NUM to the cube) number of particles (for named test cases)-s NUM
: Run the simulation with NUM of iterations (time-steps)-w NUM
: Dump particle data every NUM iterations (time-steps)-f FIELDS
: Comma separated list of particle fields for file output dumps--quiet
: Don't print any output to stdout
Example usage:
OMP_NUM_THREADS=4 ./sphexa --init sedov -n 100 -s 1000 -w 10 -f x,y,z,rho,p
Runs Sedov with 100^3 particles for 1000 iterations (time-steps) with 4 OpenMP threads and dumps particle xyz-coordinates, density and pressure data every 10 iterationsOMP_NUM_THREADS=4 ./sphexa-cuda --init -n 100 -s 1000 -w 10 -f x,y,z,rho,p
Runs Sedov with 100^3 million particles for 1000 iterations (time-steps) with 4 OpenMP threads. Uses the GPU for most of the compute work.OMP_NUM_THREADS=4 mpiexec -np 2 ./sphexa --init noh -n 100 -s 1000 -w 10
Runs Noh with 100^3 million particles for 1000 iterations (time-steps) with 2 MPI ranks of 4 OpenMP threads each. Works when using MPICH. For OpenMPI, usempirun
instead.OMP_NUM_THREADS=12 srun -Cgpu -A<your account> -n<nnodes> -c12 ./sphexa-cuda --init sedov -n 100 -s 1000 -w 10
Optimal runtime configuration on Piz Daint fornnodes
GPU compute nodes. Launches 1 MPI rank with 12 OpenMP threads per node../sphexa-cuda --init evrard.h5 -s 2000 -w 100 -f x,y,z,rho,p,vx,vy,vz
Run SPH-EXA, initializing particle data from an input file (e.g. for the Evrard collapse). Includes gravitational forces between particles. The angle dependent accuracy parameter theta can be specificed with--theta <value>
, the default is0.5
.
Cornerstone octree and domain unit tests:
./domain/test/unit/component_units
GPU-enabled unit tests:
./domain/test/unit_cuda/component_units_cuda
MPI-enabled integration and regression tests:
mpiexec -np 2 ./domain/test/integration_mpi/domain_2ranks
mpiexec -np 2 ./domain/test/integration_mpi/exchange_focus
mpiexec -np 2 ./domain/test/integration_mpi/exchange_halos
mpiexec -np 2 ./domain/test/integration_mpi/globaloctree
mpiexec -np 5 ./domain/test/integration_mpi/domain_nranks
mpiexec -np 5 ./domain/test/integration_mpi/exchange_domain
mpiexec -np 5 ./domain/test/integration_mpi/exchange_keys
mpiexec -np 5 ./domain/test/integration_mpi/focus_tree
mpiexec -np 5 ./domain/test/integration_mpi/treedomain
SPH-kernel unit tests:
./include/sph/test/kernel/kernel_tests
Some tests require input data. For example, the Evrard test case will check that a Test3DEvrardRel.bin file exists and can be read at the beginning of the job. This file can be downloaded from zenodo.org.
Ryoanji is a high-performance GPU N-body solver for gravity. It relies on the cornerstone octree framework for tree construction, EXAFMM multipole kernels, and a warp-aware tree-traversal inspired by the Bonsai GPU tree-code.
./test/ryoanji <log2(numParticles)> <computeDirectReference:yes=1,no=0>
./test/ryoanji_unit_tests
- Ruben Cabezon
- Aurelien Cavelan
- Florina Ciorba
- Michal Grabarczyk
- Danilo Guerrera
- David Imbert
- Sebastian Keller
- Lucio Mayer
- Ali Mohammed
- Jg Piccinali
- Tom Quinn
- Darren Reed
This project is licensed under the MIT License - see the LICENSE file for details