AllanLRH / nbabel


Benchmark N-Body system

Here are some elapsed times (in s) for 5 implementations (of course, these numbers do not characterize the languages but only particular implementations in some languages).

| # particles | Py | C++ nbabel.org | Fortran nbabel.org | Julia nbabel.org | Rust |
|---|---|---|---|---|---|
| 1024 | 30 | 55 | 41 | 45 | 34 |
| 2048 | 124 | 231 | 166 | 173 | 137 |
| 16384 | 7220 | 14640 | 10914 | 11100 | ? |

The implementations in C++, Fortran and Julia come from https://www.nbabel.org/ and have been used in an article published in Nature Astronomy (Zwart, 2020). The results of this updated benchmark were summarized in Augier et al. (2021) (see the Citation section below for how to cite the article).

The implementation in Python-Numpy is very simple, but uses Transonic and Pythran (>=0.9.8).
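
For illustration, here is a minimal sketch (an assumed example, not the actual code of this repository) of the kind of kernel Transonic can hand off to Pythran: plain Python/Numpy loops, Pythran-style type annotations, and the boost decorator.

```python
import numpy as np
from transonic import boost  # Transonic dispatches to Pythran when it is installed


@boost
def compute_accelerations(
    accelerations: "float[:,:]", masses: "float[:]", positions: "float[:,:]"
):
    """Accumulate pairwise gravitational accelerations (naive O(N^2) double loop)."""
    nb_particles = masses.size
    for i in range(nb_particles - 1):
        for j in range(i + 1, nb_particles):
            delta = positions[j] - positions[i]
            distance_squared = np.sum(delta**2)
            distance_cube = distance_squared * np.sqrt(distance_squared)
            # G = 1 in NBabel units
            accelerations[i] += masses[j] / distance_cube * delta
            accelerations[j] -= masses[i] / distance_cube * delta
```

Without a compiled extension, a decorated function like this simply runs as regular Python, so the same file works with and without Pythran.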

To run these benchmarks, go into the different directories and run `make bench1k` or `make bench2k`.

To give an idea of how these results compare with the figure published in Nature Astronomy:

[Figure: comparison of these timings with the figure published in Zwart (2020)]

Note: these benchmarks are run sequentially with an Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz.

Note 2: With Numba, the elapsed times are 44 s, 153 s and 11490 s, respectively, i.e. roughly 20 to 35% faster than the C++ implementation depending on the case.

Note 3: With PyPy, a pure Python implementation (bench_pypy_Point.py) runs for 1024 particles in 133 s, i.e. only 2.4 times slower than the C++ implementation (compared to ~50 times slower as shown in the figure taken from Zwart, 2020). Moreover, with a new version of PyPy (branch map-improvements, merged into default on Feb 02 2021, so one can use a nightly build), another implementation (bench_purepy_Particle.py) runs in 55 s, i.e. the same speed as the C++ implementation!
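
To make this concrete, here is a hypothetical sketch (not copied from bench_pypy_Point.py) of the pure-Python, object-oriented style that PyPy's JIT handles well: small vector objects with overloaded operators instead of Numpy arrays.

```python
class Point:
    """A 3D vector as a plain Python object; cheap for PyPy's tracing JIT."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other):
        return Point(self.x - other.x, self.y - other.y, self.z - other.z)

    def __mul__(self, scalar):
        return Point(self.x * scalar, self.y * scalar, self.z * scalar)

    def norm_square(self):
        return self.x**2 + self.y**2 + self.z**2


# e.g. inside the inner loop of the integrator:
# delta = positions[j] - positions[i]
# distance_cube = delta.norm_square() ** 1.5
```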

Note 4: The directory "julia" contains some more advanced and faster implementations. The sequential optimized Julia implementation runs on my PC in 22 s, 82 s and 5340 s, respectively (i.e. 25-30% faster than our fast and simple Python implementation).

Note 5: Starting from the high-level Numpy implementation (bench_numpy_highlevel.py), if one (i) adds the import `from transonic import jit` and (ii) decorates the function `loop` with `@jit`, the case with 1024 particles runs in 136 s (2.5 times slower than the C++ implementation).
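
Concretely, the two changes described above look roughly like this (a sketch; the signature of `loop` is only indicative of the actual function in bench_numpy_highlevel.py):

```python
from transonic import jit  # (i) the added import


@jit  # (ii) the added decorator; the function is compiled at its first call
def loop(time_step, nb_steps, masses, positions, velocities):
    # body of the high-level Numpy time-stepping loop, left unchanged
    ...
```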

Note 6: The directory "cpp" also contains 2 faster C++ implementations (proposed by @bolverk and @isuruf), which run on my PC in 25 s, 104 s and 7080 s for the AoS implementation by @bolverk, and in 26 s, 104 s and 7330 s for the SoA implementation by @isuruf. Note that these implementations have a problem with the conservation of energy.
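
For readers unfamiliar with the acronyms, here is a small Python-flavoured sketch of the two memory layouts (the actual implementations mentioned above are in C++):

```python
import numpy as np

nb_particles = 1024

# AoS (array of structures): one record per particle
aos = [
    {"mass": 1.0, "position": np.zeros(3), "velocity": np.zeros(3)}
    for _ in range(nb_particles)
]

# SoA (structure of arrays): one contiguous array per field,
# usually friendlier to vectorization and to the cache
soa = {
    "masses": np.ones(nb_particles),
    "positions": np.zeros((nb_particles, 3)),
    "velocities": np.zeros((nb_particles, 3)),
}
```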

Table of codes

See run_benchmarks.py.

| Legend (figure) | Source code |
|---|---|
| C++ nbabel.org | main.cpp |
| Fortran nbabel.org | nbabel.f03 |
| Pythran naive | bench_numpy_highlevel_jit.py |
| PyPy | bench_purepy_Particle.py |
| Numba | bench_numba.py |
| Pythran | bench.py |
| Julia | naive_lowlevel.jl |
| Julia optimized | nbabel5_serial.jl |
| Pythran parallel | bench_omp.py |
| Julia parallel | nbabel5_threads.jl |

Citation

@article{nbabel2021,
  title = {Reducing the Ecological Impact of Computing through Education and {{Python}} Compilers},
  author = {Augier, Pierre and {Bolz-Tereick}, Carl Friedrich and Guelton, Serge and Mohanan, Ashwin Vishnu},
  year = {2021},
  month = apr,
  volume = {5},
  pages = {334--335},
  publisher = {{Nature Publishing Group}},
  issn = {2397-3366},
  doi = {10.1038/s41550-021-01342-y},
  journal = {Nature Astronomy},
  language = {en},
  number = {4}
}

About

License: GNU General Public License v2.0


Languages

Python 57.4%, Julia 21.6%, C++ 11.5%, Rust 3.7%, Makefile 2.9%, Fortran 2.2%, Shell 0.6%