# High Performance Computing Problem Set 6
## Usage
- To compile: `make nbody`
- To run: `mpirun -n $n_ranks ./nbody $n_bodies $n_iters $n_threads [$savefile]`
- To plot: `python plotter.py` will render `outfile.bin` into `simulation.mp4`
- `make show` will run and plot based on the settings at the top of the Makefile
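
For example, assuming the optional save file is `outfile.bin` (the file `plotter.py` reads) and picking arbitrary illustrative values for the other parameters:

```
mpirun -n 4 ./nbody 1024 500 2 outfile.bin
python plotter.py
```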
## Parallel vs Serial
Compiling with `-DIDENTICAL` makes rank 0 initialize the bodies for every other rank; otherwise each rank initializes its own bodies. This should only make a small difference, since initialization is a small fraction of the total work. To compare the parallel and serial versions, run `make test` (be sure to set the parameters at the top of the file).
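
A minimal sketch of what the two initialization paths could look like, assuming a `Body` struct, an `init_bodies` helper, and a fixed per-rank body count (all hypothetical names, not the repo's actual code):

```c
#include <mpi.h>
#include <stdlib.h>

#define N_PER_RANK 256  /* assumed bodies per rank, for illustration */

typedef struct { double x, y, vx, vy, m; } Body;  /* assumed layout */

/* Fill n bodies with pseudo-random positions from the given seed. */
static void init_bodies(Body *b, int n, unsigned seed) {
    srand(seed);
    for (int i = 0; i < n; i++) {
        b[i].x  = rand() / (double)RAND_MAX;
        b[i].y  = rand() / (double)RAND_MAX;
        b[i].vx = b[i].vy = 0.0;
        b[i].m  = 1.0;
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, n_ranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &n_ranks);

    Body local[N_PER_RANK];
#ifdef IDENTICAL
    /* Rank 0 generates every rank's bodies from one seed, then scatters
     * them, so the initial state matches a serial run exactly. */
    Body *all = NULL;
    if (rank == 0) {
        all = malloc(sizeof(Body) * N_PER_RANK * n_ranks);
        init_bodies(all, N_PER_RANK * n_ranks, 42);
    }
    MPI_Scatter(all, (int)(sizeof(Body) * N_PER_RANK), MPI_BYTE,
                local, (int)(sizeof(Body) * N_PER_RANK), MPI_BYTE,
                0, MPI_COMM_WORLD);
    free(all);  /* NULL on non-root ranks, which is fine */
#else
    /* Each rank generates its own bodies with a rank-dependent seed. */
    init_bodies(local, N_PER_RANK, 42u + (unsigned)rank);
#endif
    /* ... time stepping and output would go here ... */
    MPI_Finalize();
    return 0;
}
```

Scattering from rank 0 keeps the initial state bit-identical to a serial run regardless of rank count, at the cost of one collective at startup.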
## Midway Scaling
This is a weird result: somehow the hybrid OpenMP + MPI version got slower as I increased the node count. This may have to do with how Midway's memory is laid out. If each CPU on Midway has separate RAM, then hybrid parallelism should be worse, though that doesn't explain why it slows down with more nodes.
| Nodes | Hybrid (s) | MPI-only (s) |
|------:|-----------:|-------------:|
|     1 |     101.16 |       102.13 |
|     2 |     105.36 |        53.20 |
|     4 |     111.63 |        29.40 |
|     8 |     139.94 |        20.23 |
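
For context, the hybrid mode threads each rank's force loop with OpenMP on top of the MPI decomposition, roughly like the sketch below (assumed structure; `Body` and the softened force math are illustrative, not the repo's code):

```c
#include <math.h>

typedef struct { double x, y, vx, vy, m; } Body;  /* assumed layout */

/* Accumulate forces on this rank's bodies from all bodies, threading the
 * outer loop with OpenMP (the "hybrid" part). Compile with -fopenmp. */
void compute_forces(Body *local, int n_local,
                    const Body *all, int n_total, double dt) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_local; i++) {
        double ax = 0.0, ay = 0.0;
        for (int j = 0; j < n_total; j++) {
            double dx = all[j].x - local[i].x;
            double dy = all[j].y - local[i].y;
            double r2 = dx * dx + dy * dy + 1e-9;  /* softening term */
            double inv_r3 = 1.0 / (r2 * sqrt(r2));
            ax += all[j].m * dx * inv_r3;
            ay += all[j].m * dy * inv_r3;
        }
        local[i].vx += ax * dt;  /* G folded into the masses for brevity */
        local[i].vy += ay * dt;
    }
}
```

If a rank's body arrays end up on one socket's memory while its OpenMP threads run on another, every access pays remote-memory latency, which is one way the layout hypothesis above could play out; single-threaded MPI ranks keep their data local by construction.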
## Hybrid vs Pure MPI on Vesta

As on Midway, hybrid parallelism is dramatically slower in my implementation; I'm not sure why.
## Vesta Scaling

Scaling tests were run at a fixed total problem size (nodes × bodies/rank is constant at 8192), so this is a strong-scaling test: bodies per rank halves as the node count doubles.
| Nodes | Seconds | Bodies/rank |
|------:|--------:|------------:|
|    32 |  360.47 |         256 |
|    64 |  183.29 |         128 |
|   128 |   98.26 |          64 |
|   256 |   58.84 |          32 |
|   512 |   48.08 |          16 |
|  1024 |   55.42 |           8 |

Scaling is nearly linear up to 128 nodes, flattens beyond that, and at 1024 nodes the run is actually slower than at 512, presumably because communication overhead dominates once each rank has only 8 bodies.
## Production simulation
It turned out that my simulation took over an hour to run, so it stopped at iteration 395. This breaks `plotter.py`, and the number of time steps has to be hardcoded to render the final simulation:
https://github.com/CasperN/hpc_ps6/blob/master/final_simulation.mp4