rafaelfsilva / bb-workflow

CyberShake Workflow (NERSC Cori with Burst Buffer)

For the purposes of exploring burst buffer performance and impact, we have developed a workflow that focuses primarily on the two CyberShake jobs which together account for 97% of the compute time: the wave propagation code AWP-ODC-SGT, and the post-processing code DirectSynth which synthesizes seismograms and produces intensity measures.

AWP-ODC-SGT

The AWP-ODC-SGT code is a modified version of AWP-ODC, an anelastic wave propagation MPI CPU code developed within the SCEC community that has demonstrated excellent scalability at large core counts. It takes as input a velocity mesh of about 10 billion points, as well as some small parameter files. For this workflow, we selected a representative simulation, which requires about an hour on 313 Cori nodes and produces ~275 GB of output. Two of these simulations, one for each horizontal component, must be run to produce the pair of strain Green tensors (SGTs) needed for CyberShake post-processing for a single site (i.e., fx.sgt and fy.sgt, a total of ~550 GB).
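As a quick sanity check on the figures above, the per-site cost of the SGT stage can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the approximate values quoted in this section, and the function and variable names are hypothetical, not part of the workflow code.

```python
# Approximate figures quoted above for one AWP-ODC-SGT run
# (one horizontal component). These are rounded values, not measurements.
NODES_PER_RUN = 313
HOURS_PER_RUN = 1.0        # "about an hour"
OUTPUT_GB_PER_RUN = 275    # "~275 GB"
COMPONENTS = ("fx", "fy")  # one run per horizontal component


def sgt_site_totals():
    """Return rough per-site totals for the SGT stage (hypothetical helper)."""
    runs = len(COMPONENTS)
    return {
        "runs": runs,
        "node_hours": runs * NODES_PER_RUN * HOURS_PER_RUN,
        "output_gb": runs * OUTPUT_GB_PER_RUN,
        "output_gb_per_node": OUTPUT_GB_PER_RUN / NODES_PER_RUN,
    }


if __name__ == "__main__":
    for name, value in sgt_site_totals().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the two runs together cost roughly 626 node-hours and write about 550 GB, a little under 1 GB per node per run.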

DirectSynth

The DirectSynth code is an MPI code that performs seismic reciprocity calculations. It takes as input a list of fault ruptures and the SGTs generated by AWP-ODC-SGT. From each rupture, 10-600 individual earthquakes that vary in slip and hypocenter location are created, and the slip time history for each earthquake is convolved with the SGTs to produce a two-component seismogram. DirectSynth follows the master-worker paradigm: a task manager reads in the list of ruptures, creates a queue of seismogram synthesis tasks, and communicates the tasks to the workers via MPI. A separate set of processes within the DirectSynth job, the SGT handlers, each read in part of the SGT files; these reads account for the majority of the data read. Worker processes request and receive the SGTs needed for the convolution from the SGT handlers over MPI. Output data is forwarded to an aggregator, which writes 4 files per rupture totaling about 4 MB. For this workflow, we selected a CyberShake site with about 5,700 ruptures, resulting in about 23,000 files totaling about 22 GB. Running on 64 Cori nodes, this job takes about 8 hours to complete and produces the outputs CyberShake requires for a single site.
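The roles described above (task manager, SGT handlers, workers, aggregator) can be illustrated with a small, self-contained Python sketch. It is not the DirectSynth implementation: threads and in-memory queues stand in for MPI ranks and messages, the SGT data is a toy dictionary keyed by rupture ID, and every name in it (`SGTHandler`, `build_tasks`, `run`, and so on) is hypothetical.

```python
import queue
import threading


def build_tasks(ruptures):
    """Task manager: create one synthesis task per rupture variation."""
    tasks = queue.Queue()
    for rupture_id, n_variations in ruptures.items():
        for variation in range(n_variations):
            tasks.put((rupture_id, variation))
    return tasks


class SGTHandler:
    """Holds one slice of the (toy) SGT data and serves it to workers."""

    def __init__(self, sgt_slice):
        self._slice = sgt_slice

    def owns(self, rupture_id):
        return rupture_id in self._slice

    def fetch(self, rupture_id):
        return self._slice[rupture_id]


def fetch_sgt(handlers, rupture_id):
    """Worker-side request: ask the handler that owns the needed data."""
    for handler in handlers:
        if handler.owns(rupture_id):
            return handler.fetch(rupture_id)
    raise KeyError(rupture_id)


def convolve(slip, sgt):
    """Plain discrete convolution, standing in for seismogram synthesis."""
    out = [0.0] * (len(slip) + len(sgt) - 1)
    for i, s in enumerate(slip):
        for j, g in enumerate(sgt):
            out[i + j] += s * g
    return out


def worker(tasks, handlers, results):
    """Pull tasks until the queue is empty, forwarding output downstream."""
    while True:
        try:
            rupture_id, variation = tasks.get_nowait()
        except queue.Empty:
            return
        sgt = fetch_sgt(handlers, rupture_id)
        slip = [1.0, float(variation)]  # toy slip time history
        results.put((rupture_id, variation, convolve(slip, sgt)))


def run(ruptures, handlers, n_workers=4):
    """Run all tasks, then aggregate the seismograms per rupture."""
    tasks = build_tasks(ruptures)
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, handlers, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Aggregator: group outputs by rupture once all workers have finished.
    aggregated = {}
    while not results.empty():
        rupture_id, variation, seismogram = results.get()
        aggregated.setdefault(rupture_id, {})[variation] = seismogram
    return aggregated


if __name__ == "__main__":
    handlers = [SGTHandler({0: [1.0, 2.0]}), SGTHandler({1: [0.5, 0.5, 0.5]})]
    seismograms = run({0: 2, 1: 3}, handlers)
    for rupture_id in sorted(seismograms):
        print(rupture_id, seismograms[rupture_id])
```

The sketch keeps the aggregation step separate from the workers, mirroring the description above in which workers forward their output rather than writing files themselves. In the real code the aggregator runs concurrently with the workers; here it runs after they finish to keep the example short.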
