E3SM-Project / transport_se

Atmosphere transport mini app

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

The tracer-transport mini-app (transport_se) is a simplified version of
HOMME designed to facilitate experimentation and performance improvements.

To configure and build the mini-app:

  1) cp configure.sh ~/work/        copy the configure script to a directory of your choosing
  2) vi ~/work/configure.sh         customize the USER-DEFINED PARAMETERS using a text editor
  3) ~/work/configure.sh            execute the script to configure and build the mini-app

To test the mini-app:

  4) cd ~/work/bld/test             navigate to the test directory
  5) make install                   install the namelists and test scripts
  6) vi run_ne8_test.sh             customize the mpirun command, tasks per node, etc...
  7) qsub run_ne8_test.sh           submit a test to the debug queue (or execute it on a laptop)
  8) tail -n 40 ./out_ne8_jobid     examine timing and error norm results
  9) display ~/work/run/image_*     examine plots of tracer cross-sections


Tests:
run_ne8_test.sh:    ultra low-res test for quick debug/check  
run_ne30_test.sh    1 degree test with 4 tracers for verification 
run_ne120_test.sh   1/4 degree test for additonal, optional verification at high-res
                    This test is expensive and often sits in the que for 1 day.

run_ne8_perf.sh
run_ne30_perf.sh
run_ne120_perf.sh   Test case for performance testing
                      DCMIP1-1 for 1 hour of model time.
                      (we might have to increase run time )
                      output disabled.  most diagnostics disabled
                      tracers increased to 35

Recommendations:

   Step 1. Use "run_ne8_test.sh" to get the code up and running

   Step 2. Use "run_ne30_test.sh" to verify the results are correct.  L1, L2
   and Linf errors, overshoot and undershoots should agree to 2-3
   digits.  One should also check that the tracer mass is conserved by
   looking at the "Q, Q diss" values in the stdout.  For additional
   verification, compare the PDF plots of the solution with reference
   plots on the ACME confluence page, "Transport Mini-app Setup and
   Test Results"

   The results should be BFB when run on different numbers of MPI
   tasks and/or openMP threads.  We should add a test for this.

   Step 3. Use "run_ne120_perf.sh" as a starting point for performance testing.  


Note: This mini-app is using DCMIP tests 1-1 and 1-2, modified to use
the ACME 72L configuration instead of the DCMIP equally spaced levels.
The performance characteristics will be very faithful to the HOMME
dycore as used by ACME. However, the ACME 72L configuration has much
larger errors since there are fewer levels where the tracers are
located.  In addition, surface pressure is not treated correctly - for
consistency with prescribed winds and disabled dynamics, we evolve the
surface pressure using the implicit surface pressure from the tracer
advection scheme.  Thus the simulation output of this mini-app should
not be used for DCMIP test case results.



__________________________________________________________________________________________________________________

Typical results for run_ne8_test.sh on NERSC:Hopper

            name     procs  threads  count         walltotal     wallmax (proc   thrd  )   wallmin (proc   thrd  )
  DCMIP1-1: prim_run 48     48       3.456000e+04  1.853381e+03  38.974  (     0      0)   25.267  (     1      0)
  DCMIP1-2: prim_run 48     48       2.880000e+03  9.024055e+01  1.892   (    32      0)   1.660   (     1      0)

  DCMIP1-1: L1=0.564863 L2= 0.527211 Linf= 0.478169 q_max= 0.648135 q_min= -0.0615289
  DCMIP1-2: L1=0.368728 L2= 0.558725 Linf= 1.040230 q_max= 0.991667 q_min= -5.59564e-05

  Used Resources: cput=00:00:07,energy_used=0,mem=12220kb,vmem=123196kb,walltime=00:01:29

Updated 2015-6-27 
With rsplit=1 and limiter=8 (monotone limiter) on Edison
  DCMIP 1-1: L1=0.529721 L2= 0.637361 Linf= 0.616722 q_max= 0.459436 q_min= -2.24628e-08
  DCMIP 1-2: L1=0.386038 L2= 0.556078 Linf= 0.914061 q_max= 0.987406 q_min= -2.64163e-06
On Darwin:
  DCMIP 1-1: L1=0.529721 L2= 0.637360 Linf= 0.616721 q_max= 0.459436 q_min= -2.246253e-08
  DCMIP 1-2: L1=0.386038 L2= 0.556077 Linf= 0.914060 q_max= 0.987405 q_min= -2.642738e-06

Updated 2015-7-14 (new direct addressing limiter)
  DCMIP 1-1: L1=0.529676 L2= 0.637311 Linf= 0.616969 q_max= 0.459701 q_min= -9.78542e-09
  DCMIP 1-2: L1=0.386039 L2= 0.556071 Linf= 0.91406 q_max= 0.987389 q_min= -2.77826e-06

Updated 2015-11-27 (rsplit=3, to match CAM)
  DCMIP 1-1: L1=0.529725 L2=0.637015 Linf=0.610602 q_max=0.458165 q_min= -9.533290e-14
  DCMIP 1-2: L1=0.263420 L2=0.388462 Linf=0.638387 q_max=0.989853 q_min= -1.274771e-08

Updated 2015-11-27 (rsplit=3, ACME 72 level configuration, skybridge)
  DCMIP 1-1: L1=0.578151 L2=0.865526 Linf=0.883168 q_max=0.187204 q_min= -3.207090e-13
  DCMIP 1-2: L1=0.307665 L2=0.622099 Linf=0.839133 q_max=0.813105 q_min= -9.385639e-06




__________________________________________________________________________________________________________________

Typical results for run_ne30_test.sh on NERSC:Hopper

            name      procs  threads  count         walltotal     wallmax (proc   thrd  )   wallmin (proc   thrd  )
  DCMIP1-1  prim_run  216    216      7.464960e+05  8.528053e+04  394.886 (     6      0)   393.686 (     1      0)
  DCMIP1-2  prim_run  216    216      6.220800e+04  5.899456e+03  27.387  (   202      0)    25.604 (     1      0)

  DCMIP1-1: L1=0.1810760 L2= 0.209121 Linf= 0.317952 q_max= 1.014850 q_min= -0.0410253
  DCMIP1-2: L1=0.0799994 L2= 0.150355 Linf= 0.297211 q_max= 0.977932 q_min= -5.7599e-13

  Used Resources: cput=00:00:09,energy_used=0,mem=97628kb,vmem=208712kb,walltime=00:08:41

Updated 2015-6-27
With rsplit limiter=8 (monotone limiter) on Edison
  DCMIP 1-1: L1=0.138421  L2= 0.202415 Linf= 0.298503 q_max= 0.974222 q_min= -7.4405e-14
  DCMIP 1-2: L1=0.0578899 L2= 0.123954 Linf= 0.270832 q_max= 0.979736 q_min= -1.26331e-12

Updated 2015-7-14 (new direct addressing limiter)
  DCMIP 1-1: L1=0.138438  L2= 0.202473 Linf= 0.298522 q_max= 0.974238 q_min= -7.27863e-14
  DCMIP 1-2: L1=0.0578902 L2= 0.123954 Linf= 0.270832 q_max= 0.979733 q_min= -4.54058e-14

Updated 2015-11-27  (rsplit=3, to match CAM)
  DCMIP 1-1: L1=0.138438 L2= 0.202473 Linf= 0.298522 q_max= 0.97423 q_min= -7.633613e-14
  DCMIP 1-2: L1=0.057890 L2= 0.123954 Linf= 0.270831 q_max= 0.97973 q_min= -4.540577e-14

Updated 2015-11-27  (rsplit=3, ACME 72 level config, skybridge)
  DCMIP 1-1: L1=0.490013 L2=0.789052 Linf=0.918454 q_max=0.445141 q_min= -3.559994e-11
  DCMIP 1-2: L1=0.121783 L2=0.361005 Linf=1.092784 q_max=0.836177 q_min= -3.671997e-05



__________________________________________________________________________________________________________________

Typical results for run_ne120_test.sh on NERSC:Hopper

Updated 2015-6-27
With rsplit limiter=8 (monotone limiter) on Edison

  DCMIP1-1 prim_run   960   960   4.423680e+06  2.117038e+06  2205.253 (  709     0)  2205.234 (  259     0)
  DCMIP1-2 prim_run   960   960   3.686400e+05  1.279739e+05   133.365 (  662     0)  133.245  (  443     0)

  DCMIP 1-1: L1=0.0794909 L2= 0.139211  Linf= 0.306975 q_max= 0.984071 q_min= -1.84532e-13
  DCMIP 1-2: L1=0.0363203 L2= 0.0987049 Linf= 0.277493 q_max= 0.994146 q_min= -5.89671e-14

Updated 2015-7-14 (new direct addressing limiter)
  DCMIP 1-1: L1=0.0794911 L2= 0.139212  Linf= 0.30697  q_max= 0.984063 q_min= -1.76014e-13
  DCMIP 1-2: L1=0.0363203 L2= 0.0987048 Linf= 0.277493 q_max= 0.994146 q_min= -5.06671e-14

Updated 2015-11-27 (rsplit=3, ACME 72 level config)
  (something is wrong with the DCMIP 1-1 configuration on 72L, but it should still be ok for performace )
  DCMIP 1-1: L1=0.479398 L2=0.782613 Linf=0.922696 q_max=0.501561 q_min= -1.070570e-09
  DCMIP 1-2: L1=0.081287 L2=0.264887 Linf=0.591157 q_max=0.959530 q_min= -2.795861e-09




__________________________________________________________________________________________________________________

Typical results for run_ne120_perf.sh on Edison:

A x B x C, with
A = nodes 
B = mpitasks_per_node 
C = threads_per_mpitask

40 node cases.  2160 elements per node.  1h simulation time

                                     2015-6-27            2015-11-29
                                     50 tracers            35 tracers
                                     64L                   72L
                                     rsplit=1              rsplit=3
40x24x1
prim_run                                                   42.643
prim_advec_tracers_remap_rk2         53.343                37.195
vertical_remap                        1.800                 1.527

40x12x2   
prim_run                                                   43.306  
prim_advec_tracers_remap_rk2         56.085                37.826       
vertical_remap                        1.801                 1.518  
                                              

40x6x4                               
prim_run                                                   44.176
prim_advec_tracers_remap_rk2         63.195                38.644  
vertical_remap                        1.769                 1.473

40x2x12                               
prim_run                                                    46.040
prim_advec_tracers_remap_rk2         88.949                 40.390
vertical_remap                        1.765                  1.492






About

Atmosphere transport mini app


Languages

Language:Fortran 85.8%Language:C 5.9%Language:CMake 3.1%Language:Shell 2.2%Language:Makefile 2.1%Language:NCL 0.4%Language:Perl 0.2%Language:Pascal 0.1%Language:NewLisp 0.1%Language:Awk 0.0%Language:M4 0.0%