ecmwf / eckit

A C++ toolkit that supports development of tools and applications at ECMWF.

Home Page: https://confluence.ecmwf.int/display/eckit

ctest failures when setting multiple MPI_ARGS

DJDavies2 opened this issue

On one particular platform I need to run mpiexec with the -launcher ssh options. To do this I set -DMPI_ARGS="-launcher;ssh" when running cmake. This allows e.g. eckit_test_mpi to succeed, but ECKIT-221.sh and ECKIT-166.sh fail with:

    Start 118: ECKIT-166.sh

118/118 Test #118: ECKIT-166.sh ...............................***Failed 0.01 sec

  • cd /var/tmp/frwd-8923939/eckit-build/regressions
  • exe=./ECKIT-166.x
  • '[' 1 == 1 ']'
  • '[' /home/h01/frwd/extraction/lfric-bundle/Scriptify/installs/mpich/bin/mpiexec '!=' MPIEXEC-NOTFOUND ']'
  • /home/h01/frwd/extraction/lfric-bundle/Scriptify/installs/mpich/bin/mpiexec -launcher
    [mpiexec@expspicesrv030] HYDU_set_str (utils/args/args.c:198): cannot assign NULL object
    [mpiexec@expspicesrv030] launcher_fn (ui/mpich/utils.c:797): error setting launcher
    [mpiexec@expspicesrv030] match_arg (utils/args/args.c:156): match handler returned error
    [mpiexec@expspicesrv030] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
    [mpiexec@expspicesrv030] parse_args (ui/mpich/utils.c:1642): error parsing input array
    [mpiexec@expspicesrv030] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
    [mpiexec@expspicesrv030] main (ui/mpich/mpiexec.c:148): error parsing parameters

On the other hand, running cmake with -DMPI_ARGS="-launcher ssh" results in e.g. eckit_test_mpi_parallel failing like this:

Start 56: eckit_test_mpi_parallel

56: Test command: /home/h01/frwd/extraction/lfric-bundle/Scriptify/installs/mpich/bin/mpiexec "-launcher ssh" "-n" "4" "/var/tmp/frwd-8923939/eckit-build/tests/mpi/eckit_test_mpi_parallel"
56: Environment variables:
56: OMP_NUM_THREADS=1
56: Test timeout computed to be: 1500
56: [mpiexec@expspicesrv030] match_arg (utils/args/args.c:163): unrecognized argument launcher ssh
56: [mpiexec@expspicesrv030] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
56: [mpiexec@expspicesrv030] parse_args (ui/mpich/utils.c:1642): error parsing input array
56: [mpiexec@expspicesrv030] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
56: [mpiexec@expspicesrv030] main (ui/mpich/mpiexec.c:148): error parsing parameters
1/1 Test #56: eckit_test_mpi_parallel ..........***Failed 0.03 sec
[mpiexec@expspicesrv030] match_arg (utils/args/args.c:163): unrecognized argument launcher ssh
[mpiexec@expspicesrv030] HYDU_parse_array (utils/args/args.c:178): argument matching returned error
[mpiexec@expspicesrv030] parse_args (ui/mpich/utils.c:1642): error parsing input array
[mpiexec@expspicesrv030] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1694): unable to parse user arguments
[mpiexec@expspicesrv030] main (ui/mpich/mpiexec.c:148): error parsing parameters

whereas the ECKIT* tests pass.
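
Both traces look consistent with the way each form ends up as an argument vector. The Python sketch below only illustrates that reading and is not taken from eckit or ecbuild: it assumes the semicolon form is pasted unquoted into a shell command line (where `;` ends the command) and that the space form reaches mpiexec as a single quoted argument, as the ctest "Test command" line shows. The helper `split_shell_commands` and the command line `mpiexec -launcher;ssh -n 4 ./test.x` are hypothetical.

```python
import shlex


def split_shell_commands(line):
    """Split a shell-like line into separate commands at each unquoted ';'."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token == ";":
            # An unquoted ';' terminates the current command.
            commands.append(current)
            current = []
        else:
            current.append(token)
    commands.append(current)
    return commands


# Semicolon form, assumed to be substituted unquoted into a shell script:
# the shell would see two commands, and mpiexec gets -launcher with no value.
semicolon_form = split_shell_commands("mpiexec -launcher;ssh -n 4 ./test.x")

# Space form, as in the ctest "Test command" line above: the whole string is
# one argument, so mpiexec sees a single unknown option "-launcher ssh".
space_form = ["mpiexec", "-launcher ssh", "-n", "4", "./test.x"]

print(semicolon_form)
print(space_form)
```

Under these assumptions, the first form leaves `-launcher` without its value in the shell scripts, while the second form hands mpiexec one argument that contains a space.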

Thank you for the bug report.
Perhaps the expected behaviour is to use -DMPI_ARGS="-launcher ssh" and not -DMPI_ARGS="-launcher;ssh".
In that case the fix belongs in ecbuild. I will get on with that now.

Okay, thanks. It doesn't matter which way is preferred so long as it works for all the tests.

The ecbuild develop branch now has this fix. You should now be able to use -DMPI_ARGS="-launcher ssh".
Please close the issue once you have verified that it works for you.
Thank you for discovering this bug.
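
The thread does not show the ecbuild change itself, so the following is not that fix. It is a minimal Python sketch of one way a build helper could accept both spellings of MPI_ARGS and always produce the same flat argument list; the function name `normalize_mpi_args` is hypothetical.

```python
import shlex


def normalize_mpi_args(value):
    """Return a flat argument list from either "a;b" or "a b" style input."""
    args = []
    # Split on ';' first (CMake list form), then on whitespace within each item.
    for item in value.split(";"):
        args.extend(shlex.split(item))
    return args


print(normalize_mpi_args("-launcher;ssh"))
print(normalize_mpi_args("-launcher ssh"))
```

With a normalisation step of this kind, both the ctest-driven tests and the shell-script regressions would receive `-launcher` and `ssh` as two separate arguments, whichever form the user passed.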

Thanks, this works for me.