abinit / abinit

The official github mirror of the Abinit repository. We welcome bug fixes and improvements. Note that most of the active developments are hosted on our https://gitlab.abinit.org/ server. Before embarking on making significant changes, please contact the Abinit group.

Home Page: https://www.abinit.org

9.2.2 testsuite failures on MIPS architectures

mbanck opened this issue

Version 9.2.2 has introduced a test-suite regression on MIPS (32-bit and 64-bit little-endian) architectures, see https://buildd.debian.org/status/package.php?p=abinit

In version 8.10.3 the test suite still passed, see https://buildd.debian.org/status/logs.php?pkg=abinit&arch=mips64el and https://buildd.debian.org/status/fetch.php?pkg=abinit&arch=mips64el&ver=8.10.3-3&stamp=1601210217&raw=0

Debian runs "runtests.py fast" as its test suite. The fldiff report for t24, for example, is this:

# YAML support is available, but is disabled for this test.
# Start legacy fldiff comparison report
2
< .Version 9.0.0 of ABINIT
> .Version 9.2.2 of ABINIT
3
< .(MPI version, prepared for a x86_64_linux_gnu9.2 computer)
> .(MPI version, prepared for a mipsel_linux_gnu10.2 computer)
17
< .Starting date : Mon 24 Feb 2020.
> .Starting date : Mon 22 Feb 2021.
208
<  ETOT  1  -32.145013381823    -3.215E+01 2.148E-01 5.370E+02
>  ETOT  1  -32.033288641759    -3.203E+01 2.219E-01 5.278E+02
211
<  Fermi (or HOMO) energy (hartree) =  -0.09171   Average Vxc (hartree)=  -0.32386
>  Fermi (or HOMO) energy (hartree) =   0.04698   Average Vxc (hartree)=  -0.32597
214
<   -0.81296   -0.47940   -0.40280   -0.38621   -0.33010   -0.29020   -0.13508
>   -0.70867   -0.31862    0.00000    0.00000    0.00000    0.00042    0.01726
216
<   -0.81834   -0.48453   -0.45154   -0.37558   -0.35487   -0.31470   -0.09171
>   -0.55458   -0.21241    0.00000    0.00000    0.00000    0.00172    0.04698
217
<  Fermi (or HOMO) energy (eV) =  -2.49552   Average Vxc (eV)=  -8.81280
>  Fermi (or HOMO) energy (eV) =   1.27840   Average Vxc (eV)=  -8.87007
220
<  -22.12165  -13.04502  -10.96066  -10.50918   -8.98258   -7.89688   -3.67561
>  -19.28399   -8.67023    0.00000    0.00000    0.00000    0.01153    0.46962
222
<  -22.26807  -13.18474  -12.28698  -10.22016   -9.65640   -8.56333   -2.49552
>  -15.09085   -5.77999    0.00000    0.00000    0.00000    0.04674    1.27840
224
<  ETOT  2  -33.753967030558    -1.609E+00 2.489E-02 4.539E+01
>  ETOT  2  -33.742599336313    -1.709E+00 4.567E-02 5.285E+01
227
<  Fermi (or HOMO) energy (hartree) =  -0.31394   Average Vxc (hartree)=  -0.28713
>  Fermi (or HOMO) energy (hartree) =   0.03747   Average Vxc (hartree)=  -0.28607
230
<   -0.76285   -0.70359   -0.70304   -0.70000   -0.32761   -0.31692   -0.31394
>   -0.74231   -0.49367    0.00000    0.00000    0.00000    0.00000    0.03747
232
<   -0.76125   -0.70569   -0.70382   -0.70077   -0.33349   -0.32170   -0.31985
>   -0.61086   -0.49028    0.00000    0.00000    0.00000    0.00841    0.00945
233
<  Fermi (or HOMO) energy (eV) =  -8.54262   Average Vxc (eV)=  -7.81322
>  Fermi (or HOMO) energy (eV) =   1.01964   Average Vxc (eV)=  -7.78430
236
<  -20.75817  -19.14553  -19.13072  -19.04794   -8.91472   -8.62394   -8.54262
>  -20.19934  -13.43336    0.00000    0.00000    0.00000    0.00000    1.01964
238
<  -20.71471  -19.20279  -19.15196  -19.06899   -9.07466   -8.75393   -8.70365
>  -16.62244  -13.34133    0.00000    0.00000    0.00000    0.22890    0.25711
240
[...]
635
<  band_energy         : -6.32824180479714E+00
>  band_energy         : -1.92065546421853E+00
657
<   sigma(1 1)=  3.86327862E-04  sigma(3 2)=  0.00000000E+00
>   sigma(1 1)=  3.86333051E-04  sigma(3 2)=  0.00000000E+00
658
<   sigma(2 2)=  3.86327862E-04  sigma(3 1)=  0.00000000E+00
>   sigma(2 2)=  3.86333051E-04  sigma(3 1)=  0.00000000E+00
659
<   sigma(3 3)=  3.86327862E-04  sigma(2 1)=  0.00000000E+00
>   sigma(3 3)=  3.86333051E-04  sigma(2 1)=  0.00000000E+00
702
<            strten      3.8632786150E-04  3.8632786150E-04  3.8632786150E-04
>            strten      3.8633305104E-04  3.8633305104E-04  3.8633305104E-04
1091
< +Overall time at end (sec) : cpu=          2.1  wall=          2.7
> +Overall time at end (sec) : cpu=         12.6  wall=         12.6
Summary t24.out: different lines=172, max abs_diff=1.915e+01 (l.238), max rel_diff=1.000e+00 (l.211).
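
For anyone who wants to re-check individual lines of the report above, here is a minimal sketch that recomputes the two figures from the summary for one pair of lines. It is not the fldiff implementation: the function name `line_diffs` is made up for illustration, and the relative difference is assumed to be |a-b| / (|a|+|b|), which is consistent with the reported max rel_diff of 1.000 at l.211, where the Fermi energy changes sign.

```python
import re

# Matches decimals with an optional exponent, e.g. -0.09171 or 3.8633E-04.
# Bare integers (such as the step number after ETOT) are deliberately skipped.
_NUMBER = re.compile(r"[-+]?\d+\.\d*(?:[eE][-+]?\d+)?")


def line_diffs(ref_line, new_line):
    """Return (max_abs_diff, max_rel_diff) for numbers paired by position.

    Assumption: rel_diff = |a - b| / (|a| + |b|), taken as 0 when both are 0.
    If the lines hold different counts of numbers, the extras are ignored.
    """
    ref = [float(tok) for tok in _NUMBER.findall(ref_line)]
    new = [float(tok) for tok in _NUMBER.findall(new_line)]
    max_abs = 0.0
    max_rel = 0.0
    for a, b in zip(ref, new):
        abs_diff = abs(a - b)
        scale = abs(a) + abs(b)
        rel_diff = abs_diff / scale if scale > 0.0 else 0.0
        max_abs = max(max_abs, abs_diff)
        max_rel = max(max_rel, rel_diff)
    return max_abs, max_rel


# Line 238 of the report: reference ("<") versus MIPS result (">").
ref_238 = " -20.71471  -19.20279  -19.15196  -19.06899   -9.07466   -8.75393   -8.70365"
new_238 = " -16.62244  -13.34133    0.00000    0.00000    0.00000    0.22890    0.25711"
print(line_diffs(ref_238, new_238))
```

On line 238 the largest absolute difference comes from the third eigenvalue (-19.15196 eV against 0.00000 eV), which matches the 1.915e+01 quoted in the summary.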

Is this a known issue, and is there a workaround? Debian does not compile for MIPS any differently from, say, ARM or Intel/AMD.

For t11 there is no convergence; also the fftalg is different, could that be relevant?

--- fast/Refs/t11.out   2020-11-10 12:21:53.000000000 +0000
+++ Test_suite/fast_t03-t05-t06-t07-t08-t09-t11-t12-t14-t16/t11.out     2021-02-22 12:42:16.524181381 +0000
[...]
--          fftalg         312
+-          fftalg         112
[...]
@@ -150,37 +150,50 @@
 ================================================================================
  prteigrs : about to open file t11o_EIG
  Non-SCF case, kpt    1 (  0.00000  0.00000  0.00000), residuals and eigenvalues=
-  1.32E-13  2.40E-13  2.23E-13  2.32E-13  1.20E-13  1.48E-13  1.23E-13  6.81E-13
- -2.2607E-01  2.1504E-01  2.1504E-01  2.1504E-01  3.0745E-01  3.0745E-01
-  3.0745E-01  3.2917E-01
+  2.69E-09  6.65E-04  2.13E-03  7.52E-04  5.52E-04  2.20E-03  4.56E-04  1.17E-04
+ -2.2607E-01 -2.7122E-03 -2.8771E-04  0.0000E+00  0.0000E+00  2.2553E-01
+  2.4413E-01  3.2012E-01
+  prteigrs : nnsclo,ikpt=   20    1 max resid (incl. the buffer)=  2.19810E-03
  Non-SCF case, kpt    2 (  0.00000  0.50000  0.50000), residuals and eigenvalues=
-  6.05E-14  3.60E-13  4.78E-14  5.51E-14  5.11E-13  5.09E-13  7.46E-13  1.70E-13
- -7.3665E-02 -7.3665E-02  1.0775E-01  1.0775E-01  2.3725E-01  2.3725E-01
-  5.7958E-01  5.7958E-01
+  3.99E-21  1.76E-03  6.60E-03  3.34E-03  7.73E-03  3.44E-03  1.58E-02  9.46E-07
+ -7.3665E-02 -7.0495E-02 -1.0501E-06 -4.0302E-17  1.4324E-01  2.4859E-01
+  5.4141E-01  5.7958E-01
+  prteigrs : nnsclo,ikpt=   20    2 max resid (incl. the buffer)=  1.58493E-02
[...]
  Non-SCF case, kpt    8 (  0.50000  0.00000  0.00000), residuals and eigenvalues=
-  3.25E-13  5.23E-13  1.79E-13  4.55E-13  7.88E-14  3.97E-13  4.23E-13  7.41E-13
- -1.3943E-01 -4.4132E-02  1.6952E-01  1.6952E-01  2.6640E-01  3.3740E-01
-  3.3740E-01  4.8989E-01
+  1.93E-11  1.22E-09  3.47E-03  1.68E-03  1.60E-03  5.80E-04  6.20E-04  2.71E-18
+ -1.3943E-01 -4.4132E-02 -1.3892E-17 -1.0491E-18  1.2548E-19  1.1470E-17
+  3.2797E-01  4.8989E-01
+  prteigrs : nnsclo,ikpt=   20    8 max resid (incl. the buffer)=  3.47131E-03
+
+
+ scprqt:  WARNING -
+  nstep=   20 was not enough non-SCF iterations to converge;
+  maximum residual=  6.151E-02 exceeds tolwfr=  1.000E-12
 
 
 --- !ResultsGS
@@ -193,7 +206,7 @@
 lattice_lengths: [   7.25711,    7.25711,    7.25711, ]
 lattice_angles: [ 60.000,  60.000,  60.000, ] # degrees, (23, 13, 12)
 lattice_volume:   2.7025701E+02
-convergence: {deltae:  0.000E+00, res2:  0.000E+00, residm:  9.768E-13, diffor:  0.000E+00, }
+convergence: {deltae:  0.000E+00, res2:  0.000E+00, residm:  6.151E-02, diffor:  0.000E+00, }
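
To find every test output that hits this non-convergence, the scprqt warning can be picked out of the .out files mechanically. The following is only a sketch: the helper name `unconverged_residuals` is hypothetical, and the pattern assumes the warning is always worded exactly as in the t11 output above.

```python
import re

# Assumes the wording seen in the t11 output:
#   maximum residual=  6.151E-02 exceeds tolwfr=  1.000E-12
_WARNING = re.compile(
    r"maximum residual=\s*([0-9.Ee+-]+)\s+exceeds tolwfr=\s*([0-9.Ee+-]+)"
)


def unconverged_residuals(text):
    """Return a list of (maximum_residual, tolwfr) pairs found in the text."""
    return [(float(resid), float(tol)) for resid, tol in _WARNING.findall(text)]


sample = """
 scprqt:  WARNING -
  nstep=   20 was not enough non-SCF iterations to converge;
  maximum residual=  6.151E-02 exceeds tolwfr=  1.000E-12
"""
for resid, tol in unconverged_residuals(sample):
    print(f"residual {resid:.3e} is above tolwfr {tol:.3e}")
```

An empty list means no such warning was found in that file.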

For t11 there is no convergence; also the fftalg is different, could that be relevant?

fftalg 312 corresponds to an external FFTW3 library, while fftalg 112 is the internal Fortran FFT library shipped with Abinit. The reference files were produced on our reference machine with an external FFTW3 compiled with gcc/gfortran.
In principle, the test results should not depend on the FFT library unless there is a serious portability problem in either the external library or the internal version
(on our test farm we systematically test FFTW3, MKL-DFTI and the internal Fortran FFT routines).
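
When comparing many outputs, it can help to label which FFT backend each run reports. Below is a small sketch that maps only the two values mentioned here (312 and 112). Treating the leading digit as the library selector is an assumption for illustration, the helper name `fft_library` is made up, and any other leading digit is deliberately reported as unknown rather than guessed.

```python
# Only the two cases confirmed in this thread are mapped:
#   312 -> external FFTW3, 112 -> internal Fortran FFT shipped with Abinit.
# Assumption: the leading digit of fftalg is what selects the library.
FFT_LIBRARY_BY_LEADING_DIGIT = {
    "1": "internal Fortran FFT shipped with Abinit",
    "3": "external FFTW3 library",
}


def fft_library(fftalg):
    """Describe the FFT library implied by an fftalg value (int or str)."""
    leading = str(fftalg).strip()[0]
    return FFT_LIBRARY_BY_LEADING_DIGIT.get(leading, "not covered in this thread")


for value in (312, 112):
    print(value, "->", fft_library(value))
```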

Can you run:

runtests.py unitary

to execute the unit tests for the different FFT interfaces?

If these tests are OK, I would say that fftalg 112 works as expected and the origin of the problem should be found somewhere else.

Note that Abinit v9 differs significantly from v8, both in the build system and in the internal implementation.
I had a look at the log file for mips64el and didn't spot any significant configuration problem. We also have a buildbot worker that tests GNU 10.2.

@jmbeuken

I have to backtrack here: I've tried abinit-8.10.3 with the same toolchain (Debian unstable), and the tests already fail there on MIPS (they were fine back in September, see https://buildd.debian.org/status/fetch.php?pkg=abinit&arch=mips64el&ver=8.10.3-3&stamp=1601210217&raw=0).

So the cause seems to be something in the newer toolchain (gfortran-10.2?). I will have to dig deeper.