9.2.2 testsuite failures on MIPS architectures
mbanck opened this issue
Version 9.2.2 has introduced a test-suite regression on MIPS (32-bit or 64-bit little-endian) architectures; see https://buildd.debian.org/status/package.php?p=abinit
In version 8.10.3 the test suites still passed; see https://buildd.debian.org/status/logs.php?pkg=abinit&arch=mips64el and https://buildd.debian.org/status/fetch.php?pkg=abinit&arch=mips64el&ver=8.10.3-3&stamp=1601210217&raw=0
Debian runs runtests.py fast as its test suite. The fldiff report for, e.g., t24 is:
# YAML support is available, but is disabled for this test.
# Start legacy fldiff comparison report
2
< .Version 9.0.0 of ABINIT
> .Version 9.2.2 of ABINIT
3
< .(MPI version, prepared for a x86_64_linux_gnu9.2 computer)
> .(MPI version, prepared for a mipsel_linux_gnu10.2 computer)
17
< .Starting date : Mon 24 Feb 2020.
> .Starting date : Mon 22 Feb 2021.
208
< ETOT 1 -32.145013381823 -3.215E+01 2.148E-01 5.370E+02
> ETOT 1 -32.033288641759 -3.203E+01 2.219E-01 5.278E+02
211
< Fermi (or HOMO) energy (hartree) = -0.09171 Average Vxc (hartree)= -0.32386
> Fermi (or HOMO) energy (hartree) = 0.04698 Average Vxc (hartree)= -0.32597
214
< -0.81296 -0.47940 -0.40280 -0.38621 -0.33010 -0.29020 -0.13508
> -0.70867 -0.31862 0.00000 0.00000 0.00000 0.00042 0.01726
216
< -0.81834 -0.48453 -0.45154 -0.37558 -0.35487 -0.31470 -0.09171
> -0.55458 -0.21241 0.00000 0.00000 0.00000 0.00172 0.04698
217
< Fermi (or HOMO) energy (eV) = -2.49552 Average Vxc (eV)= -8.81280
> Fermi (or HOMO) energy (eV) = 1.27840 Average Vxc (eV)= -8.87007
220
< -22.12165 -13.04502 -10.96066 -10.50918 -8.98258 -7.89688 -3.67561
> -19.28399 -8.67023 0.00000 0.00000 0.00000 0.01153 0.46962
222
< -22.26807 -13.18474 -12.28698 -10.22016 -9.65640 -8.56333 -2.49552
> -15.09085 -5.77999 0.00000 0.00000 0.00000 0.04674 1.27840
224
< ETOT 2 -33.753967030558 -1.609E+00 2.489E-02 4.539E+01
> ETOT 2 -33.742599336313 -1.709E+00 4.567E-02 5.285E+01
227
< Fermi (or HOMO) energy (hartree) = -0.31394 Average Vxc (hartree)= -0.28713
> Fermi (or HOMO) energy (hartree) = 0.03747 Average Vxc (hartree)= -0.28607
230
< -0.76285 -0.70359 -0.70304 -0.70000 -0.32761 -0.31692 -0.31394
> -0.74231 -0.49367 0.00000 0.00000 0.00000 0.00000 0.03747
232
< -0.76125 -0.70569 -0.70382 -0.70077 -0.33349 -0.32170 -0.31985
> -0.61086 -0.49028 0.00000 0.00000 0.00000 0.00841 0.00945
233
< Fermi (or HOMO) energy (eV) = -8.54262 Average Vxc (eV)= -7.81322
> Fermi (or HOMO) energy (eV) = 1.01964 Average Vxc (eV)= -7.78430
236
< -20.75817 -19.14553 -19.13072 -19.04794 -8.91472 -8.62394 -8.54262
> -20.19934 -13.43336 0.00000 0.00000 0.00000 0.00000 1.01964
238
< -20.71471 -19.20279 -19.15196 -19.06899 -9.07466 -8.75393 -8.70365
> -16.62244 -13.34133 0.00000 0.00000 0.00000 0.22890 0.25711
240
[...]
635
< band_energy : -6.32824180479714E+00
> band_energy : -1.92065546421853E+00
657
< sigma(1 1)= 3.86327862E-04 sigma(3 2)= 0.00000000E+00
> sigma(1 1)= 3.86333051E-04 sigma(3 2)= 0.00000000E+00
658
< sigma(2 2)= 3.86327862E-04 sigma(3 1)= 0.00000000E+00
> sigma(2 2)= 3.86333051E-04 sigma(3 1)= 0.00000000E+00
659
< sigma(3 3)= 3.86327862E-04 sigma(2 1)= 0.00000000E+00
> sigma(3 3)= 3.86333051E-04 sigma(2 1)= 0.00000000E+00
702
< strten 3.8632786150E-04 3.8632786150E-04 3.8632786150E-04
> strten 3.8633305104E-04 3.8633305104E-04 3.8633305104E-04
1091
< +Overall time at end (sec) : cpu= 2.1 wall= 2.7
> +Overall time at end (sec) : cpu= 12.6 wall= 12.6
Summary t24.out: different lines=172, max abs_diff=1.915e+01 (l.238), max rel_diff=1.000e+00 (l.211).
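To make the summary line easier to read, here is a simplified sketch of a per-line numeric comparison in the spirit of fldiff. It is not the actual fldiff implementation: the float regex and the relative-difference formula |a-b|/(|a|+|b|) are assumptions, chosen because that formula is bounded by 1 and reaches 1.0 whenever a value changes sign or drops to zero, which is consistent with the max rel_diff=1.000e+00 reported at l.211 (a Fermi energy flipping from negative to positive).

```python
import re

# Assumed float syntax: a decimal point is required, so integer counters such
# as the "1" in "ETOT 1" are ignored; Fortran D exponents are accepted too.
FLOAT_RE = re.compile(r"[-+]?\d+\.\d*(?:[EeDd][-+]?\d+)?")


def floats(line):
    """Extract the floating-point numbers from one line of output."""
    return [float(tok.replace("D", "E").replace("d", "e"))
            for tok in FLOAT_RE.findall(line)]


def compare_line(ref, out):
    """Return (max absolute diff, max relative diff) between two lines.

    The relative diff |a-b|/(|a|+|b|) is an assumption for this sketch.
    """
    max_abs = 0.0
    max_rel = 0.0
    for x, y in zip(floats(ref), floats(out)):
        d = abs(x - y)
        max_abs = max(max_abs, d)
        denom = abs(x) + abs(y)
        if denom > 0.0:
            max_rel = max(max_rel, d / denom)
    return max_abs, max_rel
```

Feeding it line 238 of the report (eigenvalues in eV) reproduces the reported max abs_diff of about 19.15, coming from -19.15196 collapsing to 0.00000.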
Is this a known issue, and is there a workaround? Debian does not compile MIPS any differently from, say, ARM or Intel/AMD.
For t11 there is no convergence; the fftalg value also differs. Could that be relevant?
--- fast/Refs/t11.out 2020-11-10 12:21:53.000000000 +0000
+++ Test_suite/fast_t03-t05-t06-t07-t08-t09-t11-t12-t14-t16/t11.out 2021-02-22 12:42:16.524181381 +0000
[...]
-- fftalg 312
+- fftalg 112
[...]
@@ -150,37 +150,50 @@
================================================================================
prteigrs : about to open file t11o_EIG
Non-SCF case, kpt 1 ( 0.00000 0.00000 0.00000), residuals and eigenvalues=
- 1.32E-13 2.40E-13 2.23E-13 2.32E-13 1.20E-13 1.48E-13 1.23E-13 6.81E-13
- -2.2607E-01 2.1504E-01 2.1504E-01 2.1504E-01 3.0745E-01 3.0745E-01
- 3.0745E-01 3.2917E-01
+ 2.69E-09 6.65E-04 2.13E-03 7.52E-04 5.52E-04 2.20E-03 4.56E-04 1.17E-04
+ -2.2607E-01 -2.7122E-03 -2.8771E-04 0.0000E+00 0.0000E+00 2.2553E-01
+ 2.4413E-01 3.2012E-01
+ prteigrs : nnsclo,ikpt= 20 1 max resid (incl. the buffer)= 2.19810E-03
Non-SCF case, kpt 2 ( 0.00000 0.50000 0.50000), residuals and eigenvalues=
- 6.05E-14 3.60E-13 4.78E-14 5.51E-14 5.11E-13 5.09E-13 7.46E-13 1.70E-13
- -7.3665E-02 -7.3665E-02 1.0775E-01 1.0775E-01 2.3725E-01 2.3725E-01
- 5.7958E-01 5.7958E-01
+ 3.99E-21 1.76E-03 6.60E-03 3.34E-03 7.73E-03 3.44E-03 1.58E-02 9.46E-07
+ -7.3665E-02 -7.0495E-02 -1.0501E-06 -4.0302E-17 1.4324E-01 2.4859E-01
+ 5.4141E-01 5.7958E-01
+ prteigrs : nnsclo,ikpt= 20 2 max resid (incl. the buffer)= 1.58493E-02
[...]
Non-SCF case, kpt 8 ( 0.50000 0.00000 0.00000), residuals and eigenvalues=
- 3.25E-13 5.23E-13 1.79E-13 4.55E-13 7.88E-14 3.97E-13 4.23E-13 7.41E-13
- -1.3943E-01 -4.4132E-02 1.6952E-01 1.6952E-01 2.6640E-01 3.3740E-01
- 3.3740E-01 4.8989E-01
+ 1.93E-11 1.22E-09 3.47E-03 1.68E-03 1.60E-03 5.80E-04 6.20E-04 2.71E-18
+ -1.3943E-01 -4.4132E-02 -1.3892E-17 -1.0491E-18 1.2548E-19 1.1470E-17
+ 3.2797E-01 4.8989E-01
+ prteigrs : nnsclo,ikpt= 20 8 max resid (incl. the buffer)= 3.47131E-03
+
+
+ scprqt: WARNING -
+ nstep= 20 was not enough non-SCF iterations to converge;
+ maximum residual= 6.151E-02 exceeds tolwfr= 1.000E-12
--- !ResultsGS
@@ -193,7 +206,7 @@
lattice_lengths: [ 7.25711, 7.25711, 7.25711, ]
lattice_angles: [ 60.000, 60.000, 60.000, ] # degrees, (23, 13, 12)
lattice_volume: 2.7025701E+02
-convergence: {deltae: 0.000E+00, res2: 0.000E+00, residm: 9.768E-13, diffor: 0.000E+00, }
+convergence: {deltae: 0.000E+00, res2: 0.000E+00, residm: 6.151E-02, diffor: 0.000E+00, }
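The t11 failure boils down to the final residm (6.151E-02) exceeding tolwfr (1.000E-12) after nstep=20 non-SCF iterations, whereas the reference run reached 9.768E-13. A minimal sketch of that check on the YAML convergence line, assuming a stdlib regex is good enough here (a real YAML parser would be more robust; the function names are hypothetical and not part of runtests.py):

```python
import re

# Hypothetical helper: pull the residm value out of a "convergence: {...}" line.
RESIDM_RE = re.compile(r"residm:\s*([-+]?\d+\.\d*(?:[Ee][-+]?\d+)?)")


def residm_from_yaml_line(line):
    """Return the residm value from an ABINIT YAML convergence line."""
    m = RESIDM_RE.search(line)
    if m is None:
        raise ValueError("no residm entry found")
    return float(m.group(1))


def is_converged(line, tolwfr=1.0e-12):
    """True if the maximum wavefunction residual is within tolwfr."""
    return residm_from_yaml_line(line) <= tolwfr
```

Applied to the two convergence lines in the diff, the reference run passes and the MIPS run fails.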
For t11 there is no convergence; also the fftalg is different, could that be relevant?
fftalg 312 corresponds to an external FFTW3 library, while fftalg 112 is the internal Fortran FFT library shipped with Abinit. The reference files were produced on our reference machine with an external FFTW3 compiled with gcc/gfortran.
In principle the test results should not depend on the FFT library, unless there is a serious portability problem in either the external library or the internal version.
(On our test farm we systematically test FFTW3, MKL-DFTI and the internal Fortran FFT routines.)
Can you run:
runtests.py unitary
to execute the unit tests for the different FFT interfaces?
If these tests are OK, I would say that fftalg 112 works as expected and the origin of the problem should be found somewhere else.
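The idea behind cross-checking FFT backends (e.g. fftalg 312 against fftalg 112) is that two correct implementations must agree to within round-off on the same input. A toy illustration of that idea, comparing a naive O(N^2) DFT with a recursive radix-2 FFT in pure Python; this is only a sketch of the principle and says nothing about how Abinit's unit tests are actually written:

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform, used as the reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def max_deviation(x):
    """Largest absolute difference between the two transforms of x."""
    return max(abs(a - b) for a, b in zip(dft(x), fft(x)))
```

A deviation far above round-off (for instance 1e-3 instead of 1e-13) would point to a broken backend, much like the non-converged residuals in t11.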
Note that Abinit v9 differs significantly from v8, both in the build system and in the internal implementation.
I had a look at the log file for mips64el and didn't spot any significant configuration problem. We also have a buildbot worker that tests GNU 10.2.
I have to backtrack here: I've tried abinit-8.10.3 with the same toolchain (Debian unstable), and the tests already fail on MIPS there too (they passed back in September; see https://buildd.debian.org/status/fetch.php?pkg=abinit&arch=mips64el&ver=8.10.3-3&stamp=1601210217&raw=0).
So it seems something in the newer toolchain (gfortran-10.2?) is responsible. I will have to dig deeper.