mlpack / ensmallen

A header-only C++ library for numerical optimization.

Home Page: http://ensmallen.org

AdamSchafferFunctionN2Test fails on aarch64

ggardet opened this issue

Issue description

AdamSchafferFunctionN2Test fails on aarch64:

[  350s] + ./ensmallen_tests
[  350s] ensmallen version: 2.16.0 (Severely Dented Can Of Polyurethane)
[  350s] armadillo version: 10.2.0 (Cicada Swarm)
[  832s] 
[  832s] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[  832s] ensmallen_tests is a Catch v2.4.1 host application.
[  832s] Run with -? for options
[  832s] 
[  832s] -------------------------------------------------------------------------------
[  832s] AdamSchafferFunctionN2Test
[  832s] -------------------------------------------------------------------------------
[  832s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/adam_test.cpp:331
[  832s] ...............................................................................
[  832s] 
[  832s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/test_function_tools.hpp:114: FAILED:
[  832s]   REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
[  832s] with expansion:
[  832s]   0.4988660447 == Approx( 0.0 )
[  832s] 
[  835s] ===============================================================================
[  835s] test cases:   274 |   273 passed | 1 failed
[  835s] assertions: 13331 | 13330 passed | 1 failed
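
For reference, the Catch2 `Approx(x).margin(m)` comparison shown in the failure accepts a value v whenever |v - x| <= m. A minimal sketch of that check (the helper name and margin values here are illustrative, not the ones test_function_tools.hpp actually uses):

```cpp
#include <cmath>

// Sketch of the tolerance check behind
//   REQUIRE(objective == Approx(expectedObjective).margin(objectiveMargin));
// Catch2's Approx(x).margin(m) accepts a value v whenever |v - x| <= m.
bool withinMargin(double value, double expected, double margin)
{
  return std::fabs(value - expected) <= margin;
}
```

With expectedObjective = 0.0, the reported objective of roughly 0.4989 falls outside any small margin, so the assertion fires.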

Your environment

  • version of ensmallen: 2.16.0
  • operating system: openSUSE Tumbleweed
  • compiler: gcc10
  • version of Armadillo: 10.2.0 (+blas 3.8.0)
  • any other environment information you think is relevant:

Steps to reproduce

Build for openSUSE Tumbleweed aarch64 and run the tests.

Expected behavior

All tests should pass.

Actual behavior

AdamSchafferFunctionN2Test fails on aarch64.

I tried to reproduce this locally, but was not able to on an x86_64 system (with many different random seeds). So this looks like an aarch64-specific issue; I'll have to try and reproduce it there to see what's going on when I have a chance. Thanks for the report!

I tried to reproduce in an aarch64 docker container yesterday (via qemu) but no success yet. I'll keep trying later today.

I will keep an eye on this issue too; it will be easier to reproduce once #2531 gets merged.

I ran all tests on Raspberry Pi 400, with no errors reported.
Armadillo was linked against OpenBLAS 0.3.5. A different result may occur if standard BLAS is used.

(Raspberry Pi 400 has BCM2711, which is apparently Cortex-A72 (ARM v8), but /proc/cpuinfo reports it as "ARMv7 Processor")

You are using an armv7 distro, so no aarch64 for you.

I've been trying to reproduce this in an arm64-on-x86-64 docker container, but with no success. It will be a little while until I am able to try this on actual arm64 hardware, so if someone else beats me to it, please feel free to debug further. :)

@rcurtin If you tell me how to instrument this more thoroughly, I could upload a version that gives more information on all the Debian build architectures.

@barak Can you clarify what you mean by "how to instrument this more thoroughly"? We're all time-constrained, so a list of straightforward steps would be helpful.

@conradsnicta Sure. Right now, there's a stanza in the debian/rules build script

override_dh_auto_test:
        env CTEST_OUTPUT_ON_FAILURE=1 dh_auto_test

where dh_auto_test basically does make test. That gets run during the build process on all architectures. And if the test fails, the build is considered to have failed. The entire transcript is made available (URLs above).

I can put other commands there. E.g., set it up so if the test fails it runs some other script that generates voluminous output. Whatever makes sense. Or the test can be run multiple times if there's some kind of Heisenbug going on. If you can think of anything that I could put there in order to give more information that might prove helpful, I'd be very happy to. This has the advantage of running on a bunch of weird architectures, and of being automatic, and having a good chance of catching regressions.

It's hard for me to know what kinds of output would actually help figure out what's wrong here. Honestly I think I would need to play with it and step through what was going wrong. I tried to reproduce this on an arm64 system and I also had no success there.

@barak do you think you could try a build where you run the tests twice (or even three times)? It so happens that in ensmallen 2.16.0, I accidentally committed code that chooses a different random seed on each run. So if this is a random 'unlucky' failure (which... I don't think it is), we might not see it every time.

Sure. What exactly should I do? Just "make CTEST_OUTPUT_ON_FAILURE=1 test" three times?

Yeah, let's give that a shot and see what happens...

Okay! Uploaded, we'll see how it goes.
I have it set up to run three times and then fail if any of the test runs failed.

Built. On some failing architectures it fails all three times the same way. But on others (like armhf) it sometimes fails and sometimes succeeds!

Check it out: https://buildd.debian.org/status/package.php?p=ensmallen

Just checked the gradient for the Schaffer N2 function and got a different result; maybe the gradient implementation is wrong. I'll check it once more.
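
For context, the Schaffer function N2 behind this test is the standard benchmark f(x, y) = 0.5 + (sin²(x² − y²) − 0.5) / (1 + 0.001(x² + y²))², with global minimum 0 at the origin and values approaching 0.5 far from it. A minimal sketch of the function and an analytic gradient derived from that definition (the names are mine, not ensmallen's), useful for the kind of gradient check mentioned here:

```cpp
#include <cmath>

// Schaffer function N2 (standard definition); the global minimum is
// f(0, 0) = 0, and f(x, y) -> 0.5 far from the origin.
double schafferN2(double x, double y)
{
  const double s = std::sin(x * x - y * y);
  const double d = 1.0 + 0.001 * (x * x + y * y);
  return 0.5 + (s * s - 0.5) / (d * d);
}

// Analytic gradient, for comparison with a finite-difference estimate.
// With u = x^2 - y^2 and d = 1 + 0.001(x^2 + y^2), the quotient rule on
// (sin^2(u) - 0.5) / d^2 gives (note d/dx sin^2(u) = sin(2u) * du/dx):
void schafferN2Gradient(double x, double y, double& gx, double& gy)
{
  const double u = x * x - y * y;
  const double d = 1.0 + 0.001 * (x * x + y * y);
  const double s = std::sin(u);
  const double n = s * s - 0.5;
  gx = (std::sin(2.0 * u) * 2.0 * x * d - n * 0.004 * x) / (d * d * d);
  gy = (std::sin(2.0 * u) * (-2.0 * y) * d - n * 0.004 * y) / (d * d * d);
}
```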

@barak can we run the tests on a specific PR as well? The gradient is correct, but the Adam update rule uses an approximation; the difference is marginal, but I wonder if that causes an issue.
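
For reference, a common approximation of this kind folds Adam's bias correction into the step size instead of correcting the moment estimates explicitly; the two forms differ only in where epsilon enters. A one-dimensional sketch under that assumption (generic, not ensmallen's actual code), minimizing f(x) = x²:

```cpp
#include <cmath>

// One-dimensional Adam sketch minimizing f(x) = x^2 (gradient 2x),
// using the "bias-corrected step size" form
//   alpha_t = alpha * sqrt(1 - beta2^t) / (1 - beta1^t)
// instead of dividing m and v by their bias-correction factors.
double adamMinimizeQuadratic(double x, int iterations,
                             double alpha = 0.1,
                             double beta1 = 0.9,
                             double beta2 = 0.999,
                             double eps = 1e-8)
{
  double m = 0.0, v = 0.0;
  for (int t = 1; t <= iterations; ++t)
  {
    const double g = 2.0 * x;               // gradient of x^2
    m = beta1 * m + (1.0 - beta1) * g;      // first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g;  // second-moment estimate
    const double alphaT = alpha * std::sqrt(1.0 - std::pow(beta2, t)) /
        (1.0 - std::pow(beta1, t));
    x -= alphaT * m / (std::sqrt(v) + eps);
  }
  return x;
}
```

The exact form would instead compute mhat = m / (1 - beta1^t) and vhat = v / (1 - beta2^t) and step by alpha * mhat / (sqrt(vhat) + eps); the discrepancy between the two is on the order of eps, hence "marginal".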

Uploaded 2.16.1-1 last night. Still fails on some architectures, although now it seems consistent:

-------------------------------------------------------------------------------
AdamSchafferFunctionN2Test
-------------------------------------------------------------------------------
./tests/adam_test.cpp:331
...............................................................................

./tests/test_function_tools.hpp:114: FAILED:
  REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
with expansion:
  0.4988660447 == Approx( 0.0 )

The failing architectures are arm64, ppc64el, and s390x.

Not sure what you mean by running it on a PR. I can upload a version containing some extra commits if you'd like.

Can you run the tests against #265?

Uploaded to debian; will see how it goes.

Thanks!

Still fails on some architectures but not others.

Can you post the link to the report?

With #265 I still get a failure on aarch64:

[  974s] -------------------------------------------------------------------------------
[  974s] AdamSchafferFunctionN2Test
[  974s] -------------------------------------------------------------------------------
[  974s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/adam_test.cpp:331
[  974s] ...............................................................................
[  974s] 
[  974s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/test_function_tools.hpp:114: FAILED:
[  974s]   REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
[  974s] with expansion:
[  974s]   0.4988660434 == Approx( 0.0 )
[  974s] 
[  976s] ===============================================================================
[  976s] test cases:   274 |   273 passed | 1 failed
[  976s] assertions: 13331 | 13330 passed | 1 failed

@zoq Sure. The link above, https://buildd.debian.org/status/package.php?p=ensmallen, leads to the latest Debian build logs for the package on all architectures.

Thanks, I missed that one.

@barak @ggardet do we have any more information about the hardware?

@zoq https://db.debian.org/machines.cgi for what it's worth. What particular features are you looking for? RAM, perhaps?

On my side this is inside aarch64 qemu/kvm VM.

I believe these are not virtual but rather physical instances of each architecture, with the builds run in a chroot sandbox containing the minimal build prerequisites.

Ok, so we ended up deciding to simply remove the test in #265 because we don't have any reason to believe that anything is actually broken on arm64 (given all the other passing test cases for Adam), and also because we found a comment indicating that the plan was to remove the test anyway. 😄

So, now ensmallen 2.16.2 is released... and hopefully should work on arm64 and all of the various architectures Debian tests against. :) Want to give it a shot and see what happens?

Uploaded! (Although it's unlikely it'll make it into the stable release, because it's too much of a delta for so late in the release process.)

I confirm that version 2.16.2 is perfectly fine on openSUSE Tumbleweed aarch64. Thanks!

Great, thanks for the info.

Awesome, now let's just wait to see if @barak has success too, and if so, I think we can happily close this issue. :)

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

All seems well!

Awesome! I will go ahead and close this then, and hopefully there won't be more issues in the future. 😄