mlpack / ensmallen

A header-only C++ library for numerical optimization.

Home Page: http://ensmallen.org

AdamSchafferFunctionN2Test fails on aarch64

ggardet opened this issue

Issue description

AdamSchafferFunctionN2Test fails on aarch64:

[  350s] + ./ensmallen_tests
[  350s] ensmallen version: 2.16.0 (Severely Dented Can Of Polyurethane)
[  350s] armadillo version: 10.2.0 (Cicada Swarm)
[  832s] 
[  832s] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[  832s] ensmallen_tests is a Catch v2.4.1 host application.
[  832s] Run with -? for options
[  832s] 
[  832s] -------------------------------------------------------------------------------
[  832s] AdamSchafferFunctionN2Test
[  832s] -------------------------------------------------------------------------------
[  832s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/adam_test.cpp:331
[  832s] ...............................................................................
[  832s] 
[  832s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/test_function_tools.hpp:114: FAILED:
[  832s]   REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
[  832s] with expansion:
[  832s]   0.4988660447 == Approx( 0.0 )
[  832s] 
[  835s] ===============================================================================
[  835s] test cases:   274 |   273 passed | 1 failed
[  835s] assertions: 13331 | 13330 passed | 1 failed
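
For reference, the Catch2 `Approx(x).margin(m)` comparison shown in the failure accepts a value v whenever |v - x| <= m. A minimal sketch of that check (the helper name and margin values here are illustrative, not the ones test_function_tools.hpp actually uses):

```cpp
#include <cmath>

// Sketch of the tolerance check behind
//   REQUIRE(objective == Approx(expectedObjective).margin(objectiveMargin));
// Catch2's Approx(x).margin(m) accepts a value v whenever |v - x| <= m.
bool withinMargin(double value, double expected, double margin)
{
  return std::fabs(value - expected) <= margin;
}
```

With expectedObjective = 0.0, the reported objective of roughly 0.4989 falls outside any small margin, so the assertion fires.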

Your environment

  • version of ensmallen: 2.16.0
  • operating system: openSUSE Tumbleweed
  • compiler: gcc10
  • version of Armadillo: 10.2.0 (+blas 3.8.0)
  • any other environment information you think is relevant:

Steps to reproduce

Build for openSUSE Tumbleweed aarch64 and run the tests.

Expected behavior

All tests should pass.

Actual behavior

AdamSchafferFunctionN2Test fails on aarch64.

I tried to reproduce this locally, but was not able to on an x86_64 system (with many different random seeds). So this looks like an aarch64-specific issue; I'll have to try and reproduce it there to see what's going on when I have a chance. Thanks for the report!

I tried to reproduce in an aarch64 docker container yesterday (via qemu) but no success yet. I'll keep trying later today.

I will keep an eye on this issue too; it will be easier to reproduce once #2531 gets merged.

I ran all tests on Raspberry Pi 400, with no errors reported.
Armadillo was linked against OpenBLAS 0.3.5. A different result may occur if standard BLAS is used.

(Raspberry Pi 400 has BCM2711, which is apparently Cortex-A72 (ARM v8), but /proc/cpuinfo reports it as "ARMv7 Processor")

You are using an armv7 distro, so no aarch64 for you.

I've been trying to reproduce this in an arm64-on-x86-64 docker container, but with no success. It will be a little while until I am able to try this on actual arm64 hardware, so if someone else beats me to it, please feel free to debug further. :)

@rcurtin If you tell me how to instrument this more thoroughly, I could upload a version that gives more information on all the Debian build architectures.

@barak Can you clarify what you mean by "how to instrument this more thoroughly"? We're all time-constrained, so a list of straightforward steps would be helpful.

@conradsnicta Sure. Right now, there's a stanza in the debian/rules build script

override_dh_auto_test:
        env CTEST_OUTPUT_ON_FAILURE=1 dh_auto_test

where dh_auto_test basically does make test. That gets run during the build process on all architectures. And if the test fails, the build is considered to have failed. The entire transcript is made available (URLs above).

I can put other commands there. E.g., set it up so if the test fails it runs some other script that generates voluminous output. Whatever makes sense. Or the test can be run multiple times if there's some kind of Heisenbug going on. If you can think of anything that I could put there in order to give more information that might prove helpful, I'd be very happy to. This has the advantage of running on a bunch of weird architectures, and of being automatic, and having a good chance of catching regressions.

It's hard for me to know what kinds of output would actually help figure out what's wrong here. Honestly I think I would need to play with it and step through what was going wrong. I tried to reproduce this on an arm64 system and I also had no success there.

@barak do you think you could try a build where you run the tests twice (or even three times)? It so happens that in ensmallen 2.16.0, I accidentally committed code that chooses a different random seed on each run. So if this is a random 'unlucky' failure (which... I don't think it is), we might not see it every time.

Sure. What exactly should I do? Just "make CTEST_OUTPUT_ON_FAILURE=1 test" three times?

Yeah, let's give that a shot and see what happens...

Okay! Uploaded, we'll see how it goes.
I have it set up to run three times and then fail if any of the test runs failed.

Built. On some failing architectures it fails all three times the same way. But on others (like armhf) it sometimes fails and sometimes succeeds!

Check it out: https://buildd.debian.org/status/package.php?p=ensmallen

Just checked the gradient for the Schaffer N2 function and got a different result; maybe the gradient implementation is wrong. I'll check it once more.
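
For context, the Schaffer function N2 behind this test is the standard benchmark f(x, y) = 0.5 + (sin²(x² − y²) − 0.5) / (1 + 0.001(x² + y²))², with global minimum 0 at the origin and values approaching 0.5 far from it. A minimal sketch of the function and an analytic gradient derived from that definition (the names are mine, not ensmallen's), useful for the kind of gradient check mentioned here:

```cpp
#include <cmath>

// Schaffer function N2 (standard definition); the global minimum is
// f(0, 0) = 0, and f(x, y) -> 0.5 far from the origin.
double schafferN2(double x, double y)
{
  const double s = std::sin(x * x - y * y);
  const double d = 1.0 + 0.001 * (x * x + y * y);
  return 0.5 + (s * s - 0.5) / (d * d);
}

// Analytic gradient, for comparison with a finite-difference estimate.
// With u = x^2 - y^2 and d = 1 + 0.001(x^2 + y^2), the quotient rule on
// (sin^2(u) - 0.5) / d^2 gives (note d/dx sin^2(u) = sin(2u) * du/dx):
void schafferN2Gradient(double x, double y, double& gx, double& gy)
{
  const double u = x * x - y * y;
  const double d = 1.0 + 0.001 * (x * x + y * y);
  const double s = std::sin(u);
  const double n = s * s - 0.5;
  gx = (std::sin(2.0 * u) * 2.0 * x * d - n * 0.004 * x) / (d * d * d);
  gy = (std::sin(2.0 * u) * (-2.0 * y) * d - n * 0.004 * y) / (d * d * d);
}
```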

@barak can we run the tests on a specific PR as well? The gradient is correct, but the Adam update rule uses an approximation; the difference is marginal, but I wonder if that causes an issue.
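
For reference, a common approximation of this kind folds Adam's bias correction into the step size instead of correcting the moment estimates explicitly; the two forms differ only in where epsilon enters. A one-dimensional sketch under that assumption (generic, not ensmallen's actual code), minimizing f(x) = x²:

```cpp
#include <cmath>

// One-dimensional Adam sketch minimizing f(x) = x^2 (gradient 2x),
// using the "bias-corrected step size" form
//   alpha_t = alpha * sqrt(1 - beta2^t) / (1 - beta1^t)
// instead of dividing m and v by their bias-correction factors.
double adamMinimizeQuadratic(double x, int iterations,
                             double alpha = 0.1,
                             double beta1 = 0.9,
                             double beta2 = 0.999,
                             double eps = 1e-8)
{
  double m = 0.0, v = 0.0;
  for (int t = 1; t <= iterations; ++t)
  {
    const double g = 2.0 * x;               // gradient of x^2
    m = beta1 * m + (1.0 - beta1) * g;      // first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g;  // second-moment estimate
    const double alphaT = alpha * std::sqrt(1.0 - std::pow(beta2, t)) /
        (1.0 - std::pow(beta1, t));
    x -= alphaT * m / (std::sqrt(v) + eps);
  }
  return x;
}
```

The exact form would instead compute mhat = m / (1 - beta1^t) and vhat = v / (1 - beta2^t) and step by alpha * mhat / (sqrt(vhat) + eps); the discrepancy between the two is on the order of eps, hence "marginal".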

Uploaded 2.16.1-1 last night. Still fails on some architectures, although now it seems consistent:

-------------------------------------------------------------------------------
AdamSchafferFunctionN2Test
-------------------------------------------------------------------------------
./tests/adam_test.cpp:331
...............................................................................

./tests/test_function_tools.hpp:114: FAILED:
  REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
with expansion:
  0.4988660447 == Approx( 0.0 )

The failing architectures are arm64, ppc64el, and s390x.

Not sure what you mean by running it on a PR. I can upload a version containing some extra commits if you'd like.

Can you run the tests against #265?

Uploaded to debian; will see how it goes.

Thanks!

Still fails on some architectures but not others.

Can you post the link to the report?

With #265 I still get a failure on aarch64:

[  974s] -------------------------------------------------------------------------------
[  974s] AdamSchafferFunctionN2Test
[  974s] -------------------------------------------------------------------------------
[  974s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/adam_test.cpp:331
[  974s] ...............................................................................
[  974s] 
[  974s] /home/abuild/rpmbuild/BUILD/ensmallen-2.16.0/tests/test_function_tools.hpp:114: FAILED:
[  974s]   REQUIRE( objective == Approx(expectedObjective).margin(objectiveMargin) )
[  974s] with expansion:
[  974s]   0.4988660434 == Approx( 0.0 )
[  974s] 
[  976s] ===============================================================================
[  976s] test cases:   274 |   273 passed | 1 failed
[  976s] assertions: 13331 | 13330 passed | 1 failed

@zoq Sure. The link above, https://buildd.debian.org/status/package.php?p=ensmallen, leads to the latest Debian build logs for the package on all architectures.

Thanks, I missed that one.

@barak @ggardet do we have any more information about the hardware?

@zoq https://db.debian.org/machines.cgi for what it's worth. What particular features are you looking for? RAM, perhaps?

On my side this is inside aarch64 qemu/kvm VM.

I believe these are not virtual but rather physical instances of each architecture, with the builds run in a chroot sandbox containing the minimal build prerequisites.

Ok, so we ended up deciding to simply remove the test in #265 because we don't have any reason to believe that anything is actually broken on arm64 (given all the other passing test cases for Adam), and also because we found a comment indicating that the plan was to remove the test anyway. 😄

So, now ensmallen 2.16.2 is released... and hopefully should work on arm64 and all of the various architectures Debian tests against. :) Want to give it a shot and see what happens?

Uploaded! (Although it's unlikely it'll make it into the stable release, because it's too much of a delta for so late in the release process.)

I confirm that version 2.16.2 is perfectly fine on openSUSE Tumbleweed aarch64. Thanks!

Great, thanks for the info.

Awesome, now let's just wait to see if @barak has success too, and if so, I think we can happily close this issue. :)

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

All seems well!

Awesome! I will go ahead and close this then, and hopefully there won't be more issues in the future. 😄