brucefan1983 / GPUMD

Graphics Processing Units Molecular Dynamics

Home Page: https://gpumd.org/dev

dump_observer average integration test failing

elindgren opened this issue

The integration test for dump_observer:average is failing: the predicted averages and forces almost match the reference values, but not quite.

I dug a bit deeper into this, and I've been able to determine a few things:

  1. If the same potential is specified twice (i.e. potential nep0.txt listed twice in the run.in file), then the average is computed correctly (it is identical to the prediction of either model).
  2. If the same potential is copied to another file (i.e. potential nep0.txt and potential nep0_copy.txt in run.in), then the average is also computed correctly.
  3. If the parameters of the second potential, nep0_copy.txt, are modified (either hyperparameters or weights), then the average is not computed correctly.
  4. Changing the denominator in gpu_average_properties from the number of potentials to, for instance, 1.0 does not produce the expected result, except when the same potential is specified twice (a sketch of this kind of averaging kernel is given after the figure below).
  5. The size of the relative error in the prediction seems to scale with the number of atoms in the system; see the figure below.

[figure: error-scaling, relative error vs. number of atoms]
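
As context for point 4, here is a minimal, hypothetical sketch of the kind of per-atom averaging a kernel like gpu_average_properties performs (summing the per-potential forces and dividing by the number of potentials). The names, memory layout, and launch configuration are illustrative only and are not taken from the actual GPUMD source:

```cuda
// Hypothetical averaging kernel, illustrative only (not the GPUMD implementation).
// Each thread averages one atom's force component over all potentials.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void average_forces(
    int num_atoms,
    int num_potentials,
    const double* forces_per_potential, // assumed layout: [potential][atom]
    double* averaged_forces)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_atoms) {
        double sum = 0.0;
        for (int p = 0; p < num_potentials; ++p) {
            sum += forces_per_potential[p * num_atoms + i];
        }
        // The denominator discussed in point 4 above:
        averaged_forces[i] = sum / num_potentials;
    }
}

int main()
{
    const int num_atoms = 4, num_potentials = 2;
    const double h_forces[num_potentials * num_atoms] = {
        1.0, 2.0, 3.0, 4.0,  // forces from potential 0
        3.0, 2.0, 1.0, 0.0}; // forces from potential 1
    double *d_forces, *d_avg;
    cudaMalloc(&d_forces, sizeof(h_forces));
    cudaMalloc(&d_avg, num_atoms * sizeof(double));
    cudaMemcpy(d_forces, h_forces, sizeof(h_forces), cudaMemcpyHostToDevice);
    average_forces<<<1, 64>>>(num_atoms, num_potentials, d_forces, d_avg);
    double h_avg[num_atoms];
    cudaMemcpy(h_avg, d_avg, sizeof(h_avg), cudaMemcpyDeviceToHost);
    for (int i = 0; i < num_atoms; ++i) printf("%g ", h_avg[i]); // expect: 2 2 2 2
    printf("\n");
    cudaFree(d_forces);
    cudaFree(d_avg);
    return 0;
}
```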

Perhaps you can use binary search to pinpoint the change that causes the failure?

This PR (#495) introduced a change to the initial force, and hence to the trajectory. You can check if this is the change responsible for the failure.

> This PR (#495) introduced a change to the initial force, and hence to the trajectory. You can check if this is the change responsible for the failure.

Yes, I saw that the initial values had changed, and updating them fixed the issue for dump_observer observe but not for dump_observer average. So I think there is something more to it.

> Perhaps you can use binary search to pinpoint the change that causes the failure?

I'm not sure I understand what you mean. Going through each commit until the test breaks?

Yes, figure out when it breaks.

I used git bisect and found the faulty commit to be the following:

6f437f7d827a50702e30ce8d2be71b975ee1d1ba is the first bad commit
commit 6f437f7d827a50702e30ce8d2be71b975ee1d1ba
Author: psn417 <psn417@icloud.com>
Date:   Thu Sep 14 23:19:32 2023 +0800

    Add NPT, can run now

    still have problem

Here is a link to the commit: 6f437f7

Then it should be due to the fact that a force evaluation was added before the integration loop in run.cu (which is correct).

You can try commenting out that force evaluation call and see if the regression test passes. If so, you can update your reference data.
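
To make the mechanism concrete, here is a small standalone toy (plain C++ host code, not taken from run.cu) showing that with velocity Verlet, the force assumed at step 0 changes the very first position update and therefore the entire trajectory; this is why reference data generated before such a change no longer matches:

```cpp
// Toy illustration only: a 1D harmonic oscillator integrated with velocity Verlet.
// Run once with a stale (zero) step-0 force and once with a force evaluated
// before the loop; the two trajectories differ from the first step onward.
#include <cstdio>

// Harmonic force f = -k x with k = 1 (illustrative potential).
double force(double x) { return -x; }

// Integrate n velocity-Verlet steps; f0 is the force assumed at step 0.
double integrate(double x, double v, double f0, int n, double dt)
{
    double f = f0;
    for (int step = 0; step < n; ++step) {
        v += 0.5 * dt * f;  // the first half-kick uses f0 on step 0
        x += dt * v;
        f = force(x);
        v += 0.5 * dt * f;
    }
    return x;
}

int main()
{
    const double x0 = 1.0, v0 = 0.0, dt = 0.01;
    const double x_stale = integrate(x0, v0, 0.0, 100, dt);       // no pre-loop force call
    const double x_fresh = integrate(x0, v0, force(x0), 100, dt); // force evaluated before the loop
    printf("stale initial force: x = %.10f\n", x_stale);
    printf("fresh initial force: x = %.10f\n", x_fresh);
    return 0;
}
```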

> Then it should be due to the fact that a force evaluation was added before the integration loop in run.cu (which is correct).
>
> You can try commenting out that force evaluation call and see if the regression test passes. If so, you can update your reference data.

Yes, that was my first thought as well, but that is not the case. Updating the reference data only fixes the test for the observe case; average still fails. I suspect there might be some extra force calculation somewhere that throws things out of sync.

I figured it out. I just needed to regenerate the reference data, but the way I did it originally was faulty. With the reference data generated from the actual trajectory produced by dump_observer average, everything works as expected again. This means that the functionality was never broken; only the test was, which is good.