During AugLagrangian's optimization, L_BFGS sets coordinates to nan

olgavrou opened this issue a year ago · comments

Issue description

During AugLagrangian's optimization, L_BFGS sets coordinates to nan

I believe the issue is here

scalingFactor = dot(sMat, yMat) / dot(yMat, yMat);

If the two dot products are very small, the division can evaluate to 0/0 (for example when both underflow to zero), and the scaling factor becomes nan.
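
To illustrate the failure mode described above, here is a minimal sketch that uses plain standard-library code in place of Armadillo. The names `Dot` and `UnguardedScalingFactor` are hypothetical stand-ins, not ensmallen functions, and the thread does not confirm that this is the exact path taken in the failing run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain dot product standing in for arma::dot (illustration only).
double Dot(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Same formula as the quoted line, with no guard on the denominator.
double UnguardedScalingFactor(const std::vector<double>& s,
                              const std::vector<double>& y)
{
  return Dot(s, y) / Dot(y, y);
}
```

With `s = y = {1e-200, 1e-200}`, every product underflows to zero in double precision, so the function evaluates `0.0 / 0.0`, which is nan under IEEE 754 arithmetic.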

In my experiment, I replaced that line with either of the following:

scalingFactor = dot(sMat, yMat) / std::max(1e-12, dot(yMat, yMat));


or

double max_y = arma::abs(yMat).max();

if (max_y == 0.0)
{
  scalingFactor = 0.0;
}
else
{
  auto z = yMat / max_y;
  scalingFactor = (dot(sMat, z) / max_y) / dot(z, z);
}

Either replacement results in the optimizer working as expected.
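
The two replacements above can be compared side by side in a small standalone sketch. It uses `std::vector<double>` instead of Armadillo types, and `Dot`, `FlooredScalingFactor` and `RescaledScalingFactor` are hypothetical names chosen for illustration; this is not the code that was eventually merged.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain dot product standing in for arma::dot (illustration only).
double Dot(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Variant 1: keep the denominator from dropping below a fixed floor.
double FlooredScalingFactor(const std::vector<double>& s,
                            const std::vector<double>& y)
{
  return Dot(s, y) / std::max(1e-12, Dot(y, y));
}

// Variant 2: rescale y by its largest absolute entry before the dot products.
double RescaledScalingFactor(const std::vector<double>& s,
                             const std::vector<double>& y)
{
  assert(s.size() == y.size());
  double maxY = 0.0;
  for (double v : y)
    maxY = std::max(maxY, std::abs(v));
  if (maxY == 0.0)
    return 0.0;

  double sz = 0.0; // dot(s, z) with z = y / maxY
  double zz = 0.0; // dot(z, z)
  for (std::size_t i = 0; i < y.size(); ++i)
  {
    const double z = y[i] / maxY;
    sz += s[i] * z;
    zz += z * z;
  }
  return (sz / maxY) / zz;
}
```

For ordinary inputs such as `s = {1, 2}` and `y = {2, 4}` both variants return 0.5. For `s = y = {1e-200, 1e-200}` both return a finite value rather than nan, although not necessarily the same one.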

Please let me know if you think this calculation is ok for the library and if you would be open to either patching it or accepting a patch.

Your environment

version of ensmallen: 2.19.1
operating system: Ubuntu 20.04
compiler: gcc 9.4.0
version of Armadillo: 12.1.90

Steps to reproduce

I have been iterating on the implementation of my constrained optimization problem. You can see the latest version here, and the behaviour is seen when this test (CheckUpdateRule500WIterations) is run. Unfortunately, I don't have a smaller repro at hand.

Expected behavior

Coordinates not getting set to nan

Actual behavior

Coordinates get set to nan

Ryan Curtin · Answer 1 · Mon May 15 2023 21:17:26 GMT+0800 (China Standard Time)

Thanks for the clear report! I agree that this is a problem and I like your suggested fixes. I might pick the first solution as it will still compute a nonzero scaling factor if || yMat || is very small, but I don't have a particularly strong opinion and could be convinced either way. If you'd like to open a PR I would gladly review it and we can get the fix merged. Thanks again! 👍

olgavrou · Answer 2 · Wed May 17 2023 02:16:08 GMT+0800 (China Standard Time)

OK, will do :) I think the first solution probably works better; I got another nan with the second one at some point.

conradsnicta · Answer 3 · Mon May 22 2023 14:20:10 GMT+0800 (China Standard Time)

scalingFactor = dot(sMat, yMat) / std::max(1e-12, dot(yMat, yMat));

Instead of a hardcoded value like 1e-12, I suggest using something like
1000 * std::numeric_limits<CubeElemType>::epsilon().

The user may elect to use matrices and cubes with single-precision floating point values, rather than double-precision. In other words, CubeElemType can be either float or double. For single-precision, the hardcoded 1e-12 is probably too low in this context.
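
A minimal sketch of the type-dependent floor suggested above, assuming a generic element type. `Dot`, `DenominatorFloor` and `ScalingFactor` are hypothetical helper names, and `std::vector` stands in for the Armadillo types; the factor of 1000 is taken from the suggestion, not from the merged fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Plain dot product standing in for arma::dot (illustration only).
template<typename ElemType>
ElemType Dot(const std::vector<ElemType>& a, const std::vector<ElemType>& b)
{
  assert(a.size() == b.size());
  ElemType sum = ElemType(0);
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += a[i] * b[i];
  return sum;
}

// Floor that scales with the precision of the element type.
template<typename ElemType>
ElemType DenominatorFloor()
{
  return ElemType(1000) * std::numeric_limits<ElemType>::epsilon();
}

// Guarded scaling factor; both arguments of std::max have type ElemType.
template<typename ElemType>
ElemType ScalingFactor(const std::vector<ElemType>& s,
                       const std::vector<ElemType>& y)
{
  return Dot(s, y) / std::max(DenominatorFloor<ElemType>(), Dot(y, y));
}
```

On common platforms the floor is about 1.19e-4 for float and about 2.22e-13 for double. A side benefit of this form is that both arguments of `std::max` share one type, whereas `std::max(1e-12, x)` with a float `x` does not compile because template argument deduction sees two different types.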

conradsnicta · Answer 4 · Mon May 22 2023 14:28:40 GMT+0800 (China Standard Time)

PS. While we are at it, the following line should be replaced:
https://github.com/mlpack/ensmallen/blob/master/include/ensmallen_bits/lbfgs/lbfgs_impl.hpp#LL93C1-L93C57

 scalingFactor = 1.0 / sqrt(dot(gradient, gradient));

with

 scalingFactor = 1.0 / arma::norm(gradient, "fro");

This is for two reasons: (1) clarity of intent, (2) Armadillo will use a more robust algorithm to calculate the norm.
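
To show why the choice of norm algorithm can matter, the sketch below compares the naive `sqrt(dot(g, g))` with a scaled 2-norm that divides by the largest absolute entry first. The thread does not say which algorithm Armadillo uses internally, so `ScaledNorm` is only one common approach, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Naive norm: squares can underflow to zero or overflow to infinity.
double NaiveNorm(const std::vector<double>& g)
{
  double sum = 0.0;
  for (double v : g)
    sum += v * v;
  return std::sqrt(sum);
}

// Scaled norm: divide by the largest magnitude so the squares stay near 1.
double ScaledNorm(const std::vector<double>& g)
{
  double maxAbs = 0.0;
  for (double v : g)
    maxAbs = std::max(maxAbs, std::abs(v));
  if (maxAbs == 0.0)
    return 0.0;

  double sum = 0.0;
  for (double v : g)
  {
    const double r = v / maxAbs;
    sum += r * r;
  }
  return maxAbs * std::sqrt(sum);
}
```

For `g = {1e-200, 1e-200}` the naive version returns 0, so `1.0 / NaiveNorm(g)` is infinite, while the scaled version returns roughly 1.414e-200. For `g = {1e200, 1e200}` the naive version overflows to infinity and the scaled version stays finite.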

mlpack-bot · Answer 5 · Wed Jun 21 2023 14:44:42 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

conradsnicta · Answer 6 · Fri Jun 23 2023 08:37:03 GMT+0800 (China Standard Time)

Should be resolved in #368. If not, please re-open and provide more details.