PatWie / CppNumericalSolvers

a lightweight header-only C++17 library of numerical optimization methods for nonlinear functions based on Eigen


NelderMeadSolver gets stuck on a transect

selkovjr opened this issue

The function I need to optimize is like a narrow hull with a well-defined global minimum. It is a 4-D surface, but its 2-D projections all look like a hull. In a previous experiment, everything worked well; the descent began with three parameters fixed at initial values (or almost fixed) and when it reached the minimum in the plane of initial descent, it made a few steps on either side of the plane and quickly turned toward the correct global minimum.

With slightly different data (I can't say how different -- I forgot what data and starting point I used the first time), it is happy with the optimum it finds on the first approach and fails to look in other directions. I am pretty sure there is a healthy gradient remaining at that point; I know because I have plotted the surface over a dense grid in that area and it has a nice shape. What am I doing wrong?

Nelder-Mead is the only solver out of this bunch that converges on something. All others either fall into bizarre orbits or blow up after a few cycles.

Code and data at https://dl.dropboxusercontent.com/u/1725690/distort.tgz

[contour plot]

[surface plot]

Have you checked the convergence criteria? Could it be a simple case of hitting the iteration limit?

See https://github.com/PatWie/CppNumericalSolvers/blob/master/src/examples/simple_withoptions.cpp for an example of how to set and check the criteria.
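Roughly, that example does the following -- a minimal sketch, assuming the current master API (Problem<double>, Criteria<double>, solvers templated on the problem type); older checkouts may use cppoptlib::Vector<double> and scalar-templated solvers instead:

#include <iostream>
#include "cppoptlib/meta.h"
#include "cppoptlib/problem.h"
#include "cppoptlib/solver/neldermeadsolver.h"

// Minimal 2-D test problem; Nelder-Mead only needs value().
class Rosenbrock : public cppoptlib::Problem<double> {
 public:
  double value(const TVector &x) {
    const double t1 = 1 - x[0];
    const double t2 = x[1] - x[0] * x[0];
    return t1 * t1 + 100 * t2 * t2;
  }
};

int main() {
  Rosenbrock f;
  cppoptlib::NelderMeadSolver<Rosenbrock> solver;

  // Start from the defaults, then adjust the limits you care about.
  // Criteria also has xDelta, fDelta, gradNorm and condition fields;
  // which of them a given solver actually checks varies.
  cppoptlib::Criteria<double> crit = cppoptlib::Criteria<double>::defaults();
  crit.iterations = 10000;  // iteration budget
  solver.setStopCriteria(crit);

  Rosenbrock::TVector x(2);
  x << -1, 2;
  solver.minimize(f, x);

  // Why did the solver stop, and what did the criteria look like at exit?
  std::cout << "status: " << solver.status() << std::endl;
  std::cout << solver.criteria() << std::endl;
  return 0;
}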


No, I'm far from reaching the iteration limit, but the convergence criteria are interesting. I tried to hunt them down in the code and somehow missed the fact that there is a proper interface for adjusting them. Thanks for that -- I'll try it next.

It may well be that the surface is really weird and has real traps in it, but I can't see them with this interpolation. Starting from the opposite side, I do hit the optimum in 195 iterations (while it takes fewer than 50 iterations to get trapped starting from b = 0.1):

[plot: correct approach, starting from the opposite side]

After some bumbling around, I think I can articulate my objection to this implementation. It works great except for the initial simplex placement. The problem is in this line:

x0(r, c) = (1 + 0.05) * x(r);

What this does is stretch the simplex along the largest components of the initial vector. If my initial values are [1.0, 0.0001, 0.0001, 0.0001], the simplex is stretched so badly that it cannot sense the gradient along its minor dimensions, so reflection in those dimensions is next to useless, and it is strictly useless if the initial vector has zeroes in it (which is a cool way to reduce the problem to fewer dimensions without changing the code, and for a moment I thought it was intentional).

It is amazing that sometimes the squished initial simplex and its progeny are able to evolve a more reasonable shape and leave the flat world they were created in, as illustrated in the last diagram above. But most of the time, they remain happy in their flatness.

The solution that worked beautifully for me was to stop scaling the initial displacement by the argument values:

 x0(r, c) = x(r) + 0.005;

That creates another problem: it forces a fixed initial simplex size that should probably be domain-specific. But at least the solver starts with a nice compact simplex that does not depend on irrelevant information and tumbles in the right direction from step one. Maybe the general solution should involve a parameter set by the user, or some sort of automated sampling around the initial value to determine a reasonable initial simplex size.
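For concreteness, a standalone sketch of that proposal (the helper name and the default step here are made up, not library code):

#include <Eigen/Dense>

// Build a (dim x dim+1) matrix of simplex vertices around the initial guess x,
// offsetting one coordinate per extra vertex by a fixed, user-chosen step
// instead of by 5% of that coordinate's own magnitude.
Eigen::MatrixXd initialSimplex(const Eigen::VectorXd &x, double step = 0.005) {
  const int dim = static_cast<int>(x.size());
  Eigen::MatrixXd x0(dim, dim + 1);
  for (int c = 0; c < dim + 1; ++c) {
    x0.col(c) = x;                    // every vertex starts at the initial guess
    if (c > 0) x0(c - 1, c) += step;  // vertex c is displaced along coordinate c-1
  }
  return x0;
}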

Thanks for investigating this.

Would a scale-invariant solution be to multiply the point by a fixed factor instead of adding a fixed amount? I.e. something like:

x0(r, c) = x(r) * 1.005;

The factor can be made a property of the Solver and hence configurable. Do you have time to code this up and submit a pull request? The more contributors to this library the better!
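A quick sketch of that variant, for comparison with the additive helper above (again just an illustration; exposing the factor as a solver setting is the suggestion, not something the library provides):

#include <Eigen/Dense>

// Multiplicative variant: stretch each coordinate in turn by a configurable
// factor relative to its own value.
Eigen::MatrixXd initialSimplexScaled(const Eigen::VectorXd &x, double factor = 1.005) {
  const int dim = static_cast<int>(x.size());
  Eigen::MatrixXd x0(dim, dim + 1);
  for (int c = 0; c < dim + 1; ++c) {
    x0.col(c) = x;
    if (c > 0) x0(c - 1, c) *= factor;  // note: a zero component stays zero here
  }
  return x0;
}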


Sorry - my previous comment is dumb. That is how the code currently works.

The initial simplex generation doesn't stretch by the largest component of the initial guess; it stretches each component in turn, so I don't quite understand why your simplex ends up incorrect. Line 37 also checks for a zero component and uses a fixed displacement in that case.

What happens if you rescale the parameters in your cost function so that your initial guess is just (1, 1, 1, 1), i.e. multiply the small components by 1e4?
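One way to try that without touching the cost function itself would be a thin wrapper that rescales the parameters before evaluation -- a sketch only, assuming the usual Problem<double> interface; ScaledProblem and the scale vector are made up:

#include <Eigen/Dense>
#include "cppoptlib/problem.h"

// The solver works in a normalized space where the initial guess is
// (1, 1, 1, 1); the wrapped problem still sees the original units.
template <typename Inner>
class ScaledProblem : public cppoptlib::Problem<double> {
 public:
  ScaledProblem(Inner &inner, const TVector &scale) : inner_(inner), scale_(scale) {}

  double value(const TVector &z) {
    // Map normalized coordinates back to the original parameter scale.
    return inner_.value(z.cwiseProduct(scale_));
  }

 private:
  Inner &inner_;
  TVector scale_;  // e.g. (1.0, 1e-4, 1e-4, 1e-4) for the case above
};

The solver would then start from a vector of all ones, so the 5% relative stretch gives comparable offsets in every direction.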

This is exactly what makes me unhappy. There is a lot of beautiful theory for optimization out there, but in the end one ends up with a bag of tricks to make these algorithms work.

Another solution would be to randomly jiggle the initial values (sketched below).

In your code there is no symbolic gradient. I suggest testing the auto-diff branch of this project, because the inaccuracy of the finite differences may be the main problem.
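A minimal sketch of the jiggling idea mentioned above (the 1% magnitude is an arbitrary choice):

#include <Eigen/Dense>

// Randomly perturb each component of the starting point before handing it to
// the solver, so the initial simplex no longer depends on exact zeros or
// tiny values in the guess.
Eigen::VectorXd jiggle(const Eigen::VectorXd &x, double magnitude = 0.01) {
  // Eigen::VectorXd::Random draws each component uniformly from [-1, 1].
  return x + magnitude * Eigen::VectorXd::Random(x.size());
}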

Well, we've got to make do with what we've got, and if it means enlarging our bag of tricks, I'm fine with that. I heard this quote about Nelder-Mead attributed to Nelder: "Mathematicians hate it because you can't prove convergence; engineers seem to love it because it often works." (https://www.youtube.com/watch?v=r6HZMJGzlDc -- a great talk, if a bit lengthy, with many examples of non-convergence)

Sorry, I was distracted. I had to find a job, and I wanted to finish the project in which I use Nelder-Mead to make sure it gets enough thumping. I still have a couple of weeks left before the new job starts, so I am ready to commit my changes and maybe deal with the fallout. The changes are in the way the initial simplex is placed, and I made the process more verbose so that I could visualize the traces. I am forking it now.

Closing this, as it is an issue with the Nelder-Mead algorithm itself and not with this implementation.