tsoding / nn.h

Simple stb-style header-only library for Neural Networks

Spiky Cost Trajectories

SamuelSchlesinger opened this issue · comments

In certain training scenarios, I see extremely spiky cost trajectories during training. I suspect this could be solved (at least partially) by implementing AdaGrad or some other adaptive learning-rate scheme, where the rate is adapted per parameter, or simply adapted over time at all. I've got an executable in my branch that generates the entire Boolean table of a random function on n bits, and this behavior is easy to reproduce with random functions. Here's an example with a 12-bit function:

[Screenshot (2023-05-19): cost trajectory with large spikes]

Actually, I implemented something simpler that seems to help a bit: I dynamically adjust the rate up or down depending on how much the cost is fluctuating. It's very primitive, but it does the trick. In effect it speeds up the rate at first and then slows it down toward the end.
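A minimal sketch of that "bump the rate up or down" idea might look like this; the growth/shrink factors (1.05 and 0.7) are assumptions, not values from the branch:

```c
// If the cost went down since the last step, grow the rate slightly;
// if it went up (a spike), back off more aggressively.
float adapt_rate(float rate, float prev_cost, float cost)
{
    if (cost < prev_cost) return rate * 1.05f; // still improving: speed up
    return rate * 0.7f;                        // cost spiked: slow down
}
```

Because growth is gentle and shrinkage is aggressive, the rate climbs early when the cost drops steadily, then collapses quickly once training starts oscillating.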

For the best results, I ended up implementing a running exponential smoothing of the cost change and using that as a regulatory factor for the rate; otherwise, the spikes can cause a significant setback in training. Another good idea is to store the best version of the network found so far and provide the ability to revert to it.
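The two ideas could be sketched as follows, under assumed names and layout (a flattened parameter array, a smoothing factor `alpha`); this is an illustration of the approach, not the actual code from the branch:

```c
#include <math.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    float ema_delta; // exponentially smoothed |cost change|
    float best_cost; // lowest cost observed so far
} Regulator;

// Update the smoothed cost delta and derive a damped rate from it:
// the spikier the recent cost history, the smaller the step.
float regulated_rate(Regulator *r, float base_rate,
                     float prev_cost, float cost, float alpha)
{
    float delta = fabsf(cost - prev_cost);
    r->ema_delta = (1.0f - alpha)*r->ema_delta + alpha*delta;
    return base_rate / (1.0f + r->ema_delta);
}

// Snapshot the parameters whenever a new best cost appears, so
// training can revert to the best network found so far.
void maybe_snapshot(Regulator *r, float cost,
                    float *best_params, const float *params, size_t n)
{
    if (cost < r->best_cost) {
        r->best_cost = cost;
        memcpy(best_params, params, n * sizeof(*params));
    }
}
```

Reverting is then just copying `best_params` back over the live parameters when the smoothed delta signals a prolonged spike.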