Clarification is needed in the chapter "How should Adam’s hyperparameters be tuned?"
21kc-caracol opened this issue
21kc-caracol commented
Please clarify: for a budget of 10-25 trials, which of the following is intended?
- First tune the learning rate, then beta1.
- Or: create a joint search space for both parameters, then run experiments to find the best combination.
An example would be appreciated.
My understanding was that it's option 1: first tune for the best learning rate, then fix that value and start tuning beta1.
Laura Sisson commented
It's the second option. If you only have a limited number of trials, you can focus on tuning just the learning rate, but if you have more compute/time, you can optimize beta1 and the learning rate jointly.
If you change beta1 (or beta2, etc.), you'll need to retune the learning rate.
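For concreteness, here is a minimal sketch of the joint-search approach (this is not from the playbook itself; `train_and_evaluate` below is a hypothetical placeholder objective, and the search ranges are illustrative assumptions): random search over a log-uniform learning-rate range and a beta1 range, spending the whole trial budget on the joint space.

```python
import math
import random


def train_and_evaluate(config):
    # Hypothetical stand-in for real training: returns a validation score
    # for a given Adam config. This toy objective peaks near lr=1e-3,
    # beta1=0.9 so the sketch runs end to end; replace with actual training.
    return (-(math.log10(config["learning_rate"]) + 3) ** 2
            - (config["beta1"] - 0.9) ** 2)


def sample_config():
    # Sample the learning rate log-uniformly, since it is scale-sensitive.
    lr = 10 ** random.uniform(-5, -1)
    # Search beta1 in a band around its default of 0.9 (assumed range).
    beta1 = random.uniform(0.8, 0.99)
    return {"learning_rate": lr, "beta1": beta1}


budget = 25  # total trial budget, spent on the joint (lr, beta1) space
best_score, best_config = float("-inf"), None
for _ in range(budget):
    config = sample_config()
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)
```

With a very small budget, the same loop works with beta1 pinned at 0.9 and only the learning rate sampled, which matches the limited-trials regime described above.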