Clarification is needed in the chapter "How should Adam’s hyperparameters be tuned?"
21kc-caracol opened this issue
21kc-caracol commented
Please clarify: for a budget of 10-25 trials, which of the following is intended?
- First tune the learning rate, then beta1.
- Or: create a joint search space for both parameters, then run experiments to find the best combination.
An example would be appreciated.
My understanding was that it's option 1: first tune for the best learning rate, then fix that value and start tuning beta1.
Laura Sisson commented
It's the second option. If you only have a limited number of trials, you can focus on tuning just the learning rate, but if you have more compute/time, you can optimize beta1 and the learning rate jointly.
If you change beta1 (or beta2, etc.), you'll need to retune the learning rate.
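For concreteness, here is a minimal sketch of the joint-search approach (this is not from the playbook itself; `train_and_evaluate` below is a hypothetical placeholder objective, and the search ranges are illustrative assumptions): random search over a log-uniform learning-rate range and a beta1 range, spending the whole trial budget on the joint space.

```python
import math
import random


def train_and_evaluate(config):
    # Hypothetical stand-in for real training: returns a validation score
    # for a given Adam config. This toy objective peaks near lr=1e-3,
    # beta1=0.9 so the sketch runs end to end; replace with actual training.
    return (-(math.log10(config["learning_rate"]) + 3) ** 2
            - (config["beta1"] - 0.9) ** 2)


def sample_config():
    # Sample the learning rate log-uniformly, since it is scale-sensitive.
    lr = 10 ** random.uniform(-5, -1)
    # Search beta1 in a band around its default of 0.9 (assumed range).
    beta1 = random.uniform(0.8, 0.99)
    return {"learning_rate": lr, "beta1": beta1}


budget = 25  # total trial budget, spent on the joint (lr, beta1) space
best_score, best_config = float("-inf"), None
for _ in range(budget):
    config = sample_config()
    score = train_and_evaluate(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)
```

With a very small budget, the same loop works with beta1 pinned at 0.9 and only the learning rate sampled, which matches the limited-trials regime described above.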