kozistr / pytorch_optimizer

optimizer & lr scheduler & loss function collections in PyTorch

Home Page: https://pytorch-optimizers.readthedocs.io/en/latest/

Add Kate

tfriedel opened this issue

https://github.com/nazya/KATE

Note that you can't lift the code as-is: parameters are passed in through a "cfg" object, gradients aren't checked for being None, and the way the "device" value is read is broken. There may be more issues than that.
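
For anyone reading along, the two fixes mentioned here (guarding against None gradients and taking the device from the parameter itself rather than from a cfg value) usually look something like the sketch below inside an optimizer's `step()` loop. This is a hypothetical skeleton with an AdaGrad-style placeholder update, not code from either repository.

```python
import torch


class SketchOptimizer(torch.optim.Optimizer):
    """Hypothetical skeleton illustrating the two fixes; not KATE itself."""

    def __init__(self, params, lr: float = 1e-3):
        super().__init__(params, {'lr': lr})

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                # fix 1: skip parameters that received no gradient (frozen/unused params)
                if p.grad is None:
                    continue

                state = self.state[p]
                if len(state) == 0:
                    # fix 2: allocate state on the parameter's own device/dtype
                    # instead of reading a device name from a separate cfg object
                    state['acc'] = torch.zeros_like(p)

                # placeholder AdaGrad-style update, just to make the skeleton runnable
                state['acc'].add_(p.grad.pow(2))
                p.addcdiv_(p.grad, state['acc'].sqrt().add_(1e-8), value=-group['lr'])
```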

@tfriedel thanks for the request! As you said, the official code has several issues and doesn't match the paper. I corrected the implementation and verified that the optimizer works.

@kozistr
Cool! Did you try it and get decent results? I fixed the obvious issues I mentioned and tried it out, but the results were poor. It would need more testing to tell whether that's an implementation issue or not. I haven't checked your implementation yet.

Actually, I haven't tested it on other benchmark datasets yet (just the toy example in the test cases).

There seem to be differences between the original code and the pseudocode in the paper, so I re-implemented it based on the paper.
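
For reference, my reading of the paper's pseudocode is roughly the elementwise recursion below. This is a minimal sketch of that recursion (variable names and the eps handling are mine), not the pytorch_optimizer implementation.

```python
import torch


def kate_update(p: torch.Tensor, grad: torch.Tensor, b2: torch.Tensor, m2: torch.Tensor,
                lr: float = 1e-3, delta: float = 0.0, eps: float = 1e-8) -> None:
    # b2 and m2 hold the squared accumulators b_k^2 and m_k^2; all ops are elementwise.
    g2 = grad * grad
    b2.add_(g2)                                      # b_k^2 = b_{k-1}^2 + g_k^2
    m2.add_(g2, alpha=delta).add_(g2 / b2.add(eps))  # m_k^2 = m_{k-1}^2 + delta * g_k^2 + g_k^2 / b_k^2
    p.sub_(lr * m2.sqrt() * grad / b2.add(eps))      # x_{k+1} = x_k - lr * m_k * g_k / b_k^2
```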

Just added the visualizations here.

I ran the visualization on the Rosenbrock and Rastrigin functions. Here's the result for the Kate optimizer, and it looks fine (it doesn't diverge, so it's probably working).
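
For anyone who wants to reproduce this kind of sanity check, a toy run on Rastrigin looks roughly like the following. I'm assuming the optimizer is exported as `Kate` from pytorch_optimizer; check the actual class name and defaults in the package.

```python
import torch
from pytorch_optimizer import Kate  # assumed export name; verify against the package


def rastrigin(x: torch.Tensor, a: float = 10.0) -> torch.Tensor:
    # highly multi-modal test function with its global minimum at the origin
    return a * x.numel() + (x.pow(2) - a * torch.cos(2 * torch.pi * x)).sum()


x = torch.tensor([-2.0, 2.0], requires_grad=True)
optimizer = Kate([x], lr=1e-1)

for _ in range(500):
    optimizer.zero_grad()
    loss = rastrigin(x)
    loss.backward()
    optimizer.step()

print(x.detach().tolist(), rastrigin(x).item())
```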

When I have time, I'll add more tests on benchmark datasets like ImageNet and MNIST, too.

Interesting! I didn't know about Rastrigin.
Having some more realistic benchmarks for all these optimizers would be nice. Do you know about https://github.com/mlcommons/algorithmic-efficiency? I know Prodigy and schedule-free were submitted to this benchmark.

Also you may be interested in this twitter thread about optimizers:
https://x.com/_clashluke/status/1808590060654108910

For example, there was a mention of a grafted Lion#Adam optimizer:
https://x.com/dvruette/status/1627663196839370755

Thanks for the resources! I'll try some of them.