srush / annotated-s4

Implementation of https://srush.github.io/annotated-s4

Home Page:https://srush.github.io/annotated-s4

Reconcile S4 Optimizer w/ Original Implementation

siddk opened this issue

The original S4 repository indicates that optimization for S4 needs to be handled specially (https://github.com/HazyResearch/state-spaces/blob/feeab742e9c737c8e2b8b0e44d3efff4049f5847/example.py#L235).

Specifically:

  • Fixed small learning rates for the state space matrices, with no weight decay (our current AdamW setup does not respect this).
  • Larger learning rates and weight decay for all other parameters.
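
As a rough illustration of the split described above, here is a minimal sketch of how the parameter tree could be partitioned into two groups by name. The names in `SSM_PARAM_NAMES`, the helper `label_params`, the toy `params` tree, and the values in `GROUP_HPARAMS` are all hypothetical placeholders, not taken from this repo or from the original implementation.

```python
from collections.abc import Mapping

# Hypothetical names for the state space parameters; replace these with the
# names the model actually uses.
SSM_PARAM_NAMES = {"A", "B", "C"}

# Placeholder hyperparameters for each group (not values from either repo).
GROUP_HPARAMS = {
    "ssm": {"learning_rate": 1e-3, "weight_decay": 0.0},
    "regular": {"learning_rate": 1e-2, "weight_decay": 0.01},
}


def label_params(params):
    """Return a tree shaped like `params` whose leaves are group labels."""
    labels = {}
    for name, value in params.items():
        if isinstance(value, Mapping):
            # Recurse into nested modules.
            labels[name] = label_params(value)
        else:
            # Leaf parameter: choose the group from its name alone.
            labels[name] = "ssm" if name in SSM_PARAM_NAMES else "regular"
    return labels


# Toy parameter tree standing in for the real model parameters.
params = {
    "layer_0": {
        "A": 0.0,
        "B": 0.0,
        "C": 0.0,
        "dense": {"kernel": 0.0, "bias": 0.0},
    },
    "decoder": {"kernel": 0.0},
}
labels = label_params(params)
```

With optax, a label tree of this shape could be passed as `param_labels` to `optax.multi_transform`, together with a dict mapping `"ssm"` to something like `optax.adam(...)` and `"regular"` to `optax.adamw(..., weight_decay=...)`. Whether matching on names is robust enough for our modules is something to check against the actual parameter tree.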