vfleaking / grokking-dichotomy

Code for "Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking"

Paper Link: https://arxiv.org/abs/2311.18817

Code:

  • train_mod_add.py: Train a two-layer ReLU net on modular addition.
  • train_mod_add_nowd.py: Train a two-layer ReLU net on modular addition without weight decay. A special learning rate schedule is applied to speed up training in the late phase.
  • train_diag_cls.py: Train a diagonal linear net on sparse linear classification.
  • train_diag_cls2.py: Train a diagonal linear net on linear classification, where the data has a very large L2 margin.
  • train_mc.py: Solve an overparameterized matrix completion problem.
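
For readers unfamiliar with the modular addition task that the first two scripts train on, the following is a minimal sketch of how such a dataset could be built and split. It is not taken from the repository: the function names (`make_mod_add_dataset`, `one_hot_pair`, `split_dataset`), the modulus `p = 97`, the one-hot input encoding, and the 40% training fraction are all assumptions made for illustration, and the actual scripts may differ.

```python
import random


def make_mod_add_dataset(p):
    """Return all p*p examples ((a, b), label) with label = (a + b) mod p."""
    return [((a, b), (a + b) % p) for a in range(p) for b in range(p)]


def one_hot_pair(a, b, p):
    """Encode (a, b) as two concatenated one-hot vectors (length 2p).

    This encoding is an assumption for illustration only.
    """
    x = [0.0] * (2 * p)
    x[a] = 1.0
    x[p + b] = 1.0
    return x


def split_dataset(data, train_frac, seed=0):
    """Shuffle a copy of the data and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


p = 97  # hypothetical modulus, chosen for illustration
data = make_mod_add_dataset(p)
train, test = split_dataset(data, train_frac=0.4)  # hypothetical fraction
```

Training on only a fraction of all pairs and evaluating on the held-out rest is what makes a delayed jump in test accuracy observable; the model and optimizer settings are left to the scripts themselves.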

License: Apache License 2.0


Languages

Language: Python 100.0%