JasonGross / guarantees-based-mechanistic-interpretability


Switch argparse to config.py / fiddle / argparse-dataclass

euanong opened this issue · comments

Dump of my thoughts:

  1. I've heard someone suggest Hydra + Fire
  2. argparse-dataclass looks like it's missing support for nested data classes, right? Maybe there's a way to kludge nested dataclass support?
  3. I was not able to make heads or tails of fiddle from looking at it for two minutes, I'll take a deeper look later
  4. click looks cool, but it seems mostly geared around functions not dataclasses?
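Re: point 2, one possible kludge is to flatten nested dataclasses into dotted argparse flags and reassemble them after parsing. A minimal stdlib sketch (plain argparse, not argparse-dataclass's API; `OptimizerConfig`/`TrainConfig` are made-up stand-ins for our configs):

```python
import argparse
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical configs standing in for our model/training setup.
@dataclass
class OptimizerConfig:
    lr: float = 1e-3
    weight_decay: float = 0.0

@dataclass
class TrainConfig:
    epochs: int = 10
    optimizer: OptimizerConfig = OptimizerConfig()

def add_dataclass_args(parser, cls, prefix=""):
    """Recursively register one --flag per leaf field, dotting nested names."""
    for f in fields(cls):
        if is_dataclass(f.type):
            add_dataclass_args(parser, f.type, prefix + f.name + ".")
        else:
            parser.add_argument(f"--{prefix}{f.name}", type=f.type,
                                default=f.default)

def build_from_args(cls, ns, prefix=""):
    """Reassemble the nested dataclass tree from the flat namespace."""
    kwargs = {}
    for f in fields(cls):
        if is_dataclass(f.type):
            kwargs[f.name] = build_from_args(f.type, ns, prefix + f.name + ".")
        else:
            # argparse keeps dots in the dest, so fetch via getattr
            kwargs[f.name] = getattr(ns, prefix + f.name)
    return cls(**kwargs)

parser = argparse.ArgumentParser()
add_dataclass_args(parser, TrainConfig)
cfg = build_from_args(TrainConfig, parser.parse_args(["--optimizer.lr", "0.01"]))
```

This gives `--optimizer.lr`-style flags for free, though it punts on lists, unions, and help text.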

Looking a bit more at fiddle, I guess the essential design question here is which way we want the arrows to point. Right now, model configuration and running feel a bit spaghetti, I think, because the arrows don't all point the same way: the top-level model drivers all invoke train_or_load_model with something subclassed from a config object defined in the train_or_load_model file.

I think right now train_or_load_model is doing too many things:

  1. it is constructing wandb information from model config (this should be factored into a separate function)
  2. it is constructing disk path information from model config (this should also be factored)
  3. it tries loading the model from disk or else wandb
  4. it constructs training arguments from model config & wandb info (this should also be factored)
  5. it runs the training loop
  6. it saves the model to disk & wandb
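The factoring I have in mind might look like this sketch, with the derived-information steps (1), (2), (4) as pure functions and the I/O injected so (3), (5), (6) stay thin (all names and fields here are hypothetical, not the current train_or_load_model API):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelConfig:  # hypothetical stand-in for the real config
    name: str = "toy-model"
    seed: int = 0

def wandb_info_from_config(cfg: ModelConfig) -> dict:
    # (1) wandb metadata derived purely from the config
    return {"project": "guarantees", "run_name": f"{cfg.name}-seed{cfg.seed}"}

def disk_path_from_config(cfg: ModelConfig) -> Path:
    # (2) disk path derived purely from the config
    return Path("models") / f"{cfg.name}-seed{cfg.seed}.pt"

def training_args_from_config(cfg: ModelConfig, wandb_info: dict) -> dict:
    # (4) training arguments derived from config + logging info
    return {"run_name": wandb_info["run_name"]}

def train_or_load_model(cfg: ModelConfig, train_fn, load_fn, save_fn):
    # (3) try disk first; (5)-(6) otherwise train, then save
    path = disk_path_from_config(cfg)
    if path.exists():
        return load_fn(path)
    wandb_info = wandb_info_from_config(cfg)
    model = train_fn(**training_args_from_config(cfg, wandb_info))
    save_fn(model, path)
    return model
```

Each derived-info function is then independently testable, and train_or_load_model shrinks to orchestration.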

I think the relevant design constraints are:

  1. it's nice for consumers of experiments to be able to import something from wherever the config is saved and make a single call that fetches the model, training it if it doesn't exist
  2. experiments are varied, and should have control over how to configure model setup and training
  3. (HookedTransformer) model architecture is currently uniform and should be de-duplicated across all experiments that involve a single HookedTransformer model
  4. logging is uniform; experimental setup should not have to think about wandb, disk, etc
  5. the model configuration should be serializable (for logging) and reproducible
  6. we should be able to define various configurations of an experiment we care about either in python (or yaml, I guess) or from the command line
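For constraint 1, the consumer-facing shape could be as small as this (a hypothetical experiment module; the config class, names, and the train-on-miss stand-in are all made up for illustration):

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class MaxOfNConfig:  # hypothetical experiment config; frozen => hashable
    n: int = 4
    seed: int = 123

CONFIG = MaxOfNConfig()

@lru_cache(maxsize=None)
def get_model(cfg: MaxOfNConfig = CONFIG):
    """Single entry point: fetch the model, training it if no artifact exists."""
    # stand-in for: load from disk -> load from wandb -> train and save
    return f"model(n={cfg.n}, seed={cfg.seed})"
```

A notebook then just does `from experiments.max_of_n import get_model; model = get_model()`, and the caching means repeated calls in one session are free.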

I am thinking that plausibly we want to invert the control flow: rather than having a unified config object class across all experiments, we want to define wrappers of useful common functionality, and merge configs for various functions with fiddle?
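A stdlib sketch of what that inversion could look like (fiddle's `fdl.Config`/`fdl.build` would play the merge/build role; everything here is a hypothetical illustration with `dataclasses.replace` standing in for the merge):

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class HookedTransformerConfig:  # shared across experiments (constraint 3)
    d_model: int = 32
    n_layers: int = 1

@dataclasses.dataclass(frozen=True)
class ExperimentConfig:  # each experiment defines its own (constraint 2)
    model: HookedTransformerConfig = HookedTransformerConfig()
    epochs: int = 10

def with_overrides(cfg, **overrides):
    """Merge command-line / sweep overrides into a base config."""
    return dataclasses.replace(cfg, **overrides)

# The experiment owns its config; shared wrappers (logging, train-or-load)
# consume only the pieces they care about, so the arrows all point one way.
base = ExperimentConfig()
sweep = with_overrides(base, epochs=50,
                       model=with_overrides(base.model, n_layers=2))
```

Frozen dataclasses keep the configs hashable and trivially serializable (constraint 5), and the base-plus-overrides pattern covers both the in-Python and command-line cases from constraint 6.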