Switch argparse to config.py / fiddle / argparse-dataclass
euanong opened this issue
Dump of my thoughts:
- I've heard someone suggest Hydra + Fire
- argparse-dataclass looks like it's missing support for nested dataclasses, right? Maybe there's a way to kludge nested-dataclass support in?
- I was not able to make heads or tails of fiddle from looking at it for two minutes; I'll take a deeper look later
- click looks cool, but it seems mostly geared around functions not dataclasses?
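On the nested-dataclass question: one possible kludge is to flatten nested dataclass fields into dotted argparse flags and rebuild the tree afterwards. A stdlib-only sketch (the `TrainConfig`/`OptimizerConfig` names are made up for illustration; this doesn't use argparse-dataclass itself):

```python
import argparse
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class OptimizerConfig:  # hypothetical example config
    lr: float = 1e-3
    weight_decay: float = 0.0

@dataclass
class TrainConfig:  # hypothetical example config
    seed: int = 0
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)

def add_dataclass_args(parser, cls, prefix=""):
    # Recursively register one --prefix.name flag per leaf field.
    for f in fields(cls):
        name = f"{prefix}{f.name}"
        if is_dataclass(f.type):
            add_dataclass_args(parser, f.type, prefix=f"{name}.")
        else:
            parser.add_argument(f"--{name}", type=f.type, default=None)

def build_from_args(cls, args, prefix=""):
    # Rebuild the nested dataclass, overriding defaults with any flags set.
    kwargs = {}
    for f in fields(cls):
        name = f"{prefix}{f.name}"
        if is_dataclass(f.type):
            kwargs[f.name] = build_from_args(f.type, args, prefix=f"{name}.")
        else:
            value = getattr(args, name)
            if value is not None:
                kwargs[f.name] = value
    return cls(**kwargs)

parser = argparse.ArgumentParser()
add_dataclass_args(parser, TrainConfig)
args = parser.parse_args(["--seed", "42", "--optimizer.lr", "0.01"])
cfg = build_from_args(TrainConfig, args)
```

This keeps defaults in the dataclasses and only overrides what the command line sets, though it punts on lists, optionals, and anything else argparse can't coerce with a plain `type=` callable.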
Looking a bit more at fiddle, I guess the essential design question here is which way we want the arrows to point. Right now, model configuration and running feel a bit spaghetti-like because the arrows don't all point the same way:
The top-level model drivers all invoke `train_or_load_model` with something subclassed from a config object defined in the `train_or_load_model` file.
I think `train_or_load_model` is currently doing too many things:
- it constructs wandb run information from the model config (this should be factored out into a separate function)
- it constructs disk-path information from the model config (this should also be factored out)
- it tries to load the model from disk, falling back to wandb
- it constructs training arguments from the model config and wandb info (this should also be factored out)
- it runs the training loop
- it saves the model to disk and to wandb
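Concretely, the factoring could look something like the following hypothetical sketch. All function and field names here are made up, and the wandb/disk/training logic is stubbed out; the point is just that `train_or_load_model` shrinks to pure orchestration:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelConfig:  # hypothetical minimal config
    name: str
    n_layers: int

def wandb_info_from_config(cfg: ModelConfig) -> dict:
    # Stub: derive the wandb run/group names from the config.
    return {"project": "my-project", "run_name": f"{cfg.name}-L{cfg.n_layers}"}

def disk_path_from_config(cfg: ModelConfig) -> Path:
    # Stub: derive the on-disk checkpoint path from the config.
    return Path("checkpoints") / f"{cfg.name}-L{cfg.n_layers}.pt"

def training_args_from_config(cfg: ModelConfig, wandb_info: dict) -> dict:
    # Stub: combine model config and wandb info into trainer kwargs.
    return {"run_name": wandb_info["run_name"], "n_layers": cfg.n_layers}

def load_model(path):
    return {"loaded_from": str(path)}  # stub

def run_training_loop(args):
    return {"trained_with": args}  # stub

def save_model(model, path, wandb_info):
    pass  # stub: write checkpoint to disk and log the artifact to wandb

def train_or_load_model(cfg: ModelConfig):
    # Orchestration only: each concern above lives in its own function.
    path = disk_path_from_config(cfg)
    if path.exists():
        return load_model(path)
    wandb_info = wandb_info_from_config(cfg)
    args = training_args_from_config(cfg, wandb_info)
    model = run_training_loop(args)
    save_model(model, path, wandb_info)
    return model
```

Each of the small functions is then independently testable and reusable by experiments that don't want the full orchestration.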
I think the relevant design constraints are:
- it's nice for consumers of experiments to be able to import something from wherever the config is saved and make a single call that fetches the model, training it if it doesn't exist
- experiments are varied, and should have control over how to configure model setup and training
- (HookedTransformer) model architecture is currently uniform and should be de-duplicated across all experiments that involve a single HookedTransformer model
- logging is uniform; experimental setup should not have to think about wandb, disk, etc
- the model configuration should be serializable (for logging) and reproducible
- we should be able to define the various configurations of an experiment we care about either in Python (or YAML, I guess) or from the command line
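The serializability/reproducibility constraint, at least, doesn't require anything exotic: plain dataclasses round-trip through JSON. A minimal sketch (the `ModelConfig` fields are made up):

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ModelConfig:  # hypothetical example config
    d_model: int = 128
    n_layers: int = 2
    seed: int = 0

# Serialize for logging (e.g. attach to a wandb run) ...
cfg = ModelConfig(n_layers=4)
blob = json.dumps(asdict(cfg), sort_keys=True)

# ... and reconstruct the exact same config later for reproducibility.
restored = ModelConfig(**json.loads(blob))
```

Nested dataclasses serialize fine with `asdict`, though deserializing them back needs a recursive constructor rather than a bare `**` unpack.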
I am thinking that plausibly we want to invert the control flow: rather than having a unified config-object class shared across all experiments, we define wrappers for useful common functionality, and merge the configs for the various functions with fiddle?
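To make the inversion concrete, here's a stdlib-only stand-in for the pattern fiddle enables (roughly the shape of `fdl.Config`/`fdl.build`, not fiddle's actual API; the builder names are made up): configs are deferred call descriptions that experiments can mutate before anything is constructed, so the arrows point from experiments into shared builders rather than from a shared config base class out to every experiment.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Minimal stand-in for fiddle's Config: a deferred call to `fn`
    # whose kwargs can be overridden before anything is constructed.
    fn: callable
    kwargs: dict

    def build(self):
        # Recursively build nested Configs, then make the real call.
        resolved = {k: v.build() if isinstance(v, Config) else v
                    for k, v in self.kwargs.items()}
        return self.fn(**resolved)

# Shared builders (hypothetical names):
def make_model(d_model, n_layers):
    return {"d_model": d_model, "n_layers": n_layers}

def make_trainer(model, lr):
    return {"model": model, "lr": lr}

def base_config():
    # Common functionality lives here; experiments import and tweak it.
    model = Config(make_model, {"d_model": 128, "n_layers": 2})
    return Config(make_trainer, {"model": model, "lr": 1e-3})

# An experiment overrides only what it cares about, then builds:
cfg = base_config()
cfg.kwargs["model"].kwargs["n_layers"] = 8
trainer = cfg.build()
```

Fiddle adds serialization, command-line overrides, and validation on top of this core idea, which is what would let the logging and CLI constraints above fall out for free.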