JasonGross / guarantees-based-mechanistic-interpretability


Switch argparse to config.py / fiddle / argparse-dataclass

euanong opened this issue · comments

Dump of my thoughts:

  1. I've heard someone suggest Hydra + Fire
  2. argparse-dataclass looks like it's missing support for nested data classes, right? Maybe there's a way to kludge nested dataclass support?
  3. I was not able to make heads or tails of fiddle from looking at it for two minutes, I'll take a deeper look later
  4. click looks cool, but it seems mostly geared around functions not dataclasses?
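Re: point 2, one possible kludge is to flatten nested dataclasses into dotted argparse flags and reassemble them after parsing. A minimal stdlib sketch (plain argparse, not argparse-dataclass's API; `OptimizerConfig`/`TrainConfig` are made-up stand-ins for our configs):

```python
import argparse
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical configs standing in for our model/training setup.
@dataclass
class OptimizerConfig:
    lr: float = 1e-3
    weight_decay: float = 0.0

@dataclass
class TrainConfig:
    epochs: int = 10
    optimizer: OptimizerConfig = OptimizerConfig()

def add_dataclass_args(parser, cls, prefix=""):
    """Recursively register one --flag per leaf field, dotting nested names."""
    for f in fields(cls):
        if is_dataclass(f.type):
            add_dataclass_args(parser, f.type, prefix + f.name + ".")
        else:
            parser.add_argument(f"--{prefix}{f.name}", type=f.type,
                                default=f.default)

def build_from_args(cls, ns, prefix=""):
    """Reassemble the nested dataclass tree from the flat namespace."""
    kwargs = {}
    for f in fields(cls):
        if is_dataclass(f.type):
            kwargs[f.name] = build_from_args(f.type, ns, prefix + f.name + ".")
        else:
            # argparse keeps dots in the dest, so fetch via getattr
            kwargs[f.name] = getattr(ns, prefix + f.name)
    return cls(**kwargs)

parser = argparse.ArgumentParser()
add_dataclass_args(parser, TrainConfig)
cfg = build_from_args(TrainConfig, parser.parse_args(["--optimizer.lr", "0.01"]))
```

This gives `--optimizer.lr`-style flags for free, though it punts on lists, unions, and help text.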

Looking a bit more at fiddle, I guess the essential design question here is which way we want the arrows to point. Right now, model configuration and running feel a bit spaghetti, I think, because the arrows don't all point the same way: the top-level model drivers all invoke train_or_load_model with something subclassed from a config object defined in the train_or_load_model file.

I think right now train_or_load_model is doing too many things:

  1. it is constructing wandb information from model config (this should be factored into a separate function)
  2. it is constructing disk path information from model config (this should also be factored)
  3. it tries loading the model from disk or else wandb
  4. it constructs training arguments from model config & wandb info (this should also be factored)
  5. it runs the training loop
  6. it saves the model to disk & wandb
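The factoring I have in mind might look like this sketch, with the derived-information steps (1), (2), (4) as pure functions and the I/O injected so (3), (5), (6) stay thin (all names and fields here are hypothetical, not the current train_or_load_model API):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelConfig:  # hypothetical stand-in for the real config
    name: str = "toy-model"
    seed: int = 0

def wandb_info_from_config(cfg: ModelConfig) -> dict:
    # (1) wandb metadata derived purely from the config
    return {"project": "guarantees", "run_name": f"{cfg.name}-seed{cfg.seed}"}

def disk_path_from_config(cfg: ModelConfig) -> Path:
    # (2) disk path derived purely from the config
    return Path("models") / f"{cfg.name}-seed{cfg.seed}.pt"

def training_args_from_config(cfg: ModelConfig, wandb_info: dict) -> dict:
    # (4) training arguments derived from config + logging info
    return {"run_name": wandb_info["run_name"]}

def train_or_load_model(cfg: ModelConfig, train_fn, load_fn, save_fn):
    # (3) try disk first; (5)-(6) otherwise train, then save
    path = disk_path_from_config(cfg)
    if path.exists():
        return load_fn(path)
    wandb_info = wandb_info_from_config(cfg)
    model = train_fn(**training_args_from_config(cfg, wandb_info))
    save_fn(model, path)
    return model
```

Each derived-info function is then independently testable, and train_or_load_model shrinks to orchestration.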

I think the relevant design constraints are:

  1. it's nice for consumers of experiments to be able to import something from wherever the config is saved and make a single call that fetches the model, training it if it doesn't exist
  2. experiments are varied, and should have control over how to configure model setup and training
  3. (HookedTransformer) model architecture is currently uniform and should be de-duplicated across all experiments that involve a single HookedTransformer model
  4. logging is uniform; experimental setup should not have to think about wandb, disk, etc
  5. the model configuration should be serializable (for logging) and reproducible
  6. we should be able to define various configurations of an experiment we care about either in python (or yaml, I guess) or from the command line
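For constraint 1, the consumer-facing shape could be as small as this (a hypothetical experiment module; the config class, names, and the train-on-miss stand-in are all made up for illustration):

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class MaxOfNConfig:  # hypothetical experiment config; frozen => hashable
    n: int = 4
    seed: int = 123

CONFIG = MaxOfNConfig()

@lru_cache(maxsize=None)
def get_model(cfg: MaxOfNConfig = CONFIG):
    """Single entry point: fetch the model, training it if no artifact exists."""
    # stand-in for: load from disk -> load from wandb -> train and save
    return f"model(n={cfg.n}, seed={cfg.seed})"
```

A notebook then just does `from experiments.max_of_n import get_model; model = get_model()`, and the caching means repeated calls in one session are free.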

I am thinking that plausibly we want to invert the control flow: rather than having a unified config object class across all experiments, we want to define wrappers of useful common functionality, and merge configs for various functions with fiddle?
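A stdlib sketch of what that inversion could look like (fiddle's `fdl.Config`/`fdl.build` would play the merge/build role; everything here is a hypothetical illustration with `dataclasses.replace` standing in for the merge):

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class HookedTransformerConfig:  # shared across experiments (constraint 3)
    d_model: int = 32
    n_layers: int = 1

@dataclasses.dataclass(frozen=True)
class ExperimentConfig:  # each experiment defines its own (constraint 2)
    model: HookedTransformerConfig = HookedTransformerConfig()
    epochs: int = 10

def with_overrides(cfg, **overrides):
    """Merge command-line / sweep overrides into a base config."""
    return dataclasses.replace(cfg, **overrides)

# The experiment owns its config; shared wrappers (logging, train-or-load)
# consume only the pieces they care about, so the arrows all point one way.
base = ExperimentConfig()
sweep = with_overrides(base, epochs=50,
                       model=with_overrides(base.model, n_layers=2))
```

Frozen dataclasses keep the configs hashable and trivially serializable (constraint 5), and the base-plus-overrides pattern covers both the in-Python and command-line cases from constraint 6.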