Updates and Documentation for WandB sweeps
Adamits opened this issue
Based on other work we're doing, we should add some documentation and make the tweaks necessary for running a W&B sweep with this codebase.
- Add documentation and examples of running WandB sweeps with Yoyodyne.
- Make updates to the codebase so PTL (PyTorch Lightning) and WandB play nicely with respect to logging hyperparameters, etc.
- Update PTL to log max validation accuracy.
Let me also add that the documentation should probably show how to retrieve the best runs from the wandb API, too.
I guess, relatedly, it would also be nice to have a system for easily mapping W&B run IDs to yoyodyne logging, etc.
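One lightweight way to map W&B run IDs to local yoyodyne logs would be a small JSON index written when each run starts. This is only a sketch: the index file, the register_run and lookup_run helpers, and keying on the W&B run ID are all assumptions, not existing yoyodyne behavior.

```python
"""Hypothetical JSON index mapping W&B run IDs to local log directories."""

import json
import os
from typing import Dict, Optional


def _load(index_path: str) -> Dict[str, str]:
    """Reads the index, treating a missing file as an empty mapping."""
    if not os.path.exists(index_path):
        return {}
    with open(index_path, encoding="utf-8") as source:
        return json.load(source)


def register_run(index_path: str, run_id: str, log_dir: str) -> None:
    """Records (or overwrites) the local log directory for a W&B run ID."""
    index = _load(index_path)
    index[run_id] = os.path.abspath(log_dir)
    with open(index_path, "w", encoding="utf-8") as sink:
        json.dump(index, sink, indent=2, sort_keys=True)


def lookup_run(index_path: str, run_id: str) -> Optional[str]:
    """Returns the log directory registered for run_id, or None."""
    return _load(index_path).get(run_id)
```

Inside a training run this might be called as register_run("wandb_runs.json", wandb.run.id, trainer.log_dir); the file name and where the call lives are open design choices.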
Working on this now. I was wondering whether you think we should add train args so that it can be called in such a way that a wandb agent trains from a sweep (by adding wandb_sweep_id and max_num_runs args), or whether this should be a separate script that we maintain in the library (something like train_wandb_agent.py).
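The separate-script option might look like the skeleton below. wandb.agent(sweep_id, function=..., count=...) is W&B's entry point for running sweep trials; the --sweep_id and --max_num_runs flags mirror the argument names floated above, and train_one_trial is a hypothetical hook standing in for the calls into train.py, whose exact functions are not shown here.

```python
"""Hypothetical train_wandb_agent.py skeleton."""

import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Runs a W&B sweep agent.")
    parser.add_argument("--sweep_id", required=True, help="ID of an existing sweep.")
    parser.add_argument("--entity", help="W&B entity (defaults to the logged-in user).")
    parser.add_argument("--project", help="W&B project containing the sweep.")
    parser.add_argument(
        "--max_num_runs",
        type=int,
        default=None,
        help="Trials this agent runs before exiting (default: no limit).",
    )
    return parser.parse_args(argv)


def train_one_trial() -> None:
    """Runs one sweep trial with the hyperparameters the agent sampled."""
    import wandb  # Lazy import: only needed when actually training.

    with wandb.init() as run:  # The agent injects the sampled config here.
        hparams = dict(run.config)
        # Hypothetical: hand hparams to the model/data setup from train.py.
        print(f"Would train with {hparams}")


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    import wandb

    wandb.agent(
        args.sweep_id,
        function=train_one_trial,
        entity=args.entity,
        project=args.project,
        count=args.max_num_runs,
    )
```

main() would then be wired to an if __name__ == "__main__": guard or a console entry point. Keeping this out of yoyodyne-train leaves the regular CLI free of sweep-only flags.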
Other notes:
- I was not able to find anything on how to log the max validation accuracy in PTL and have that logging propagate to wandb, so instead I just call
wandb.define_metric('val_accuracy', summary='max')
when wandb logging is enabled.
- PTL tries to log the model hparams to the wandb run when the WandbLogger is enabled. This causes a warning, because the hparams also get logged when we start the sweep agent. See wandb/wandb#2641 ("[CLI] wandb: WARNING Config item 'hyperparam_name' was locked by 'sweep' (ignored update)"). I do not know how to fix this, since it does not seem to be PTL behavior I can toggle, and we need the PTL WandbLogger in order to log runtime metrics as well. I think we can just let it happen for now?
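For reference, the define_metric workaround from the first note might be packaged like this. Run.define_metric is W&B's own API; the helper names and the choice to call it on the run behind PTL's WandbLogger (logger.experiment) are assumptions. It must run after the W&B run exists, since define_metric acts on an active run.

```python
"""Hypothetical helpers: keep the best val_accuracy in the W&B run summary."""

from typing import Any


def track_max_val_accuracy(run: Any, metric: str = "val_accuracy") -> None:
    """Asks W&B to summarize `metric` by its maximum instead of its last value.

    `run` is an active wandb Run, e.g. `logger.experiment` for a PTL
    WandbLogger; accessing `.experiment` is what starts the run.
    """
    run.define_metric(metric, summary="max")


def make_wandb_logger(project: str):
    """Builds a PTL WandbLogger with the max-accuracy summary configured."""
    from pytorch_lightning.loggers import WandbLogger  # Lazy: needs PTL and wandb.

    logger = WandbLogger(project=project)
    track_max_val_accuracy(logger.experiment)
    return logger
```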
While I'm not sure I have enough context to get this yet, I think I am fine with just including docs and a sample script for doing the wandb stuff. It's hard for me to imagine doing this effectively using yoyodyne-train alone, I guess? I assume you did your sweeping using custom Python, right?
SGTM on the wandb.define_metric workaround.
Let's just suppress the warning in __init__.py then, and add a TODO to investigate this at the PTL level later.
Yeah, I just have a train_wandb_agent.py script that calls the functions in train.py. So do we need a top-level directory in our repository called examples or similar? Or do you think it's better to have train_wandb_agent.py live alongside train.py?
As for suppressing the warning in __init__.py: sounds good!
Yes, a top-level examples directory is what I'd suggest. I'd have one script for running the sweep and, optionally, one for grabbing the results from W&B.
I don't know whether we need to modify the project file to register that directory while preventing it from being installed as part of the package. Something to look out for: browse the verbose installation output and you should see what happens there.
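The optional results script could flatten each sweep run's config and summary into one CSV row, roughly as below. This is a hedged sketch: it assumes runs are read via wandb.Api().sweep(path).runs, that the hyperparameters of interest live in run.config, and the function names are hypothetical. CSV writing is kept separate from the API call so it can be checked offline.

```python
"""Hypothetical export of W&B sweep results to CSV."""

import csv
from typing import Any, Dict, Iterable, List, Mapping, TextIO


def write_results(rows: Iterable[Mapping[str, Any]], sink: TextIO) -> int:
    """Writes rows as CSV, using the union of their keys as columns.

    Columns appear in first-seen order; missing cells are left empty.
    Returns the number of data rows written.
    """
    materialized = list(rows)
    columns: List[str] = []
    for row in materialized:
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(sink, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(materialized)
    return len(materialized)


def sweep_rows(sweep_path: str, metric: str = "val_accuracy") -> List[Dict[str, Any]]:
    """Fetches one row per run: ID, state, config values, and `metric`."""
    import wandb  # Lazy import: needs the wandb package, a login, and network access.

    rows = []
    for run in wandb.Api().sweep(sweep_path).runs:
        row: Dict[str, Any] = {"run_id": run.id, "state": run.state}
        # Skip W&B-internal config keys, which are prefixed with an underscore.
        row.update({k: v for k, v in run.config.items() if not k.startswith("_")})
        # May be a plain number or a nested {"max": ...} entry, per define_metric.
        row[metric] = run.summary.get(metric)
        rows.append(row)
    return rows
```

Usage might be write_results(sweep_rows("entity/project/sweep_id"), handle), where handle is a file opened with newline="" as the csv module recommends.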
@kylebgorman Should we leave this open until we've played with the examples and are sure the scripts are sufficient and the documentation is good enough?
Okay, sure. I'd like to take it for a spin first.
Sorry, I just meant this issue -- not the PR!
Got it, yeah, I was confused at first.