CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Updates and Documentation for WandB sweeps

Adamits opened this issue · comments

commented

Based on other work we're doing, we should add some documentation and make the necessary tweaks for running a W&B sweep with this codebase.

  • Add documentation and examples of running WandB sweeps with Yoyodyne.
  • Make updates to codebase so PTL and WandB play nice wrt logging hyperparameters, etc.
  • Update PTL to log max validation accuracy.

Let me also add that the documentation should probably show how to retrieve the best runs from the wandb API too.
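As a rough illustration of what "retrieving the best run" could look like, here is a minimal sketch. It assumes runs log a summary value named `val_accuracy` and that the project is addressed as `"entity/project"`; the helper names `best_run` and `fetch_best_run` are hypothetical and not part of Yoyodyne.

```python
from typing import Any, Iterable, Optional


def best_run(runs: Iterable[Any], metric: str = "val_accuracy") -> Optional[Any]:
    """Returns the run with the highest summary value for `metric`.

    Runs that never logged the metric are skipped; returns None if no run
    has a numeric value for it.
    """
    best = None
    best_value = float("-inf")
    for run in runs:
        value = run.summary.get(metric)
        if isinstance(value, (int, float)) and value > best_value:
            best, best_value = run, value
    return best


def fetch_best_run(path: str, metric: str = "val_accuracy") -> Optional[Any]:
    """Queries the W&B public API for the runs under `path`.

    `path` is assumed to have the form "entity/project".
    """
    import wandb  # Imported lazily so `best_run` works without wandb installed.

    return best_run(wandb.Api().runs(path), metric)
```

Calling `fetch_best_run("my-entity/my-project")` (placeholder names) would then return a run object whose `config` holds the hyperparameters of the best run.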

commented

Relatedly, I guess it would also be nice to have a system for easily pointing W&B run IDs to yoyodyne logging, etc.

commented

Working on this now. Was wondering if you think we should add train args so that it can be called in such a way that a wandb agent trains from a sweep (by adding a wandb_sweep_id and max_num_runs arg), or if this should be a separate script that we maintain in the library (something like train_wandb_agent.py).

Other notes:

  • I was not able to find anything on how to log the max validation accuracy in PTL and let it propagate that logging to wandb, so instead I just do wandb.define_metric('val_accuracy', summary='max') when wandb logging is enabled.
  • PTL tries to log the model hparams to the wandb run when the WandbLogger is enabled, causing a warning, because they also get logged when we start the sweep agent. See here: wandb/wandb#2641. I do not know how to fix this, since it does not seem to be PTL behavior I can toggle, and we need the PTL WandbLogger in order to also log runtime metrics. I think we can just let it happen for now?
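For reference, the `define_metric` workaround from the first note might be wrapped roughly as follows. This is only a sketch: the function name `track_max_val_accuracy` is hypothetical, and it assumes the validation metric is logged under the name `val_accuracy`.

```python
def track_max_val_accuracy(run, metric="val_accuracy"):
    """Asks W&B to keep the maximum of `metric` in the run summary.

    By default the summary holds the last logged value; `summary="max"`
    makes it hold the best value seen during the run instead.

    `run` is the object returned by `wandb.init()`; with PTL's WandbLogger
    it should be reachable as `logger.experiment`.
    """
    run.define_metric(metric, summary="max")
```

Because the function only relies on the run object having a `define_metric` method, it can be called once right after the logger is created and before the first validation step is logged.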

Working on this now. Was wondering if you think we should add train args so that it can be called in such a way that a wandb agent trains from a sweep (by adding a wandb_sweep_id and max_num_runs arg), or if this should be a separate script that we maintain in the library (something like train_wandb_agent.py).

While I'm not sure I have enough context to get this yet, I think I am fine just including docs and a sample script for doing wandb stuff. It's hard for me to imagine doing this effectively using yoyodyne-train alone, I guess? I assume you did your sweeping using custom Python, right?

  • I was not able to find anything on how to log the max validation accuracy in PTL and let it propagate that logging to wandb, so instead I just do wandb.define_metric('val_accuracy', summary='max') when wandb logging is enabled.

SGTM.

Let's just suppress the warning in __init__.py then, and add a TODO to investigate this at the PTL level later.

commented

Working on this now. Was wondering if you think we should add train args so that it can be called in such a way that a wandb agent trains from a sweep (by adding a wandb_sweep_id and max_num_runs arg), or if this should be a separate script that we maintain in the library (something like train_wandb_agent.py).

While I'm not sure I have enough context to get this yet, I think I am fine just including docs and a sample script for doing wandb stuff. It's hard for me to imagine doing this effectively using yoyodyne-train alone, I guess? I assume you did your sweeping using custom Python, right?

Yeah, I just have a train_wandb_agent.py script that calls the functions in train.py. So do we need a directory at the top level of our repository called examples or similar? Or do you think it's better to have train_wandb_agent.py live alongside train.py?
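To make the idea of such an agent script concrete, here is a minimal sketch of its shape. It is not the actual train_wandb_agent.py: the hyperparameter names and ranges are placeholders, and `train_fn` stands in for whatever wrapper around the functions in train.py reads `wandb.config` and launches training.

```python
from typing import Any, Callable, Dict


def build_sweep_config(metric: str = "val_accuracy") -> Dict[str, Any]:
    """Builds a W&B sweep configuration that maximizes `metric`.

    The hyperparameters below are placeholders for illustration only.
    """
    return {
        "method": "random",
        "metric": {"name": metric, "goal": "maximize"},
        "parameters": {
            "learning_rate": {
                "distribution": "log_uniform_values",
                "min": 1e-5,
                "max": 1e-2,
            },
            "batch_size": {"values": [16, 32, 64]},
            "hidden_size": {"values": [128, 256, 512]},
        },
    }


def create_sweep(project: str) -> str:
    """Registers the sweep with W&B and returns its sweep ID."""
    import wandb  # Imported lazily; only needed when talking to W&B.

    return wandb.sweep(sweep=build_sweep_config(), project=project)


def run_agent(
    sweep_id: str, train_fn: Callable[[], None], max_num_runs: int
) -> None:
    """Runs up to `max_num_runs` trials of the sweep in this process."""
    import wandb

    wandb.agent(sweep_id, function=train_fn, count=max_num_runs)
```

The `sweep_id` and `max_num_runs` parameters mirror the wandb_sweep_id and max_num_runs arguments floated earlier in the thread, so the same sketch applies whether this lives in a separate script or behind training flags.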

Let's just suppress the warning in __init__.py then, and add a TODO to investigate this at the PTL level later.

Sounds good!

Yeah, I just have a train_wandb_agent.py script that calls the functions in train.py. So do we need a directory at the top level of our repository called examples or similar? Or do you think it's better to have train_wandb_agent.py live alongside train.py?

Yes that's what I'd suggest. I'd have one for running the sweep and, optionally, one for grabbing the results from W&B.

I don't know whether we need to modify the project file so that it knows about that directory but does not install it as part of the package. Something to look out for: browse the verbose installation output and you should see what happens there.

commented

@kylebgorman Should we leave this open until we've played with the examples and are sure the scripts are sufficient, and documentation is good enough?

Okay, sure. I'd like to take it for a spin first.

commented

Sorry, I just meant this issue -- not the PR!

Sorry, I just meant this issue -- not the PR!

Got it, yeah, I was confused at first.