To run a training experiment with 50 available levels for 20M steps, logging every 10 steps, using random crop augmentation on top of the baseline PPO algorithm, and saving the model with index 1:
To test the trained random-random-cut agent with index 3 on the set of level intervals {100, 1000, 2000, ..., 95000}:
(It is easier to put each block of commands in a bash .sh file and run that file from the command line.)
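For readers unfamiliar with the random crop augmentation used in the training example, the sketch below shows one common variant. Each image in a batch is cropped to a smaller square window at an independently sampled position. It assumes observations are a NumPy array of shape (N, H, W, C). The function name `random_crop` and the `out_size` parameter are hypothetical and not taken from this repository, whose implementation may differ (for example, by padding first and cropping back to the original size).

```python
import numpy as np
from typing import Optional


def random_crop(obs: np.ndarray, out_size: int,
                rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Hypothetical sketch: crop each image in a batch at a random location.

    obs: array of shape (N, H, W, C); out_size must be <= H and <= W.
    Returns an array of shape (N, out_size, out_size, C) with obs.dtype.
    """
    if rng is None:
        rng = np.random.default_rng()
    n, h, w, c = obs.shape
    if out_size > h or out_size > w:
        raise ValueError("out_size must not exceed the image height or width")
    # Sample one top-left corner per image; the upper bound is exclusive,
    # so every window fits entirely inside the image.
    tops = rng.integers(0, h - out_size + 1, size=n)
    lefts = rng.integers(0, w - out_size + 1, size=n)
    out = np.empty((n, out_size, out_size, c), dtype=obs.dtype)
    for i in range(n):
        out[i] = obs[i, tops[i]:tops[i] + out_size, lefts[i]:lefts[i] + out_size]
    return out


# Usage: crop a batch of 8 RGB frames of size 64x64 down to 56x56.
batch = np.zeros((8, 64, 64, 3), dtype=np.uint8)
cropped = random_crop(batch, 56, np.random.default_rng(0))
print(cropped.shape)  # (8, 56, 56, 3)
```

Sampling a separate offset per image, instead of one offset for the whole batch, gives the policy more varied views of the same observation within a single update.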