PWhiddy / PokemonRedExperiments

Love this project!

Would you mind adding a section to the README about which algorithms are used, to give a quick overview of how this works?
I'm going to look at the code, but it would be very convenient to have some info about this.

Hi!
Yes, that would be great to have in the README. I'm hoping to add this, along with some standard baseline results tables, in the not-too-distant future.
For now, the short answer is that it uses PPO with fairly normal hyperparameters.
You can see these here:

PokemonRedExperiments/baselines/run_baseline_parallel_fast.py

Line 62 in 624e6f0

    
           model = PPO('CnnPolicy', env, verbose=1, n_steps=ep_length // 8, batch_size=128, n_epochs=3, gamma=0.998)
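To give a rough feel for what those settings mean in practice, here is a small sketch that works out the numbers they imply. It assumes the usual Stable-Baselines3 PPO behaviour (each environment collects `n_steps` transitions per rollout, and each epoch sweeps the rollout in minibatches of `batch_size`). The `ep_length` and `n_envs` values in the usage line are made up for illustration and are not taken from the repo; the helper `ppo_rollout_summary` is hypothetical too.

```python
import math


def ppo_rollout_summary(ep_length, n_envs, batch_size=128, n_epochs=3, gamma=0.998):
    """Derive the quantities implied by the PPO settings quoted above.

    ep_length and n_envs are NOT given in the quoted line; callers must
    supply them, and the values used below are illustrative assumptions.
    """
    # The quoted line sets n_steps to one eighth of an episode.
    n_steps = ep_length // 8
    # Each parallel environment contributes n_steps transitions per rollout.
    rollout_size = n_steps * n_envs
    # One epoch sweeps the whole rollout once, in minibatches of batch_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    # n_epochs passes over the same rollout before new data is collected.
    gradient_updates_per_rollout = minibatches_per_epoch * n_epochs
    # Rule of thumb: rewards about 1 / (1 - gamma) steps ahead still matter.
    effective_horizon = 1.0 / (1.0 - gamma)
    return {
        "n_steps": n_steps,
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_updates_per_rollout": gradient_updates_per_rollout,
        "effective_horizon": effective_horizon,
    }


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic concrete.
    summary = ppo_rollout_summary(ep_length=16384, n_envs=16)
    for name, value in summary.items():
        print(f"{name}: {value}")
```

The main thing to notice is `gamma=0.998`: it gives an effective horizon of about 500 steps, so the agent is encouraged to care about rewards that arrive hundreds of steps after an action.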

Rewards are a bit more complicated. The video describes some of this, but for now you'll have to look at the code for more details.

Algorithms used?