NivenT / REnforce

Reinforcement learning library written in Rust

Home Page: https://nivent.github.io/REnforce/renforce/



The bandit tests are flaky

NivenT opened this issue

As a bare-minimum sanity check that a new RL algorithm might be implemented correctly, it is tested on the N-armed bandit problem. This environment is about as simple as RL environments get, so every algorithm should be able to "solve" it without trouble. That is currently not the case: some algorithms (I think just CrossEntropy) do not pass consistently. More care needs to be taken in choosing hyperparameters here so the tests aren't flaky.
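A minimal sketch of what a non-flaky bandit test could look like, written in Rust with no external crates. Everything here is hypothetical and not REnforce's API: `Bandit`, `cross_entropy`, `SplitMix64`, the arm means, and the hyperparameters (`iters = 20`, `batch = 50`, `elite_frac = 0.2`, `alpha = 0.5`) are illustrative assumptions. It combines two ideas. First, hyperparameters are chosen with a wide margin: the best arm's reward range never overlaps the others', and the batch is large enough that the best arm almost always lands among the elites. Second, each run uses a fixed RNG seed so a given test either always passes or always fails. Seeding is an added technique rather than something proposed above; it makes failures reproducible but does not by itself fix a badly tuned algorithm, which is why several seeds are checked.

```rust
/// Small deterministic PRNG (SplitMix64) so the test needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical N-armed bandit: arm `i` pays `means[i]` plus uniform noise in [-0.1, 0.1).
struct Bandit {
    means: Vec<f64>,
}

impl Bandit {
    fn pull(&self, arm: usize, rng: &mut SplitMix64) -> f64 {
        self.means[arm] + 0.2 * (rng.next_f64() - 0.5)
    }
}

/// Draw an index from a categorical distribution given by `probs`.
fn sample(probs: &[f64], rng: &mut SplitMix64) -> usize {
    let u = rng.next_f64();
    let mut acc = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    probs.len() - 1 // guard against floating-point rounding in the cumulative sum
}

/// Index of the largest value (first one wins on ties).
fn argmax(v: &[f64]) -> usize {
    let mut best = 0;
    for i in 1..v.len() {
        if v[i] > v[best] {
            best = i;
        }
    }
    best
}

/// Categorical cross-entropy method: sample `batch` pulls from the current
/// distribution, keep the top `elite_frac` by reward, and move the
/// distribution toward the elites' arm frequencies with smoothing `alpha`.
fn cross_entropy(bandit: &Bandit, seed: u64, iters: usize, batch: usize,
                 elite_frac: f64, alpha: f64) -> Vec<f64> {
    let mut rng = SplitMix64(seed);
    let n = bandit.means.len();
    let mut probs = vec![1.0 / n as f64; n];
    let n_elite = ((batch as f64 * elite_frac).ceil() as usize).max(1).min(batch);
    for _ in 0..iters {
        let mut pulls: Vec<(usize, f64)> = (0..batch)
            .map(|_| {
                let arm = sample(&probs, &mut rng);
                (arm, bandit.pull(arm, &mut rng))
            })
            .collect();
        // Highest reward first; rewards are never NaN, so unwrap is safe.
        pulls.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        let mut counts = vec![0.0; n];
        for &(arm, _) in &pulls[..n_elite] {
            counts[arm] += 1.0;
        }
        for i in 0..n {
            probs[i] = (1.0 - alpha) * probs[i] + alpha * counts[i] / n_elite as f64;
        }
    }
    probs
}

fn main() {
    let bandit = Bandit { means: vec![0.0, 0.2, 1.0, 0.4] };
    // Check several fixed seeds; each run is fully deterministic.
    for seed in 0..10 {
        let probs = cross_entropy(&bandit, seed, 20, 50, 0.2, 0.5);
        let best = argmax(&probs);
        assert_eq!(best, 2, "seed {seed} converged to arm {best}");
        assert!(probs[2] > 0.9, "seed {seed}: p(best arm) = {}", probs[2]);
    }
    println!("all seeds solved the bandit");
}
```

If a seed ever fails, the same failure reproduces on every run, so the hyperparameters can be retuned against that seed instead of chasing intermittent failures.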