This is example 13.1 from the book.
The next step could be playing rock-paper-scissors against a simpler agent, such as a k-armed bandit with a long learning window.
Plan: get the example working first, then play rock-paper-scissors against the slow-learning k-armed bandit.
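
A minimal sketch of what the slow-learning opponent might look like, assuming "long learning window" means a small constant step size (so each new reward shifts the value estimates only slightly). The names `SlowBandit`, `payoff`, `play`, `alpha`, and `epsilon` are hypothetical and not taken from the book or from this code.

```python
import random

ACTIONS = ("rock", "paper", "scissors")


def payoff(a, b):
    """Reward for playing action index a against b: +1 win, -1 loss, 0 tie."""
    # With rock=0, paper=1, scissors=2, each action beats the one before it.
    return (0, 1, -1)[(a - b) % 3]


class SlowBandit:
    """Epsilon-greedy k-armed bandit with a constant step size.

    Assumption: a small alpha stands in for the "long learning window",
    since the estimate is then an average over many past rewards.
    """

    def __init__(self, k=3, alpha=0.01, epsilon=0.1, seed=0):
        self.q = [0.0] * k          # action-value estimates
        self.alpha = alpha          # step size (small = slow learner)
        self.epsilon = epsilon      # exploration probability
        self.rng = random.Random(seed)

    def act(self):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties between equally valued actions at random.
        return self.rng.choice([i for i, v in enumerate(self.q) if v == best])

    def update(self, action, reward):
        # Move the estimate a fraction alpha toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])


def play(bandit, opponent_policy, rounds):
    """Let the bandit play `rounds` games against a fixed opponent policy."""
    for _ in range(rounds):
        a = bandit.act()
        b = opponent_policy()
        bandit.update(a, payoff(a, b))
    return bandit.q


if __name__ == "__main__":
    # Hypothetical opponent that always plays rock.
    q = play(SlowBandit(), lambda: 0, rounds=5000)
    print(dict(zip(ACTIONS, q)))
```

Against a fixed opponent the bandit should drift toward the counter-move. Against an adaptive agent its slow updates would make it a comparatively stationary target, which is presumably the point of using it as a first opponent.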