atbolsh / Policy-Gradient

Will make example work, then play rock-paper-scissors against slow-learning k-armed bandit.

This is example 13.1 from the book.

Next step could be playing rock-paper-scissors against a simpler agent (like a k-armed bandit with a long learning window).

About

Will make example work, then play rock-paper-scissors against slow-learning k-armed bandit.

Language:Python 100.0%