striatum
A reinforcement learning test bed for comparing multiple policies, environments, and agents, fully compatible with gym.openai.com.
Basic usage
from striatum import TestBed
from striatum.policies import EpsilonGreedy
from striatum.environments import MultiArmedBandit
from striatum.analyses import AverageRewardPerStep, PercentageOptimalAction
test = TestBed({'policy': EpsilonGreedy(epsilon=0.1),
                'env': MultiArmedBandit(n_arms=10)},
               analyses=[AverageRewardPerStep(),
                         PercentageOptimalAction()])
test.run(n_steps=1_000, n_episodes=1_000).plot()
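For readers unfamiliar with the setting, the following is a plain-Python sketch of the kind of experiment the snippet above describes: an epsilon-greedy policy on a 10-armed bandit, summarised as average reward per step and percentage of optimal actions across episodes. It does not use striatum and is not its implementation. The Gaussian reward model, the sample-average value estimates, and the function names run_episode and average_over_episodes are assumptions made for illustration.

```python
import random


def run_episode(n_arms, epsilon, n_steps, rng):
    """One episode of epsilon-greedy on a Gaussian bandit (assumed reward model).

    Returns the per-step rewards and per-step flags for 'optimal arm chosen'.
    """
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    optimal_arm = max(range(n_arms), key=true_means.__getitem__)
    estimates = [0.0] * n_arms  # sample-average value estimate per arm
    counts = [0] * n_arms
    rewards, optimal = [], []
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample average for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        rewards.append(reward)
        optimal.append(arm == optimal_arm)
    return rewards, optimal


def average_over_episodes(n_episodes, n_steps, n_arms, epsilon, seed=0):
    """Average the per-step results over independent episodes."""
    rng = random.Random(seed)
    reward_sums = [0.0] * n_steps
    optimal_sums = [0] * n_steps
    for _ in range(n_episodes):
        rewards, optimal = run_episode(n_arms, epsilon, n_steps, rng)
        for t in range(n_steps):
            reward_sums[t] += rewards[t]
            optimal_sums[t] += optimal[t]
    avg_reward = [s / n_episodes for s in reward_sums]
    pct_optimal = [100.0 * s / n_episodes for s in optimal_sums]
    return avg_reward, pct_optimal


# Smaller than the 1_000 x 1_000 run above so that it finishes quickly.
avg_reward, pct_optimal = average_over_episodes(
    n_episodes=200, n_steps=200, n_arms=10, epsilon=0.1)
```

Each returned list has one entry per step, which is the shape one would plot as a learning curve.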
Emphasis on generative processes
Most experiments can be described as a generative process. To support this, striatum combines dask custom graphs with sklearn's double-underscore notation. For example, take the example above, but with the number of arms varying between episodes.
import numpy as np

def epsilon(n_arms):
    # Exploration rate derived from the number of arms sampled for the episode.
    return n_arms/100

test = TestBed({'policy': EpsilonGreedy(),
                'env': MultiArmedBandit(),
                'env__n_arms': (np.random.choice, [9, 10, 11]),
                'policy__epsilon': (epsilon, 'env__n_arms')},
               analyses=[AverageRewardPerStep(),
                         PercentageOptimalAction()])
test.run(n_steps=1_000, n_episodes=1_000).plot()
For each episode (in this case, n_episodes=1_000 times) the graph represented by the dictionary passed to TestBed will be resolved like so:
>>> env__n_arms = np.random.choice([9, 10, 11])
>>> policy__epsilon = epsilon(env__n_arms)
>>> policy = EpsilonGreedy(epsilon=policy__epsilon)
>>> env = MultiArmedBandit(n_arms=env__n_arms)
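The resolution steps above can be sketched in plain Python. This is an illustration of the idea only, not striatum's or dask's actual code: a tuple whose first element is callable is treated as a task in the dask custom graph style, string arguments that name another key are resolved first, and keys of the form owner__param are split in the sklearn style. The helper names resolve and sample_episode are hypothetical, and random.choice stands in for np.random.choice to keep the sketch dependency-free.

```python
import random


def resolve(graph, key, cache):
    """Resolve one key of a dask-style graph.

    A tuple (func, *args) is a task; any string argument that is itself
    a key of the graph is resolved first and passed in as a value.
    """
    if key in cache:
        return cache[key]
    node = graph[key]
    if isinstance(node, tuple) and node and callable(node[0]):
        func, *args = node
        args = [resolve(graph, a, cache) if isinstance(a, str) and a in graph else a
                for a in args]
        node = func(*args)
    cache[key] = node  # each key is computed once per episode
    return node


def sample_episode(graph):
    """Resolve every key once, then group 'owner__param' keys by owner."""
    cache = {}
    values = {key: resolve(graph, key, cache) for key in graph}
    params = {}
    for key, value in values.items():
        owner, sep, name = key.partition('__')
        if sep:
            params.setdefault(owner, {})[name] = value
    return values, params


def epsilon(n_arms):
    return n_arms / 100


graph = {
    'env__n_arms': (random.choice, [9, 10, 11]),
    'policy__epsilon': (epsilon, 'env__n_arms'),
}

# One call corresponds to one episode's draw from the generative process.
values, params = sample_episode(graph)
```

Here params maps each owner to its keyword arguments, for example params['policy'] holds the epsilon that would be passed to the policy for that episode.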
Flexible analyses
Analyses accept a by argument that names a key of the graph; in the example that follows, both analyses use by='env__n_arms'.
import numpy as np

def epsilon(n_arms):
    return n_arms/100

test = TestBed({'policy': EpsilonGreedy(),
                'env': MultiArmedBandit(),
                'env__n_arms': (np.random.choice, [9, 10, 11]),
                'policy__epsilon': (epsilon, 'env__n_arms')},
               analyses=[AverageRewardPerStep(by='env__n_arms'),
                         PercentageOptimalAction(by='env__n_arms')])
test.run(n_steps=1_000, n_episodes=1_000).plot()
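One plausible reading of by='env__n_arms' is that results are aggregated separately for each resolved value of that key, giving one curve per number of arms. The sketch below illustrates that reading in plain Python. It is an assumption about the semantics, not striatum's implementation, and the function average_reward_per_step_by and the shape of the episodes list are hypothetical.

```python
from collections import defaultdict


def average_reward_per_step_by(episodes, by):
    """Average reward per step, computed separately for each value of `by`.

    `episodes` is a list of (resolved_values, rewards) pairs, where
    resolved_values is the dictionary of graph values for that episode.
    """
    sums = {}
    counts = defaultdict(int)
    for values, rewards in episodes:
        group = values[by]
        if group not in sums:
            sums[group] = [0.0] * len(rewards)
        for t, reward in enumerate(rewards):
            sums[group][t] += reward
        counts[group] += 1
    return {group: [s / counts[group] for s in total]
            for group, total in sums.items()}


# Tiny hand-written data set: two episodes with 9 arms, one with 10.
episodes = [
    ({'env__n_arms': 9}, [0.0, 1.0]),
    ({'env__n_arms': 9}, [1.0, 1.0]),
    ({'env__n_arms': 10}, [0.5, 0.5]),
]
curves = average_reward_per_step_by(episodes, by='env__n_arms')
# curves == {9: [0.5, 1.0], 10: [0.5, 0.5]}
```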
Etymology (why the name?)
Functionally, the striatum coordinates multiple aspects of cognition, including both motor and action planning, decision-making, motivation, reinforcement, and reward perception.
Source: multiple, all of which can be found at https://en.wikipedia.org/wiki/Striatum