gym-stochastic

Reinforcement learning gyms for experimenting with stochasticity.

DistributionContextualBanditEnv-v0

Initially based on the wonderful https://github.com/JKCooper2/gym-bandits (MIT license).
It has since been generalized to allow more features, and changed enough that I felt a fork wasn't ideal.

Features

arbitrary distributions for reward amount and payoff probability
mix and match constant, per-arm fixed, gaussian, and uniform distributions
compose distributions in various ways, including summing, multiplying, and weighted select
use arbitrary functions for computing probabilities
context is an n-dimensional unit vector (optional)
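
To make the composition idea in the list above concrete, here is a minimal, self-contained sketch using plain NumPy. It does not use the gym-stochastic API: every helper name below (constant, gaussian, uniform, summed, multiplied, weighted_select, per_arm) is hypothetical and only illustrates how constant, gaussian, and uniform samplers might be summed, multiplied, or selected by weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "sampler" here is a plain function: arm index -> sampled float.
# All helper names are hypothetical, not gym-stochastic functions.
def constant(value):
    return lambda arm: float(value)

def gaussian(mean, std):
    return lambda arm: float(rng.normal(mean, std))

def uniform(low, high):
    return lambda arm: float(rng.uniform(low, high))

def summed(samplers):
    # Draw once from every sub-sampler and add the results.
    return lambda arm: sum(s(arm) for s in samplers)

def multiplied(samplers):
    # Draw once from every sub-sampler and multiply the results.
    return lambda arm: float(np.prod([s(arm) for s in samplers]))

def weighted_select(samplers, weights=None):
    # Pick one sub-sampler per draw; equal weights if none are given.
    def sample(arm):
        idx = int(rng.choice(len(samplers), p=weights))
        return samplers[idx](arm)
    return sample

def per_arm(samplers):
    # Route each arm index to its own sampler.
    return lambda arm: samplers[arm](arm)

reward = per_arm([
    summed([constant(1.0), gaussian(0.0, 0.5)]),
    weighted_select([constant(-10.0), uniform(5.0, 25.0)], weights=[0.1, 0.9]),
])
draws_arm1 = [reward(1) for _ in range(1000)]
```

Treating samplers as closures keeps composition cheap: a composite is just another function with the same signature, so composites can be nested to any depth.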

Example

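import gym
import gym_stochastic  # imported so the environments are available to gym.make
from gym_stochastic.envs.dist_utils import *

# One composite reward sampler per arm; each arm selects between two distributions.
arms_r_comp = get_sampler__composite_perarm(
    sub_samplers=[
        # Arm 0: select between two normal distributions (no dist given)
        get_sampler__composite_select([
            get_reward_sampler__fixed_norm_arm(5.0, 1.0),
            get_reward_sampler__fixed_norm_arm(-20.0, 5.0),
        ]),
        # Arm 1: select between a normal and a uniform distribution, weighted by dist
        get_sampler__composite_select([
            get_reward_sampler__fixed_norm_arm(-10.0, 1.0),
            get_reward_sampler__fixed_uniform_arm(5.0, 25.0),
        ], dist=[0.1, 0.9]),
    ])

# No trailing comma here: with one, env would be a 1-tuple rather than the environment.
env = gym.make('DistributionContextualBanditEnv-v0', arms=2, p_dist_fn=1.0, r_dist_fn=arms_r_comp)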

[Figure: arm-reward histograms produced by the config above]
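
The rough shape of these histograms can be approximated without the environment. The sketch below uses only NumPy and rests on assumptions that the source does not confirm: that get_reward_sampler__fixed_norm_arm takes (mean, standard deviation), that get_reward_sampler__fixed_uniform_arm takes (low, high), and that a composite select without dist picks its sub-samplers with equal probability. The mixture helper is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def mixture(components, weights, size):
    # Hypothetical helper: draw `size` samples from a weighted mixture.
    # components: callables taking a sample count and returning an array.
    choice = rng.choice(len(components), size=size, p=weights)
    draws = np.stack([c(size) for c in components])
    return draws[choice, np.arange(size)]

# Arm 0: assumed equal-weight choice between N(5, 1) and N(-20, 5).
arm0 = mixture(
    [lambda k: rng.normal(5.0, 1.0, k), lambda k: rng.normal(-20.0, 5.0, k)],
    [0.5, 0.5], n)
# Arm 1: N(-10, 1) with weight 0.1, U(5, 25) with weight 0.9.
arm1 = mixture(
    [lambda k: rng.normal(-10.0, 1.0, k), lambda k: rng.uniform(5.0, 25.0, k)],
    [0.1, 0.9], n)

# Print a coarse text histogram for each arm.
for name, samples in (("arm 0", arm0), ("arm 1", arm1)):
    counts, edges = np.histogram(samples, bins=10)
    print(name, "mean=%.2f" % samples.mean())
    for count, left in zip(counts, edges[:-1]):
        bar = "#" * int(50 * count // counts.max())
        print("  %7.2f | %s" % (left, bar))
```

Under these assumptions arm 0 is bimodal with a mean near -7.5, while arm 1 is mostly uniform on [5, 25) with a small cluster near -10, giving a mean near 12.5.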

WetChicken1d-v0

One-dimensional Wet Chicken, as described in Section 4.1 of:

Alexander Hans and Steffen Udluft. Efficient uncertainty propagation for reinforcement learning with limited data. In ICANN, pp. 70–79. Springer, 2009. https://www.tu-ilmenau.de/fileadmin/media/neurob/publications/conferences_int/2009/Hans-ICANN-2009.pdf

I was unable to locate a copy of the original reference:

V. Tresp. The wet game of chicken. Siemens AG, CT IC 4, Technical Report, 1994.
