gameduell / simcompyl

SimComPyl -- Run simulations - composable, compiled and pure python

use for reinforcement learning env sim

wabu opened this issue · comments

Simulating an environment for reinforcement learning has some constraints, afaik:

  • efficient network interaction

    After each simulation step, we have to provide new feedback to the network
    and get new actions from it. This may require breaking out of numba mode, as
    we have to interact with some other library (see the objmode sketch after
    this list).

    • can we run the simulation entirely in numba and use shared memory as IO to the ML thread?
    • does the ML implementation provide a C function we can call from numba?
    • is it possible to exchange data between simcompyl and the ML side purely on the GPU?
    • check which frameworks we have to talk to and ways to interact: pytorch, tensorflow, ...
  • relatively small population

    I guess it is not ideal to have millions of parallel instances of the
    env, each training its own network, or to give one network feedback from
    hundreds of envs, but this may be possible. (For example, you may have to
    swap the state of LSTM units per env while training the same weights; see
    the batched sketch after this list.)

    Furthermore, rewards could be slow/delayed, so for the agents to learn,
    we need more steps inside the same env rather than many worlds with fewer
    steps.

    A small population may be problematic, as we may have to go out of numba
    code after each step to call the network for the next action.

    • do we really need small populations?
    • how can a single network learn from multiple envs in parallel?
    • can we train many networks and exchange knowledge (see parameter server)?
    • can the network itself run in numba mode (a numba implementation of the network)?
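
One way to keep the simulation loop compiled while still talking to the ML library is numba's `objmode` block, which drops back into the interpreter just for the network call. Below is a minimal sketch, assuming a hypothetical `policy` callable standing in for a pytorch/tensorflow forward pass; none of the names are simcompyl API.

```python
import numpy as np
import numba
from numba import objmode


def policy(obs):
    """Hypothetical stand-in for a pytorch/tensorflow forward pass."""
    return np.random.randint(0, 4, size=obs.shape[0]).astype(np.int64)


@numba.njit
def run_episode(state, n_steps):
    for _ in range(n_steps):
        # leave nopython mode only for the network call
        with objmode(actions='int64[:]'):
            actions = policy(state)
        # apply the actions to the simulated state (toy dynamics)
        for i in range(state.shape[0]):
            state[i, 0] += actions[i] * 0.1
    return state


state = np.zeros((8, 2))    # 8 parallel envs, 2 state variables each
final = run_episode(state, 100)
```

The per-step objmode round trip is exactly the cost that shared memory, a C entry point, or a GPU-resident exchange would try to avoid.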
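
On the population side, the usual trick is one set of shared weights with all envs batched through it, keeping one recurrent state row per env. A minimal pytorch sketch under that assumption; all sizes and names are illustrative, and the env step is a placeholder, not simcompyl code.

```python
import torch
import torch.nn as nn

OBS, ACT, HIDDEN, N_ENVS = 4, 3, 32, 16     # illustrative sizes

policy = nn.LSTMCell(OBS, HIDDEN)           # shared weights for all envs
head = nn.Linear(HIDDEN, ACT)

# one hidden/cell state row per env; swapping an env in or out means
# replacing its row in (h, c) while the weights stay the same
h = torch.zeros(N_ENVS, HIDDEN)
c = torch.zeros(N_ENVS, HIDDEN)
obs = torch.zeros(N_ENVS, OBS)

with torch.no_grad():
    for step in range(100):
        h, c = policy(obs, (h, c))
        actions = head(h).argmax(dim=-1)    # one action per env
        # obs = <feedback from the simulation step goes here>
        obs = torch.randn(N_ENVS, OBS)      # stand-in for the env step
```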