gameduell / simcompyl

SimComPyl -- Run simulations - composable, compiled and pure python

use for reinforcement learning env sim

wabu opened this issue · comments

Simulating an environment for reinforcement learning has some constraints, afaik:

  • efficient network interaction

    After each simulation step, we have to provide new feedback to the network
    and get new actions from it. This may require breaking out of numba mode, as
    we have to interact with some other library (see the objmode sketch after
    this list).

    • can we run the simulation entirely in numba and use shared memory as IO to the ML thread?
    • does the ML implementation provide a C function we can call from numba?
    • is it possible to exchange data between simcompyl and the ML side purely on the GPU?
    • check which frameworks we have to talk to and ways to interact: pytorch, tensorflow, ...
  • relatively small population

    I guess it is not ideal to have millions of parallel instances of the
    env, each training its own network, or to give one network feedback from
    hundreds of envs, but this may be possible. (For example, you may have to
    swap the state of LSTM units per env while training the same weights; see
    the batched sketch after this list.)

    Furthermore, rewards could be slow/delayed, so for the agents to learn,
    we need more steps inside the same env rather than many worlds with fewer
    steps.

    A small population may be problematic, as we may have to go out of numba
    code after each step to call the network for the next action.

    • do we really need small populations?
    • how can a single network learn from multiple envs in parallel?
    • can we train many networks and exchange knowledge (see parameter server)?
    • can the network itself run in numba mode (a numba implementation of the network)?
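
One way to keep the simulation loop compiled while still talking to the ML library is numba's `objmode` block, which drops back into the interpreter just for the network call. Below is a minimal sketch, assuming a hypothetical `policy` callable standing in for a pytorch/tensorflow forward pass; none of the names are simcompyl API.

```python
import numpy as np
import numba
from numba import objmode


def policy(obs):
    """Hypothetical stand-in for a pytorch/tensorflow forward pass."""
    return np.random.randint(0, 4, size=obs.shape[0]).astype(np.int64)


@numba.njit
def run_episode(state, n_steps):
    for _ in range(n_steps):
        # leave nopython mode only for the network call
        with objmode(actions='int64[:]'):
            actions = policy(state)
        # apply the actions to the simulated state (toy dynamics)
        for i in range(state.shape[0]):
            state[i, 0] += actions[i] * 0.1
    return state


state = np.zeros((8, 2))    # 8 parallel envs, 2 state variables each
final = run_episode(state, 100)
```

The per-step objmode round trip is exactly the cost that shared memory, a C entry point, or a GPU-resident exchange would try to avoid.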
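
On the population side, the usual trick is one set of shared weights with all envs batched through it, keeping one recurrent state row per env. A minimal pytorch sketch under that assumption; all sizes and names are illustrative, and the env step is a placeholder, not simcompyl code.

```python
import torch
import torch.nn as nn

OBS, ACT, HIDDEN, N_ENVS = 4, 3, 32, 16     # illustrative sizes

policy = nn.LSTMCell(OBS, HIDDEN)           # shared weights for all envs
head = nn.Linear(HIDDEN, ACT)

# one hidden/cell state row per env; swapping an env in or out means
# replacing its row in (h, c) while the weights stay the same
h = torch.zeros(N_ENVS, HIDDEN)
c = torch.zeros(N_ENVS, HIDDEN)
obs = torch.zeros(N_ENVS, OBS)

with torch.no_grad():
    for step in range(100):
        h, c = policy(obs, (h, c))
        actions = head(h).argmax(dim=-1)    # one action per env
        # obs = <feedback from the simulation step goes here>
        obs = torch.randn(N_ENVS, OBS)      # stand-in for the env step
```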