seuRTS

microRTS project from SEU

  1. Run sock/ServerAI.py

  2. Run microrts/src/tests/sockets/RunClientExample.java

You can also:

  1. Run microrts/src/tests/sockets/RunServerExample.java
  2. Run microrts/src/tests/sockets/RunClientExample.java

to see an illustrative running example of the server-client setup.
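For orientation, the sketch below shows, in heavily simplified form, the kind of JSON-over-TCP loop a server script like sock/ServerAI.py runs. The port number, line-based framing, and message contents are illustrative assumptions only; the real handshake with the Java client is defined in sock/ServerAI.py.

  # Illustrative sketch only: a minimal JSON-over-TCP server loop.
  # The port, framing, and message shape are assumptions, not the real protocol.
  import json
  import socket

  HOST, PORT = "127.0.0.1", 9898          # hypothetical address; see sock/ServerAI.py

  def serve_forever():
      with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
          srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
          srv.bind((HOST, PORT))
          srv.listen(1)
          conn, _ = srv.accept()                  # wait for the Java client
          with conn, conn.makefile("rw") as stream:
              for line in stream:                 # one JSON message per line (assumed)
                  gs = json.loads(line)           # game state sent by the client
                  actions = []                    # compute actions from gs here
                  stream.write(json.dumps(actions) + "\n")
                  stream.flush()

  if __name__ == "__main__":
      serve_forever()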

What you should go through

The main file to look at is sock/Server.py.

Focus on ServerAI.py; you do not need to pay attention to SocketAI.py or hardCodedJSON.py.

Methods you should look at are:

  BabyAI.getAction(player, gs)
  policy(player, gs)

Currently, policy always returns the equivalent of "do nothing until killed."

For details on the "parameter" and "type" fields in the dict returned by policy, see hardCodedJSON.py.
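To make the current behavior concrete, here is a hedged sketch of such a do-nothing policy. The "type" and "parameter" field names come from the README; the per-unit wrapping, the game-state keys ("pgs", "units", "ID", "player"), and the numeric codes are assumptions based on the usual microRTS JSON format and must be checked against hardCodedJSON.py.

  # Hedged sketch of a "do nothing" policy(player, gs).
  # The per-unit wrapping and the game-state keys ("pgs", "units", "ID", "player")
  # are assumptions from the common microRTS JSON format; the numeric codes
  # (type 0 = no-op) should be verified against hardCodedJSON.py.

  def policy(player, gs):
      """Return one no-op action for every unit owned by `player`."""
      actions = []
      for unit in gs["pgs"]["units"]:
          if unit["player"] == player:               # only command our own units
              actions.append({
                  "unitID": unit["ID"],
                  "unitAction": {"type": 0, "parameter": -1},  # assumed no-op codes
              })
      return actions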

What you can modify or contribute

Modifications you can make

We already have almost everything needed for an RL problem, so once you can extract the environment and agents from "gs", you should be able to interact with the client example.

Modify the policy function to change the responses of BabyAI.
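As a starting point for extracting the environment and agents from gs, the hedged sketch below maps the raw game-state dict to a simple grid observation plus unit lists. The key names ("pgs", "width", "height", "units", "x", "y", "player") are assumptions based on the standard microRTS JSON trace format; verify them against the messages RunClientExample.java actually sends.

  # Hedged sketch: turn the raw `gs` dict into an RL-style observation.
  # All key names are assumptions from the standard microRTS JSON format.

  def observe(player, gs):
      """Map `gs` to (grid, my_units, enemy_units)."""
      pgs = gs["pgs"]
      width, height = pgs["width"], pgs["height"]
      grid = [[0] * width for _ in range(height)]    # 0 = empty cell

      my_units, enemy_units = [], []
      for unit in pgs["units"]:
          owner = unit["player"]                     # -1 = neutral resource (assumed)
          if owner == player:
              grid[unit["y"]][unit["x"]] = 1
              my_units.append(unit)
          elif owner >= 0:
              grid[unit["y"]][unit["x"]] = 2
              enemy_units.append(unit)
          else:
              grid[unit["y"]][unit["x"]] = 3         # neutral objects, e.g. resources
      return grid, my_units, enemy_units

From such an observation, the policy function can decide per-unit actions instead of always returning no-ops.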

To be continued...

Contributions you can make

  1. ResourceUsage

    This is the most urgently needed module. Since there is no integrity guarantee on the internal policy, we currently cannot ensure that all actions it generates are legal (an illustrative legality-check sketch appears after this list).

  2. preGameAnalysis

    This part of Server.py is barely started; please contribute to it if you can.

  3. hardCodedJSON

    It has not been modified to support any mode other than the fully-observed, deterministic mode.

  4. ...
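As referenced under item 1, the hedged sketch below shows the kind of consistency check a ResourceUsage module could perform before actions are sent back to the client: reject plans in which two units claim the same cell or the total cost exceeds the available resources. All names here are hypothetical placeholders, not the actual module interface.

  # Hedged, illustrative sketch of a ResourceUsage-style legality filter.
  # `is_consistent`, its parameters, and the checks are hypothetical placeholders.

  def is_consistent(claimed_cells, planned_cost, available_resources):
      """Reject plans that reuse a cell or overspend resources."""
      seen = set()
      for cell in claimed_cells:          # (x, y) cells the proposed actions occupy
          if cell in seen:
              return False                # two units would collide on the same cell
          seen.add(cell)
      return planned_cost <= available_resources

  # Two units moving into (3, 4) at once is rejected; distinct cells within budget pass.
  assert not is_consistent([(3, 4), (3, 4)], planned_cost=0, available_resources=5)
  assert is_consistent([(3, 4), (2, 4)], planned_cost=2, available_resources=5)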

Problems

1. How to define reward?

  • In an RL problem we work with transition tuples like (S_t, a, r, S_t+1).
  • Here the states S_t and S_t+1 can be derived from gs, and the action a is given by the policy method, but the reward r is undefined (a toy reward sketch appears after this list).

2. Supervised RL or just Direct RL?

  • Direct RL seems to be infeasible due to the large search space.
  • Supervised RL requires a large amount of high-quality training data to learn a good policy.
    • Even learning the shortest path to attack an enemy may be challenging here if the reward is unrelated to the distance between the start and target points, or to the time the agent takes to reach specific destinations.

3. Self-play?

  • Most state-of-the-art RL agents apply self-play in the late training phase. Self-play would also be arduous in this project.
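As mentioned under problem 1, one simple workaround is to shape a reward from two successive game states, for example the change in material balance. The sketch below is a toy illustration: the unit fields ("hitpoints", "resources", "player") are assumptions from the microRTS JSON format, and the weighting is arbitrary rather than a recommendation.

  # Toy, hedged reward sketch: change in material balance between S_t and S_t+1.
  # Field names are assumptions from the microRTS JSON format; weights are arbitrary.

  def material(player, gs):
      """Total hitpoints plus carried resources of `player`'s units."""
      return sum(u["hitpoints"] + u["resources"]
                 for u in gs["pgs"]["units"] if u["player"] == player)

  def reward(player, gs_t, gs_t1):
      """r = change in (my material - enemy material) from S_t to S_t+1."""
      enemy = 1 - player                  # assumes a two-player game with players 0 and 1
      before = material(player, gs_t) - material(enemy, gs_t)
      after = material(player, gs_t1) - material(enemy, gs_t1)
      return after - before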
