Might be a bug in ethyr/stim.py

Question

dimonenka opened this issue 5 years ago · comments

As I understand it, stim.py collects the features an agent uses to make a decision in a particular state. My question is about the tile function, which the environment function calls for each tile visible to ent. If a tile is not empty, i.e. has entities on it, the stats of each entity are appended to a list. I see two issues with how this is done:

This might be a bug. Right now the entity stats are collected with statStim = stats(ent, e, config). However, ent is the agent making the decision, and e is either ent itself or one of its neighbours. So, looking at stats, this call collects the same stats of ent over and over, and only the distance from ent to e is collected correctly. Shouldn't it be statStim = stats(e, ent, config) instead, so that the stats of e are collected, i.e. the hp, food, and water of the neighbours rather than those of the agent itself repeatedly?
This is more of a suggestion. Currently, information about ent is always included in the list of entities. Maybe it would be better to check e.entID != ent.entID and not add ent to the list, except as a placeholder when there are no neighbours?

Here's the code with both modifications included (I did not run it, but it at least illustrates the concept):

import numpy as np

def environment(env, ent, sz, config):
    R, C = env.shape
    conv, ents = np.zeros((2, R, C)), []
    for r in range(R):
        for c in range(C):
            t, e = tile(ent, env[r, c], sz, config)
            conv[:, r, c] = t
            ents += e
    if len(ents) == 0:  # this is also new
        # no neighbours were found: fall back to the agent itself as a placeholder
        ents.append(stats(ent, ent, config))
    assert len(ents) > 0
    ents = np.array(ents)
    return conv, ents

def tile(ent, t, sz, config):
    nTiles = 8
    index = t.state.index
    assert 0 <= index < nTiles
    conv = [index, t.nEnts]

    ents = []
    r, c = ent.pos
    for e in t.ents.values():
        # my modification instead (continued in environment):
        if ent.entID != e.entID:
            statStim = stats(e, ent, config)  # this is the most important suggestion
            e.stim = statStim
            ents.append(statStim)

    return conv, ents
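
To make the effect of the argument order concrete without the repository's internals, here is a minimal, self-contained sketch. Entity, the simplified two-argument stats, and collect are hypothetical stand-ins, not the actual ethyr code. The sketch assumes, as described above, that stats reads the attributes of its first argument and uses the second only for the relative position.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical stand-in for the repository's entity class.
    entID: int
    pos: tuple
    health: float
    food: float
    water: float


def stats(subject, other):
    # Assumption: attributes come from `subject`; `other` only
    # contributes the positional offset from `subject` to `other`.
    dr = other.pos[0] - subject.pos[0]
    dc = other.pos[1] - subject.pos[1]
    return [subject.health, subject.food, subject.water, dr, dc]


def collect(ent, visible, fixed):
    """Gather one stats row per visible neighbour of `ent`."""
    rows = []
    for e in visible:
        if e.entID == ent.entID:
            continue  # skip the deciding agent itself
        # fixed=False mimics the current call order, fixed=True the proposed one
        rows.append(stats(e, ent) if fixed else stats(ent, e))
    if not rows:
        # placeholder row when there are no neighbours
        rows.append(stats(ent, ent))
    return rows


agent = Entity(0, (5, 5), 10.0, 8.0, 6.0)
neighbours = [Entity(1, (5, 6), 3.0, 2.0, 1.0), Entity(2, (4, 5), 7.0, 7.0, 7.0)]
visible = [agent] + neighbours

current = collect(agent, visible, fixed=False)
proposed = collect(agent, visible, fixed=True)

# With the current order every row repeats the agent's own health/food/water.
print([row[:3] for row in current])
# With the proposed order each row carries the neighbour's values.
print([row[:3] for row in proposed])
```

Under these assumptions, the current order yields rows that differ only in the offset columns, while the proposed order gives the agent one distinct row per neighbour.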

Joseph Suarez · Answer 1 · Sat Jun 01 2019 14:36:47 GMT+0800 (China Standard Time)

Wow, glad to see people diving so deep into the API! For many of the attributes, it does not matter one way or the other which argument is included first, though there was probably some significant loss of information available to the agent. The good news is that bugs like these only ever work in our favor (reminds me of DotA: "We’re still fixing bugs. The chart shows a training run of the code that defeated amateur players, compared to a version where we simply fixed a number of bugs, such as rare crashes during training, or a bug which resulted in a large negative reward for reaching level 25. It turns out it’s possible to beat good humans while still hiding serious bugs!" https://openai.com/blog/openai-five/)

I actually noticed this myself fairly recently. I didn't bother doing comprehensive testing to determine if this was indeed a bug (would need to rerun training for several days) because -- good news! -- I'm getting close to done with the first major patch for the environment, which is a complete infrastructure and input/output rework. You can follow this on the cowboy-dev branch if you like, but I don't suggest forking it just yet, as it's not fully stable.

Thanks for your interest -- TL;DR this will be fixed in 1.1, expected some time between 2 weeks and a month. Feel free to fiddle with it on your local copy to see if it makes a difference in training :)