lil-lab / cerealbar

Cereal Bar is a two-player web game designed for studying language understanding agents in collaborative interactions. This repository contains code for the game, a webapp hosting the game, the agent implementation, and recorded interactions in the game. http://lil.nlp.cornell.edu/cerealbar/


problems about code and paper

ChrisRanger opened this issue · comments

Hi, I have some questions and I hope you can answer them.
1. How should I understand "gold", e.g., gold distributions, gold trajectories, etc.? Does it mean oracle? And how did you get these data?
2. When I run finetune_end_to_end.sh (end_to_end=True), I hit a NotImplementedError in agent/model/models/action_generator_model.py at line 727. The snippet is:

if trajectory_distribution is None:
                if self._end_to_end:
                    raise NotImplementedError

I believe end_to_end training mode means the action_generator predicts the action sequence without gold distributions, so it must use distributions predicted by the plan_predictor. But there is no code implementing that process besides the NotImplementedError, so I can't make sense of it.

Hoping for your reply, thanks.

Gold is like gold standard -- I'm not sure if "oracle" is correct here since we don't have some automated process which dynamically constructs them. The gold distributions come from the original human-human data: we extract the gold trajectory locations and the gold cards to avoid/select from what the human follower did. The gold obstacles come directly from the environment.

For example, the action generator is pre-trained on gold input distributions. The gold input distributions are created in this block of code:

trajectory_distribution: torch.Tensor = torch.from_numpy(
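Roughly, the idea is to turn the hexes the human follower visited into a normalized distribution over the map. As a rough illustration only (not the actual repo code; the names here are made up):

import numpy as np
import torch

# Hypothetical sketch: build a "gold" trajectory distribution over a 25x25 hex map
# from the positions a human follower visited in the recorded interaction.
def gold_trajectory_distribution(visited_hexes, width=25, height=25):
    counts = np.zeros((width, height), dtype=np.float32)
    for x, y in visited_hexes:
        counts[x, y] += 1.0
    # Normalize so the values sum to 1, giving a distribution over map locations.
    return torch.from_numpy(counts / counts.sum())

# Example: a short demonstrated path through three hexes.
print(gold_trajectory_distribution([(0, 0), (0, 1), (1, 1)]).shape)  # torch.Size([25, 25])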

For the NotImplementedError: end-to-end means the action generator should be predicting sequences with predicted input distributions (from a plan predictor). There actually is some code missing here, so I just pushed an update which includes the correct code for it (from the private development repo).
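Conceptually, the branch that replaces the NotImplementedError looks something like the sketch below. This is only a rough sketch with assumed placeholder names (get_trajectory_distribution, plan_predictor, example), not necessarily the exact code that was pushed:

# Sketch of the intended end-to-end behavior: when no gold trajectory
# distribution is provided, get one from the plan predictor instead of raising.
def get_trajectory_distribution(gold_distribution, plan_predictor, example, end_to_end):
    if gold_distribution is not None:
        return gold_distribution  # pre-training: use the gold plan
    if end_to_end:
        # Fine-tuning end to end: stage 1 (the plan predictor) supplies the plan.
        return plan_predictor(example)
    raise ValueError('A gold distribution is required when not running end to end.')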

Hi, bugs still exist after your update. I haven't seen names like self._hex_predictor and card_distribution before; it seems that self._hex_predictor should be self._plan_predictor and card_distribution should be card_mask, but I am not sure. Could you push another update with the correct code (if you can)? Thanks.

Hi, I pushed another update. Hopefully this will fix the problem.

Hi, thanks for your update. By the way, I found that partial_observation is still under development; I got poor performance in my own implementation. Could you push an update to your partial_observation code?


Hi, I modified the code according to your update, but end_to_end finetuning still failed.
The shape of auxiliary_predictions[auxiliary.Auxiliary.TRAJECTORY] is [1, 25, 25], but batch_size, time, width, height = map_distribution.size() needs 4 dimensions. I think you may have forgotten to update some code in get_predictions() of plan_prediction_model.py or normalize_trajectory_distribution() in action_generator_model.py. Could you check it?

partial_observation is actually not part of the main project. Since the default setting uses full observability, partial observability will result in slightly worse performance than full observability.

What do you mean by getting worse performance?

I updated it again, but I can't test the code easily right now. Could you let me know exactly what you have been running (i.e., the config files you are using, and the pretrained model, or something else)? Then I can more easily debug.

@ChrisRanger , I'm going to try to debug this later today. It's possible that I missed something when cleaning up the code in this repo (this repo is supposed to be the cleaned-up version of the original one, but I think I missed a few things), so worst case I will just copy over the original code from another repo into this one.

If you can describe what you are trying to do with the repo (replicate experiments exactly, apply to another dataset, etc.) then that will be helpful for me to know. Thanks!

I am running finetune_end_to_end.sh; the config params are:

--saved_game_dir="data/" \
--game_state_filename="agent/preprocessed/game_states.pkl" \
--save_dir="agent/experiments/" \
--tb_writer_name=${EXPERIMENT_NAME} \
--model_type=ACTION_GENERATOR \
--load_pretrained=True \
--end_to_end=True \
--aggregate_examples=False \
--maximum_generation_length=25 \
--generate_new_cards=True \
--full_observability=True \
--finetune_auxiliary_coefficient_intermediate_goal_probabilities=1. \
--finetune_auxiliary_coefficient_final_goal_probabilities=1. \
--finetune_auxiliary_coefficient_obstacle_probabilities=0.1 \
--finetune_auxiliary_coefficient_avoid_probabilities=0.1 \
--finetune_auxiliary_coefficient_trajectory_distribution=0.04 \
--use_trajectory_distribution=True \
--use_goal_probabilities=True \
--use_obstacle_probabilities=True \
--use_avoid_probabilities=True \
--evaluation_results_filename="training_inference.log" \
--pretrained_plan_predictor_filepath=${PLAN_PREDICTOR_FILEPATH} \
--pretrained_action_generator_filepath=${ACTION_GENERATOR_FILEPATH}
Hi, I am replicating the experiments and trying to propose my own algorithms based on the CerealBar game (i.e., other learning paradigms such as IRL, RL, or GAIL, leader policies such as instruction generation, partial observation, etc.). Last week I implemented partial observation myself, but in my partial-observation training I got much worse performance than with full observation. I saw many #TODOs in the repo, so could you share your partial observation code with me? If you can't open-source it, you could send the code to my email (wangshiyu@nuaa.edu.cn). Thanks!

Thanks for the clarification!

Just to check -- you pretrained both model components, right?

To clarify, the partial observability code is not available in the other repo either (our original paper only used full observability). It was only partially implemented in this repo; sorry for the confusion. It was a work in progress when I publicized this code, and at this point, in my personal experiments, I'm using a completely different model and architecture for partial observability. In other words, it would take me significant time to fully support partial observability in this repo or the old (private) repo.

I don't plan to officially support partial observability until my current project is public (will be in the next few months, hopefully). However, I am happy to talk through with you the general design of how I am doing partial observability now.

Understood! In other words, you designed a new model architecture for partial observation. Could you share your ideas about it with me?
By the way, I want to ask you some questions about this language-conditioned agent approach. Thanks.
1. Is the dataset necessary? If the dataset is difficult to get, does RL work (especially with a large vocabulary)? I am trying to verify this.
2. How do you combine the raw state (such as the game image) with natural language? In your paper, an RNN processes the language and learns the parameters of the CNN kernels.
3. Did you try learning a leader policy (e.g., instruction generation/delivery, negotiation, etc.)?
4. How would you design algorithms for a special case like multiple followers? Do you know of any available game core/environment?
I am glad to talk with you, haha.

1. Is the dataset necessary? If the dataset is difficult to get, does RL work (especially with a large vocabulary)? I am trying to verify this.

I'm not sure what you mean by necessary. In all of my experiments, I've used the original human-human interactions that are available in this repo. I haven't tried doing RL with synthetically-generated instructions or anything like that.

How do you combine the raw state (such as the game image) with natural language? In your paper, an RNN processes the language and learns the parameters of the CNN kernels.

I'm not sure I understand the question, and you can see the paper for more details about the architecture (also check the appendix because we have a more detailed description of LingUNet there if you are interested). But in general, we use an RNN to encode an instruction, embed the symbolic representation of the environment into an image and apply convolutions over it, and then combine the two. Then we apply LingUNet on top of it. LingUNet converts the text representation into convolutional kernels which are then applied as the kernels in a traditional U-Net. See this paper: https://arxiv.org/pdf/1809.00786.pdf for more details about the LingUNet architecture.
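If it helps, here is a tiny toy sketch of that text-to-kernel idea (toy sizes and invented names, not the paper's actual LingUNet implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sketch: map an instruction encoding to convolutional kernels, then apply
# those kernels to the embedded environment (the core idea behind LingUNet).
class TextConditionedConv(nn.Module):
    def __init__(self, text_dim=64, channels=16):
        super().__init__()
        self._channels = channels
        # Predict a (channels x channels x 1 x 1) kernel from the text encoding.
        self._kernel_predictor = nn.Linear(text_dim, channels * channels)

    def forward(self, env_features, text_encoding):
        # env_features: [batch, channels, height, width]; text_encoding: [batch, text_dim]
        kernels = self._kernel_predictor(text_encoding).view(
            -1, self._channels, self._channels, 1, 1)
        outputs = []
        for i in range(env_features.size(0)):
            # Each example is convolved with its own instruction-specific kernel.
            outputs.append(F.conv2d(env_features[i:i + 1], kernels[i]))
        return torch.cat(outputs, dim=0)

env = torch.randn(2, 16, 25, 25)  # embedded environment
text = torch.randn(2, 64)         # RNN encoding of the instruction
print(TextConditionedConv()(env, text).shape)  # torch.Size([2, 16, 25, 25])

In the real model the predicted kernels are larger and applied at multiple scales of a U-Net; see the linked paper for the full architecture.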

Did you try learning a leader policy (e.g., instruction generation/delivery, negotiation, etc.)?

We don't have any papers doing this, but it is an interesting idea.

How would you design algorithms for a special case like multiple followers? Do you know of any available game core/environment?

I'm actually not familiar with environments which support this kind of interaction. But one could probably modify the CerealBar game itself (i.e., the Unity implementation) to support such an interaction.

Understood! In other words, you designed a new model architecture for partial observation. Could you share your ideas about it with me?

Partial observability should work with the architecture in this repository (the one described in the paper). It just isn't implemented completely here, and since I've moved onto another architecture I don't have the time to implement it fully here. Here are some modifications that would need to be done to add partial observability to the architecture in this repo:

  • Examples currently come with a target sequence including the state of the world and action taken in that state. Each step would also need to be annotated with some representation of a partial observation including what locations are observed, maybe with some memory of what was observed in the past.
  • The plan predictor (stage 1) would need to be trained to predict a new plan for every step the agent takes, because the observations are changing.
  • You may also want to modify the plan predictor to predict probabilities that the goals are unobserved (similar to Blukis et al. 2019: https://arxiv.org/pdf/1910.09664.pdf)
  • May also need to modify the architecture of both model components to include some kind of observability mask (so the agent knows what is not observed); see the sketch after this list.
  • The action generator architecture (stage 2) could probably stay roughly the same, except that there is a new plan at each step (instead of just one plan per action). Recurrence over actions thus may be less important for the action generator. If predicting probabilities that goals are unobserved, it should take this as input.
  • During inference (i.e., rollouts with the model), you would need to keep track of the agent's current observation and update it wrt. the actions it is taking, so that the model has access to the most up-to-date observations. The entire network would need to be run end-to-end to get a single action, rather than generating the entire sequence of actions at once.
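For the observability-mask bullet above, a minimal hypothetical sketch (not code from this repo) of appending a mask channel to the embedded environment:

import torch

# Hypothetical sketch: append an observability mask as an extra channel so the
# model can tell which hexes have not been observed.
def add_observability_channel(env_embedding, observed_mask):
    # env_embedding: [batch, channels, height, width]
    # observed_mask: [batch, height, width], 1.0 for observed hexes and 0.0 otherwise
    return torch.cat([env_embedding, observed_mask.unsqueeze(1)], dim=1)

env = torch.randn(1, 16, 25, 25)
mask = torch.zeros(1, 25, 25)
mask[:, :10, :10] = 1.0  # e.g., only the hexes the follower has seen so far
print(add_observability_channel(env, mask).shape)  # torch.Size([1, 17, 25, 25])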


It helps me a lot, thanks for your reply.


The finetuning still fails when end_to_end=True. Could you share your original code or the missing snippet with me?

Sounds good. I will take a look at this tomorrow and update the code here and will keep you updated in this thread.


I am happy to hear this. Take your time, haha.

Hi, could you point me to documentation that would help me interact with the standalone? I only found server.py, and it seems incomplete. For example, how can I get the current game image (the leader's or the follower's)? I am trying to set up an RL architecture and would appreciate your help.

All of our experiments use a structured representation of the environment, rather than operating directly on the image of the environment itself (i.e., what you see when you start the standalone). So we currently don't support grabbing the top-down or first-person image from the standalone.

I might have linked this above, but the UnityGame class mediates between your Python code and the standalone process. In particular, this line: https://github.com/lil-lab/cerealbar/blob/master/agent/simulation/unity_game.py#L67 grabs the structured representation of the environment (props, terrain, cards, and player positions) from the standalone after sending a seed to the standalone in the previous line.

That being said, we did at one point try to implement grabbing images from the standalone directly. I haven't tested this in over 2 years, so there are no guarantees that it works, but the interface to the standalone on the Unity side does have some keywords for grabbing images (similar to the keywords for sending other commands to the standalone, like in this function for getting the game score from the Unity game.)

The three keywords are image overhead, image human, and image agent (starting at this line). Here is some untested code which outlines how you might want to implement something which uses these keywords in UnityGame:

def get_image(self):
    # self._connection is the ServerSocket
    self._connection.send_data('image overhead')  # (or one of the other two image types)
    image_data = self._connection.receive_data()
    # Untested: the format of image_data (raw bytes vs. an encoded string) would
    # need to be checked against what the Unity side actually sends back.
    return image_data

edit: another thing I have done in the past when I want to record rollouts for games is use pyscreenshot (https://pypi.org/project/pyscreenshot/). However, this can be very slow so probably not a good option for RL.
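For reference, a minimal pyscreenshot sketch for grabbing a frame (the bounding box is a placeholder for wherever the standalone window happens to be on screen):

import pyscreenshot

# Grab a region of the screen and save it as one frame of the rollout recording.
image = pyscreenshot.grab(bbox=(0, 0, 800, 600))  # left, top, right, bottom
image.save('rollout_frame_0000.png')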


This is enough for me. Thank you very much!

I just pushed some changes to the end-to-end inference code, and at least on my end it works.


Hi, the previous problems have disappeared, but I found another problem.
At this line:

float(np.mean(np.array(score_increases))) if score_increases else None)

When score_increases=[], the return value is None, which causes an error at
means_dict[metric_name] = np.mean(np.array(values)).item()

because values = [None, ...] and np.mean cannot compute a mean over None values, so I changed

return (float(np.sum(np.array(number_instructions_followed))) / possible_num_followed,
            float(np.mean(np.array(score_increases))) if score_increases else None)

to

return (float(np.sum(np.array(number_instructions_followed))) / possible_num_followed,
            float(np.mean(np.array(score_increases))) if score_increases else 0)

then it works.

I just found another minor problem: there is an exit() call whose purpose I don't understand; it does nothing except end the process.

Your first fix should work. I don't think that metric (score increases) is actually used anywhere anyway, so you can probably ignore it.
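An alternative sketch (assuming values and means_dict are aggregated as in your snippet) would be to skip the None entries when averaging instead of substituting 0:

import numpy as np

# Alternative: drop None entries (interactions with no score increases) before
# averaging, rather than counting them as 0.
values = [None, 1.0, 2.0]  # example metric values being aggregated
valid_values = [v for v in values if v is not None]
mean_value = float(np.mean(valid_values)) if valid_values else 0.0
print(mean_value)  # 1.5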

The exit is a relic of my debugging/fixes. I tested training end-to-end for a few epochs on a randomly initialized model and it seems to work now (just pushed a change to the repo).

Got it. I also found a problem running evaluation.sh, at

results = action_generator_metrics.execution_accuracies(

A parameter (logger) is missing here, which causes an error. It should be:

from agent.evaluation import evaluation_logger

eval_logger = evaluation_logger.EvaluationLogger(filename=evaluation_args.get_evaluation_results_filename(), log=True)
results = action_generator_metrics.execution_accuracies(
                        model,
                        args.get_game_args(),
                        evaluation_args,
                        logger=eval_logger,
                        instruction_examples=list(examples.values()))