LeCAR-Lab / CoVO-MPC

Official implementation for the paper "CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design" accepted by L4DC 2024. CoVO-MPC is an optimal sampling-based MPC algorithm.

Home Page: https://lecar-lab.github.io/CoVO-MPC/

Should we continue to refer to this paper without open source code?

bzx20 opened this issue

While carefully reading the paper "Learning Vision-based Pursuit-Evasion Robot Policies", I could not understand the specific implementation details behind the formulas in it, in particular how to implement the fully-observable and partially-observable policies in our drone environment, and how the latent intent mentioned in the paper is produced.

So I immediately emailed the first author of the paper and received a reply saying: "Unfortunately, we do not have a public repo of the code as of right now. We are planning on that in the future, but I don't know the exact date when it will be released."

So I would like to ask whether I should still focus on this paper, or whether you have any other suggestions.

I understand your concerns and appreciate your questions. Let me provide some clarifications and answers:

Implementation of the Fully-Observable and Partially-Observable Policies:
At the outset, it may not be necessary to convert all states into the robot's body frame; using the world-frame state should be adequate.

Designing the Latent Observation:
Essentially, this is task-dependent.

  • For cooperative tasks, one approach is to control one quadrotor with a pre-specified policy (e.g., moving along a polynomial trajectory, making sudden turns, or dropping an object), while the other quadrotor is controlled by your policy, which must adapt quickly to the first one's policy changes. The objective would be to keep the slung payload moving at a constant speed.
  • For adversarial tasks, you may explore using one quadrotor to disturb the other.
    Fundamentally, the latent observation corresponds to the future trajectory of the manually controlled quadrotor, and the goal is to move the object at the same velocity as the other quadrotor (a minimal code sketch follows this list).
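
To make the "latent observation is the future trajectory" idea concrete, here is a minimal sketch. The function name get_latent_obs, the 5-step horizon, and the pre-computed scripted_traj array are illustrative assumptions, not code from this repo.

```python
import jax.numpy as jnp

def get_latent_obs(scripted_traj, t, horizon=5):
    """Hypothetical latent observation: the next `horizon` waypoints of the
    scripted (manually controlled) quadrotor, flattened into one vector.

    scripted_traj: (T, 3) array of pre-computed positions, e.g. sampled from
    a polynomial trajectory; t: current step index.
    """
    # Clamp indices so we never read past the end of the scripted trajectory.
    idx = jnp.clip(jnp.arange(t + 1, t + 1 + horizon), 0, scripted_traj.shape[0] - 1)
    return scripted_traj[idx].reshape(-1)  # shape (horizon * 3,)
```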

Considerations on the Focus of the Paper:
This paper serves as a reference point. We aim to apply the concept of adaptation explained in the paper to multi-agent settings with dynamic entanglement. Our research question centers on how different agents can share information and predict other agents' behavior to enhance performance.

Feel free to build your own version of the method, since we already have RMA in our repo: simply replace the RMA observation with the other agents' actions. If this is done correctly, everything should work.
If you have any more questions or need additional clarification, please don't hesitate to reach out.

Thank you for your kind reply! I am now working on the RMA method in the dualquad2d env and trying to replace the observation.

When I look at the RMA-parameter-related code in quad3d_free.py, I have some doubts about the dimensions written there, such as:

# RL parameters
self.action_dim = 4
self.adapt_obs_dim = 22 * self.default_params.adapt_horizon
self.param_obs_dim = 17

Could you give a simple explanation of which observation elements the numbers 22 and 17 correspond to?

Also, when replacing the observation with the other agent's actions, is it enough to define a new obs_type and use it? And if I add the other agent's actions this way, what about the step function where I apply the forces and thrusts?

How is the adaptation/parameter observation dimension determined?

The dimension numbers in self.adapt_obs_dim and self.param_obs_dim are determined via the following methods:

  • The parameter dimension (param_obs_dim) is defined by the get_obs_paramsonly() function; inspecting what it returns shows which elements make up those 17 dimensions.
  • The adaptation dimension (adapt_obs_dim) comes from the get_obs_adapt_hist() function, which stacks a 22-dimensional per-step observation over adapt_horizon steps.
  • Note that when training RMA, the parameter observation should not be included in your policy observation; it should be passed via info instead.
  • In your case, where the goal is to predict another agent's behavior, it's crucial to include the other agent's future state in get_obs_paramsonly() (see the sketch below).
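
Since the other agent's future state should live in get_obs_paramsonly(), a hedged sketch of what that could look like is below. The parameter field names and the helper get_other_agent_future() are assumptions for illustration, not the repo's actual API.

```python
import jax.numpy as jnp

def get_obs_paramsonly(self, state):
    """Hypothetical multi-agent version of the parameter observation:
    physical parameters plus the other agent's future states. The length of
    the returned vector is what param_obs_dim should be set to."""
    # Physical parameters the adaptation module should identify (names assumed).
    params = jnp.array([self.default_params.m, self.default_params.I])
    # Future positions of the other agent over an assumed 5-step horizon.
    other_future = self.get_other_agent_future(state, horizon=5)  # assumed helper, shape (5, 2)
    return jnp.concatenate([params, other_future.reshape(-1)])
```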

Does a new observation type need to be created?

No, there is no need to create a new observation type. You simply need to define another function that collects the future states and actions of the other agents and puts them into info rather than into the observation (see the sketch below).
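
The routing could look like the sketch below; the dynamics call, helper names, and info keys are assumptions, not the repo's actual ones.

```python
def step_env(self, key, state, action, params):
    """Sketch of the info-routing pattern: the policy observation stays
    unchanged, while the privileged vector travels through `info`."""
    obs, next_state, reward, done = self._step_dynamics(key, state, action, params)  # assumed existing call
    info = {
        "obs_param": self.get_obs_paramsonly(next_state),   # target for the RMA adaptation module
        "obs_adapt": self.get_obs_adapt_hist(next_state),    # state/action history input
    }
    return obs, next_state, reward, done, info
```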

What's the method to control another agent?

As for controlling the other agent: I presume the current action dimension is 4. With a predefined policy driving the other agent, the learned policy's action dimension should become 2. If that is the scenario you are asking about, I advise implementing it inside your step_env function: extend your 2-dim input action with an additional action from the predefined controller (a sketch follows).
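
A minimal sketch of that step_env pattern is below, assuming a dualquad2d setup where the learned policy outputs a 2-dim action for agent 0 and a hypothetical scripted_controller() drives agent 1; the helper names are assumptions.

```python
import jax.numpy as jnp

def step_env(self, key, state, action, params):
    """Hypothetical dualquad2d step: the learned policy controls agent 0
    (2-dim action) while agent 1 runs a predefined controller."""
    scripted_action = self.scripted_controller(state)          # assumed helper, shape (2,)
    full_action = jnp.concatenate([action, scripted_action])   # restore the original 4-dim action
    return self._apply_thrusts(key, state, full_action, params)  # assumed existing thrust/dynamics call
```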

Last but not least, I strongly suggest visualizing the policy by rendering it. This provides an intuitive understanding of how well the policy behaves.

Thanks a lot for your reply! I have made some revisions and fixed several problems. I am still wondering about:

  • How do I include the future state in get_obs_paramsonly()? Does the "future state" here mean an array of the other agent's future trajectory?
  • To get the future state, what kind of predefined policy should I use? The paper "Learning Vision-based Pursuit-Evasion Robot Policies" proposes three policies: random, MARL, and game theory.
  1. You can refer to this.
  2. You can run a PID controller on one of the agents, or use MARL (a minimal sketch follows).
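
For option 2, a minimal PD/PID sketch is shown below; the function name, gains, and the assumption that the scripted agent's action is a desired 2D acceleration are all illustrative.

```python
import jax.numpy as jnp

def pid_policy(other_pos, other_vel, target_pos, kp=4.0, kd=2.0):
    """Hypothetical PD controller for the scripted agent in dualquad2d:
    tracks target_pos and returns a 2-dim action (desired acceleration),
    which step_env would then convert to thrusts."""
    return kp * (target_pos - other_pos) - kd * other_vel

# Rolling the dynamics forward with this controller for a few steps yields
# the "future state" array that get_obs_paramsonly() can expose.
```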